Nucleic Acid Structures, Energetics, and Dynamics - The Journal of

Aug 1, 1996
J. Phys. Chem. 1996, 100, 13311-13322


Nucleic Acid Structures, Energetics, and Dynamics

Ignacio Tinoco, Jr.

Department of Chemistry, University of California at Berkeley, and Structural Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720-1460

Received: October 16, 1995; In Final Form: January 2, 1996

In 1953 DNA was shown to be a double helix of hydrogen-bonded complementary bases. Since then, knowledge of nucleic acid structures, thermodynamic stabilities, and dynamics of conformational changes has grown exponentially. This knowledge has led to the development of the biotechnology industry, the identification of plants and animals from a few cells, and many advances in the diagnosis and treatment of diseases. All the methods of physical chemistry have been used to characterize the primary structures (sequences), the secondary structures (base pairing), and the tertiary structures (folded 3D conformations) of the nucleic acids. The interactions of nucleic acids with themselves, with proteins, and with small-molecule ligands control their many functions. Novel methods are being developed to probe the structures and functions of nucleic acids. These include methods to study a single molecule and methods to select, amplify, and characterize one sequence among 10^17 different sequences.

Introduction

The April 25, 1953, issue of Nature presented three short reports on the structure of nucleic acids. The first, by Watson and Crick,1 proposed that DNA was a double helix of antiparallel strands held together by hydrogen bonds between complementary bases: guanine and cytosine or adenine and thymine. Their model followed from the fact that this was the only way that pairs of hydrogen-bonded bases in their correct keto tautomers could fit into a continuous, regular helix. The G·C and A·T base pairs each have a twofold axis in the plane of the base pair; rotation by 180° around this axis interchanges their attached sugars and phosphates. Thus a G·C pair is interchangeable with a C·G, and A·T and T·A are interchangeable. As the distances between the C1′ carbons where the bases attach to the sugars are the same for G·C and A·T, any sequence of bases can form a uniform double helix of constant pitch and radius. The other two reports2 gave detailed X-ray diffraction data on B-form DNA fibers which established the 10 base pairs per turn of 34 Å length with the phosphate groups on the outside of a double helix of 20 Å diameter. The report by Rosalind Franklin showed the best diffraction data and also described the crystalline A-form DNA fiber seen at low humidity. Unfortunately, she died before the Nobel Prize for this work was awarded. The Nobel committee would have had a very difficult task to choose who shared the award with Watson and Crick. By rule only three people can share one award; they were Crick, Watson, and Wilkins. The discovery of the DNA double helix was accomplished by a biologist (Watson), two physicists (Crick and Wilkins), and a physical chemist (Franklin). This combination of knowledge seems ideal for solving nearly any problem in science. As present and future students of physical chemistry, we should all learn some biology and some physics in order to solve the important problems of the next 100 years.
The goal of scientists studying structures, energetics, and dynamics of nucleic acids should be to understand, predict, and control their functions. The goal has been amply fulfilled following the 1953 publications on the structure of DNA. The complementary Watson-Crick3 base pairs immediately explained how genes duplicated. The biotechnology industry, the identification and characterization of plants or animals from a few cells, advances in diagnosis and treatment of disease, and many other applications have all followed. Is the fun over now? Have all the important discoveries been made? When I arrived in Berkeley in 1956 and mentioned to a biologist that I was planning to study DNA structure, he responded, “Hasn’t that already been done?” I do not remember the answer I gave then, but the answer I would give now is that every important discovery reveals new questions. The fun is just beginning. Some of the progress made since 1953 in determining the structures of nucleic acids, and understanding their functions, will be described here. Useful knowledge of a structure means knowing where the atoms are, how much energy it takes to move them, and how fast they can move. Thus, structures, thermodynamics, and kinetics of nucleic acids will be considered.

Abstract published in Advance ACS Abstracts, July 1, 1996.

S0022-3654(95)03053-X CCC: $12.00

Structures

Primary Structures. Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are polynucleotides. Each nucleotide contains a sugar (D-2′-deoxyribose for DNA and D-ribose for RNA), a phosphate, and a base (A, C, G, T for DNA and A, C, G, U for RNA). The primary structures of a DNA oligonucleotide, an RNA oligonucleotide, and the five bases are shown in Figure 1; detailed structural information is available in textbooks.4 All the biological, chemical, and physical properties of a nucleic acid depend on its primary structure, that is, its sequence of nucleotides. Clearly, it is important to be able to determine the sequence of any nucleic acid and then to deduce all its properties from the sequence. We will start with the easier problem, the determination of sequence.

Determination of Sequence. The Human Genome Project has as its goal the sequence determination of a complete set of human chromosomes, about 3 billion base pairs.
The first complete DNA sequence for a free-living organism has been published:5 Haemophilus influenzae (1 830 137 base pairs). A complete yeast genome of 14 million base pairs is expected in 1996. All the sequence information has come from a method invented by Fred Sanger; he shared the chemistry Nobel Prize in 1980 with Walter Gilbert for nucleic acid sequencing. It was Sanger’s second Nobel; his first was for protein sequencing.

© 1996 American Chemical Society

13312 J. Phys. Chem., Vol. 100, No. 31, 1996


Figure 1. Structures of the four naturally occurring bases in DNA (A, C, G, T) and RNA (A, C, G, U) are given along with a portion of a poly(deoxyribonucleotide) chain and of a poly(ribonucleotide) chain. The sequence of a chain is read from the 5′-end to the 3′-end.

In Sanger’s method6 DNA polymerase is used to synthesize a DNA strand complementary to the strand being sequenced (the template strand). A primer oligonucleotide is used to select the site where the polymerase starts synthesizing the complementary strand. Four deoxynucleoside triphosphates (deoxy-NTPs) are the monomer units polymerized by the enzyme. Normally the enzyme will continue synthesis until it runs out of deoxy-NTPs or of template strands. However, if a few percent of a dideoxy-NTP is added to the reaction mixture, the dideoxy-NTP acts as a strand termination monomer. For example, if dideoxy-GTP is added, all strands will end in G. Most of the time a natural G will be incorporated opposite C on the template strand, and the new strand will continue. However, eventually a dideoxy-G is added, and the strand terminates. The position of G’s relative to the starting position for synthesis is read from the chain lengths of the synthesized strands. The experiment is repeated with each of the four bases acting as the chain termination monomer.

Gilbert’s method7 also uses chain length to obtain the sequence, but base-specific chemical cleavage produces the ends. A DNA or RNA strand is labeled at the 5′-end (see Figure 1) and then specifically, but lightly, cleaved at G’s, for example. The chain length of each labeled strand after cleavage provides the sequence of G’s. Determination of sequence has thus been changed to determination of chain length of single strands with a resolution of one nucleotide. The standard method is electrophoresis in a denaturing poly(acrylamide) gel to separate chains on the basis of charge. At neutral pH there is one negatively charged phosphate group per nucleotide, so strands can be separated according to chain length. Gel electrophoresis is also used to separate large fragments of double-stranded DNA (thousands to millions of base pairs) for coarse-grained characterization: so-called mapping.
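The way a sequence is read from chain lengths can be illustrated with a short Python sketch. It is an idealization, not a model of a real gel: every possible terminated strand is assumed to be present and resolved, lengths are counted from the first nucleotide added after the primer, and the function names (`synthesized_strand`, `termination_lengths`, `read_gel`) and the template are invented here for illustration.

```python
# Idealized chain-termination sequencing: four reactions, one per dideoxy-NTP.
# All names and the example template are illustrative assumptions.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Strand made by the polymerase (5'->3') for a template written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_lengths(new_strand):
    """For each base, the chain lengths at which its dideoxy form ends a strand."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Rebuild the sequence by reading the four lanes from shortest to longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    template = "TGTAATC"                       # hypothetical region after the primer site
    new_strand = synthesized_strand(template)
    lanes = termination_lengths(new_strand)
    print("lane G chain lengths:", lanes["G"])
    print("sequence read from the gel:", read_gel(lanes))
```

Each lane holds only the chain lengths that end in one base, so sorting all lengths across the four lanes recovers the order of bases in the newly synthesized strand.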
The speed and resolution of separation depend on the pore size of the gel (controlled by gel concentration and cross-linking), the space and time distribution of the electric field,8 the solvent, and so forth. To optimize conditions, it is important to understand how chain macromolecules travel during gel electrophoresis. Chains are thought to migrate through gels like a snake through a warren of interconnected rabbit holes. This reptation model9,10 approximately represents the motion of the chain molecule, but it is necessary to watch a DNA molecule move through a gel to appreciate the complexity of the real behavior. Individual DNA molecules stained with fluorophores can be followed in a fluorescence microscope11 and displayed in a video.12 The DNA acts like a successful kickoff returner in football. It moves fast in spurts, is transiently slowed or stopped, but suddenly changes direction, and continues. This is its motion in a constant electric field; the erratic motion is caused by its choices at different points in the gel warren. For optimum separation of different DNA size ranges the electric field is pulsed so as to make the DNA move forward, backward, or sideways with rest periods in between (with the electric field off).13 It is obviously very difficult to model the motion with great fidelity. One model uses charged beads connected by entropic springs. The beads are acted on by the electric field, viscous drag, Brownian diffusion, and interbead elasticity.14 Reasonable agreement between simulation and experiment is obtained, but there is clearly much room for improvement.

In the future, faster separations with higher resolution on smaller amounts of material are urgently required. One human chromosome with about 100 million base pairs is a worthy target for sequencing. Of course, methods of chain length measurement other than gel electrophoresis, such as mass spectroscopy, are possible. Silicon mazes can replace poly(acrylamide) gels as electrophoresis materials. Finally, completely different sequencing techniques can be used. Among the many methods elicited by the Human Genome Project, sequence determination by hybridization stands out. The logic of the method is that an N-mer sequence is a set of overlapping shorter sequences. From knowledge of these shorter sequences the N-mer sequence can be deduced.15 How long the shorter sequences need to be to obtain a unique sequence for the N-mer depends on the sequence. Long repeating sequences are the most difficult.
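The idea that a sequence is a set of overlapping shorter sequences can be made concrete with a small Python sketch. A strand of length N contains N - k + 1 overlapping k-mers; the sketch counts them and shows how a repeating sequence collapses onto very few distinct k-mers, which is one way to see why repeats are hard to reconstruct. The function name `kmer_spectrum` and the test sequence are assumptions made for illustration only.

```python
from collections import Counter


def kmer_spectrum(sequence, k):
    """Count every overlapping k-mer in the sequence (a window sliding by one base)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    target = "ACGT" * 25                    # hypothetical 100-mer built from a 4-base repeat
    spectrum = kmer_spectrum(target, 8)
    print("overlapping 8-mers:", sum(spectrum.values()))
    print("distinct 8-mers:", len(spectrum))
    print("possible 8-mers:", 4 ** 8)
```

For this repeating target the spectrum contains only four distinct 8-mers, so many different target lengths and phases would give the same set of hybridized probes; only the multiplicities, or a longer probe length, can distinguish them.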
For example, in a 100-mer there are two overlapping 99-mers, three overlapping 98-mers, and (101 - n) overlapping n-mers. Thus, a sequence of 100 nucleotides will contain 93 overlapping 8-mers, not necessarily all different. The 8-mers that occur in the target sequence, and how often each occurs, can be used to deduce the target sequence or a number of possible target sequences. There are 4^8 = 65 536 possible 8-mers, so only a small fraction can occur in any 100-mer. If long repeating sequences occur in the target, a longer n-mer sequence can be used to resolve ambiguities. To implement the method, a complete set of 4^n oligonucleotides of length n is immobilized on a solid support, and a target strand is presented. The target strand hybridizes by Watson-Crick base pairing to each n-mer contained in its sequence. From the n-mers hybridized the sequence of the target is reconstructed.16 The 4^n oligonucleotides can be synthesized photochemically on a chip using photolithographic methods; hybridization is detected by fluorescence of the fluorescently labeled target.17 Determining sequence by hybridization is very useful for testing sequences determined by the standard Sanger method. It may be most used in detecting small changes in sequence, such as single-base mutations in some genetic diseases. The ultimate goal is to be able to quickly sequence any DNA or RNA from a single molecule without first amplifying it by the polymerase chain reaction (PCR).18 The method will be left as an exercise for the reader.

Analysis of DNA Sequence. The Human Genome Project is supported by NIH and DOE to identify all human genes and thus to revolutionize the diagnosis, prevention, and treatment of disease. Less than 10% of the human DNA codes for genes; most of the rest has no known function, although some is involved in the regulation of gene expression. The DNA that codes for genes is first transcribed into RNA with a one-to-one code based on Watson-Crick base pairs. The RNA can either function directly, for example as the transfer RNAs and ribosomal RNAs involved in protein synthesis, or the RNAs (messenger RNAs) can be translated into protein. Here the code is three nucleotides to one amino acid. The 64 trinucleotide sequences code redundantly for 20 amino acids, except for three trinucleotides (UAA, UAG, UGA) that are stop signals, which cause protein synthesis to end at that point in the messenger RNA.
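The three-to-one code can be checked by direct enumeration. The Python sketch below builds the standard genetic code from the conventional compact table (bases ordered U, C, A, G at each codon position; `*` marks a stop) and counts the codons. The helper `translate` is hypothetical, it makes the simplifying assumption that translation starts at the first AUG, and the example message is invented.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated with the third position varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first stop codon (one-letter amino acid codes)."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
    print("codons:", len(CODON_TABLE))
    print("stop codons:", stops)
    print("amino acids encoded:", len(set(AMINO_ACIDS) - {"*"}))
    print("peptide:", translate("GGAUGGCUUAAGG"))   # hypothetical message
```

The table makes the redundancy explicit: 61 sense codons are shared among 20 amino acids, and the remaining three are the stop signals.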
The start signals for protein synthesis are a nontranslated RNA sequence followed by an AUG triplet; this triplet, which also codes for methionine, is the first triplet translated. The noncoding DNA may have long runs of repeating sequences. Some repeating trinucleotide sequences are causes (and diagnostics) for human diseases such as fragile X syndrome.19

We will emphasize here the structural and physical aspects of sequence. DNA is normally double stranded, but chromosomes have single-stranded ends necessary for complete replication of the chromosome. DNA polymerase needs a primer to start making a complement from a template DNA strand. The primer should bind to the DNA strand before the important sequences in the DNA that are replicated. This is accomplished by attaching a repeating sequence, such as a variable number of copies of the sequence T2G4, to the 5′-end of each strand of the double helix. The ends of chromosomes are called telomeres, and the enzyme that synthesizes the single strands is telomerase. The telomeres have a unique structure as revealed by NMR20 and X-ray diffraction.21 The single-strand ends fold back to form cubelike structures made of stacked G-quartets (Figure 2). They are very sensitive thermodynamically and kinetically to the nature of the counterions; potassium ions specifically stabilize the structure presumably because they fit well in the central hole of the quartet.22 This unique, non-Watson-Crick interaction of guanines in G-quartets was first seen by fiber diffraction in guanosine aggregates and in guanine polynucleotides (poly-G) in the 1960s.23 At that time biologists and biochemists thought that the finding was not biologically relevant. Their thought was more or less that in an X-ray capillary or NMR tube anything can happen, but in a living organism proteins and other biological molecules limit the structures to the correct (Watson-Crick) ones. Now G-quartets are not only relevant to biology, they are being studied as a possible AIDS prophylactic.24 In the next sections we will discuss the measurement and the prediction of DNA and RNA Watson-Crick and non-Watson-Crick structures.

Figure 2. Structures of the G-quartets formed by GGGGTTTTGGGG in solution20 (top) and in a crystal21 (bottom). Reproduced with permission from ref 22. Copyright 1994 Annual Reviews.

Secondary Structures, Energetics, and Dynamics. Soon after Watson’s and Crick’s paper appeared, physical chemists tried to test whether adenine and thymine, or guanine and cytosine, actually formed hydrogen-bonded coplanar pairs in solution (Figure 3). They found instead that the bases, or bases connected to sugars (nucleosides), stacked in aqueous solution. The donor and acceptor groups on the bases hydrogen bond to water, and the aromatic rings stack on top of each other. In chloroform25 or in vacuum,26 hydrogen-bonded pairs do form, but not just Watson-Crick pairs. This is not surprising because the crucial discovery of Watson and Crick was that only their pairs would fit into a uniform helix in any sequence. Without the geometrical constraint each base can hydrogen bond to every other base. This was emphasized by the fact that the first crystal structure of A·T base pairs did not have Watson-Crick hydrogen bonding.27 In fact, there are 28 possible pairings between the neutral bases that involve at least two hydrogen bonds;4 protonation increases the number of possible pairings. Many have been found in natural nucleic acids, synthetic oligonucleotides, and crystals. Eight pairs involving A·T, G·C, and G·C+ are shown in Figure 3.

In order to study Watson-Crick base pairing, it is necessary to use polynucleotides or oligonucleotides. The important questions are the following: What are the thermodynamics (and statistical thermodynamics) of forming a DNA, RNA, or hybrid double helix from any sequence in any solvent? What are the kinetics of formation? These questions are far from answered. We leave for the following sections the questions about the structures of the helices and the formation of more complex structures than simple helices.


Figure 3. Hydrogen-bonded base pairs. Only Watson-Crick pairs can form in any order to produce a continuous, uniform double helix. The other structures illustrate a few of the 28 other hydrogen-bonded pairs that can occur.

Thermodynamics of Double-Helix Formation. Double-helix formation is generally measured by absorption vs temperature curves, so-called melting curves. The absorbance in the ultraviolet region (260 nm) increases as the double helix “melts” to two single strands. This dependence of molar absorptivity on the conformation of a nucleic acid was very puzzling originally. The molar absorptivity of a DNA (absorbance per mole of nucleotide) depended on the previous history of the DNA. For example, heating and quick cooling of a DNA could cause a 40% increase in absorbance without any obvious chemical changes. Hydrolyzing the polynucleotide to mononucleotides increased the absorbance by 60% over the absorbance of the native double helix. All these changes in magnitude of absorbance occurred without significant changes in shape of the broad absorption band. New bands were not being produced, nor were significant changes in energy levels occurring. The hypochromism (the decrease in absorbance of chromophores arranged in an ordered array) could be explained by electronic interaction between chromophore units in the array.28 No electron exchange in the ground state and no charge transfer in the excited state were considered, so a simple exciton model sufficed. A semiquantitative account of the hypochromism resulted. The electronic transition dipoles of the stacked chromophores interact so as to shift absorption intensity to the higher energy bands. The bands at 200 nm and lower wavelengths borrow intensity from the observed 260 nm band. The absorption intensity integrated over all bands (the sum over oscillator strengths) does not change. This understanding of the cause of hypochromism meant that the changes in absorption of a polynucleotide (or polypeptide) could be interpreted structurally. A more sensitive optical signal for changes in conformation is given by circular dichroism, the differential absorbance of left and right circularly polarized light. The same sort of theory applies, but now rotational strengths instead of oscillator strengths are calculated; both electric and magnetic transition dipoles are used.29 Recent reviews30,31 on absorption and circular dichroism of nucleic acids and proteins can be found in a book entitled Biochemical Spectroscopy.32

Now that we knew how to interpret the absorption, we could confidently analyze the absorbance melting curves in terms of the standard Gibbs free energy, ∆G°, standard enthalpy, ∆H°, and standard entropy, ∆S°, of the double-helix-to-single-strands (helix-coil) reactions. A systematic study of double-helix formation in RNA was begun by measurements on 19 oligonucleotides ranging in length from 6 to 14 base pairs.33 The reactions were assumed to be all-or-none equilibria; that is, only completely base-paired duplex and single strands were assumed present. The absorbance at any temperature of the strands and the helix could be obtained by extrapolation from the high- and low-temperature base lines, respectively. Thus, from the measured absorbance (and the known total concentration) the equilibrium constant for double-helix formation was obtained at each temperature. The equilibrium constant is an effective one based on concentrations, instead of activities, in the solvent chosen, 1 M NaCl (pH 7). From the equilibrium constant and its temperature dependence (van’t Hoff equation), the thermodynamic parameters were obtained. Even the earliest experiments showed that the thermodynamics depended not only on the chain length and the number of G·C and A·U pairs, as expected, but also on the sequence.
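The all-or-none analysis just described can be sketched in Python. The sketch assumes a non-self-complementary duplex, A + B <-> AB, with equal strand concentrations, so that K = 2α / ((1 - α)^2 C_T), where α is the fraction of strands in the duplex and C_T the total strand concentration; it then fits ln K against 1/T (van't Hoff). The function names and the synthetic ∆H°, ∆S°, and C_T values are assumptions for illustration, not data from the studies cited.

```python
import math

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def fraction_duplex_from_absorbance(a_obs, a_single, a_duplex):
    """Two-state fraction in duplex from the measured absorbance and the two
    extrapolated baselines at the same temperature."""
    return (a_single - a_obs) / (a_single - a_duplex)


def equilibrium_constant(alpha, c_total):
    """K (M^-1) for A + B <-> AB with equal strand concentrations."""
    return 2.0 * alpha / ((1.0 - alpha) ** 2 * c_total)


def fraction_duplex(k_eq, c_total):
    """Inverse of equilibrium_constant; used here only to make synthetic data."""
    x = k_eq * c_total
    return ((x + 1.0) - math.sqrt(2.0 * x + 1.0)) / x


def vant_hoff_fit(temps_kelvin, k_values):
    """Least-squares line through ln K vs 1/T.
    Returns (dH in kcal/mol, dS in kcal/(mol K))."""
    xs = [1.0 / t for t in temps_kelvin]
    ys = [math.log(k) for k in k_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return -R * slope, R * intercept


if __name__ == "__main__":
    dh_true, ds_true = -60.0, -0.165      # assumed values, kcal/mol and kcal/(mol K)
    c_total = 1.0e-4                      # assumed total strand concentration, M
    temps = [300.0 + 5.0 * i for i in range(9)]
    alphas = [fraction_duplex(math.exp(-dh_true / (R * t) + ds_true / R), c_total)
              for t in temps]
    k_values = [equilibrium_constant(a, c_total) for a in alphas]
    dh, ds = vant_hoff_fit(temps, k_values)
    print("dH =", dh, "kcal/mol; dS =", ds, "kcal/(mol K)")
    print("dG at 37 C =", dh - 310.15 * ds, "kcal/mol")
```

With real melting curves the weak point is usually the baseline extrapolation rather than the fit, and a self-complementary duplex needs a different expression for K; both are outside this sketch.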
The simplest interpretation was to assume nearest-neighbor-only contributions of the sequence. In this model the thermodynamics of double-helix formation from single strands for any sequence and chain length can be calculated from an initiation parameter for forming the first base pair plus 10 nearest-neighbor parameters for the 10 possible Watson-Crick nearest neighbors. The Watson-Crick pairing requirement reduces the 16 possible single-strand nearest neighbors to 10 possible double-strand neighbors. The nearest-neighbor thermodynamic parameters have been obtained by a least-squares fit to an overdetermined set of measured values for double helices of 4-9 base pairs in length.34 The parameters provide useful predictions of double-strand formation for RNA in 1 M NaCl.35 In order to have a qualitative understanding of the thermodynamic stability of an RNA (or DNA) double helix relative to its single strands, it is sufficient to remember the following data.35 The standard free energy at 37 °C for forming the first base pair is about +3.4 kcal mol^-1. This positive free energy is mainly loss of translational entropy in bringing the two strands together. There are 10 different nearest-neighbor values for adding a base pair. They range from -0.9 to -3.4 kcal mol^-1 for ∆G°37. The average values are ∆G°37 = -1.9 kcal mol^-1, ∆H° = -9.8 kcal mol^-1, and ∆S° = -24.9 cal mol^-1 K^-1.

RNA is synthesized in nature as a single strand from a DNA template. The RNA then folds into a compact form that involves intramolecular double helices of Watson-Crick base pairs, plus many other hydrogen-bonded and stacked secondary structural elements. These elements are36 single strands, double helices (also called stems), hairpin loops (the strand makes a U-turn), internal loops (non-Watson-Crick base appositions within a stem), bulge loops (one or more bases are not paired on one strand, but pairing is continuous on the other), and junctions (the joining region of three or more stems). The contributions of these structural elements to the thermodynamics of the folded RNA are much more complex than the simple formation of a double helix. We can still assume that the thermodynamics of the different elements are additive, but each type of loop or junction has its own sequence and length dependence. The loops’ sequence dependences are not nearest-neighbor properties, so many experiments are required to establish a sufficient database for accurate predictions. Nevertheless, progress is being made by estimating the effect of loop length from the configurational entropy loss of forming loops and taking sequence into account if data are available. Extensive discussion of RNA structures, properties, and functions is given in The RNA World and in the many references given therein.37

Nearest-neighbor thermodynamic parameters for DNA double-helix formation from two single strands were obtained from a combination of 19 oligonucleotides and nine polynucleotides.38 Direct microcalorimetric measurements of enthalpies of double-helix formation were made. Clearly, for the polynucleotides a van’t Hoff method based on a single equilibrium constant is not possible. An important application of the DNA nearest-neighbor data is to calculate thermodynamic stabilities for oligonucleotide probes or primers to hybridize to natural DNAs. Radioactive probes are used to identify specific sequences that are diagnostic of disease or that identify a unique person. Primers determine the beginning of complementary strand synthesis for sequencing or for PCR amplification. For these applications it is very important to know how much one base-base mismatch will destabilize the hybridization. The specificity of the hybridization is crucial for diagnostics, where one base change can mean the difference between a normal gene and a cancer gene. Some results on mismatches have been obtained,39 but a complete data set is not available.
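The nearest-neighbor bookkeeping can be sketched in Python for the RNA case. A dinucleotide step on one strand and the complementary step read 5′ to 3′ on the other strand describe the same stacked pair, which is why 16 single-strand neighbors reduce to 10 duplex neighbors. The ten individual parameters are not reproduced here; as a loudly stated assumption, the sketch falls back on the average ∆G°37 of -1.9 kcal mol^-1 quoted above for any neighbor that is not supplied, so the value it returns is only a rough estimate. All function names are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

INITIATION_DG37 = 3.4          # kcal/mol, first base pair (value quoted in the text)
AVERAGE_NEIGHBOR_DG37 = -1.9   # kcal/mol, average per added pair (quoted in the text)


def reverse_complement(seq):
    """Complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_neighbor(dinucleotide):
    """One canonical label for a stacked pair, whichever strand it is read from."""
    return min(dinucleotide, reverse_complement(dinucleotide))


def unique_duplex_neighbors():
    """All distinct Watson-Crick nearest neighbors in a duplex."""
    return sorted({duplex_neighbor(a + b) for a in "ACGU" for b in "ACGU"})


def duplex_dg37(strand, neighbor_dg=None):
    """Initiation plus one term per nearest-neighbor step along the strand.
    neighbor_dg maps canonical labels to free energies; missing entries use
    the average value, which is a placeholder and not a measured parameter."""
    neighbor_dg = neighbor_dg or {}
    total = INITIATION_DG37
    for i in range(len(strand) - 1):
        key = duplex_neighbor(strand[i:i + 2])
        total += neighbor_dg.get(key, AVERAGE_NEIGHBOR_DG37)
    return total


if __name__ == "__main__":
    print("duplex nearest neighbors:", unique_duplex_neighbors())
    print("rough dG37 for an 8-base-pair duplex:", duplex_dg37("GGCAUGCC"))
```

Supplying a measured parameter table through `neighbor_dg` turns the rough estimate into the sequence-dependent prediction the model is meant to give; loops, mismatches, and any correction for self-complementary strands are not handled.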
The nearest-neighbor assumption does not seem to be as useful in DNA as it is in RNA. One reason may be that the conformation of DNA is more sequence dependent than RNA is. There is evidence for this from measurements of optical properties such as absorption and circular dichroism. The circular dichroism of the polynucleotide with all A’s on one strand and all T’s on the other (polyA·polyT) cannot be used for a nearest-neighbor calculation of sequences containing two or three consecutive A’s.40,41 More direct evidence of sequence-dependent conformation is that consecutive A·T base pairs cause the DNA to bend.42

Thermodynamic analyses of transitions from single strands to double helices, or to folded single strands, require the identification of the initial and final states of the system. For nucleic acids the single-stranded states in particular are very temperature dependent and not easily defined. Any partially single-stranded region of a folded RNA, or a partially melted DNA double helix, has this complication. What is needed is a partition function to characterize the nucleic acid at each temperature and a statistical thermodynamic interpretation of the system. Theoretical equations were derived to explain the melting of long double helices, and the main features of the melting were correctly reproduced. The theory is described in Theory of Helix-Coil Transitions in Biopolymers,43 and many of the original papers are included there.

Kinetics of Double-Helix Formation. The rates of formation and dissociation of a double helix are crucial for the correct functioning of nucleic acids. There is competition between rates of conformational changes of macromolecules and rates of chemical reactions in a biological cell. The kinetics of double-helix formation were studied systematically by temperature-jump kinetics at the Max Planck Institute in Göttingen, Germany. Many of the same oligonucleotides used for the early thermodynamic studies were used for the kinetics.44-46 For RNA oligonucleotides in the range from 6 to 14 base pairs the second-order association rate constant for forming a double helix at room temperature is of order 1 × 10^6 M^-1 s^-1. This is slower than a diffusion-limited reaction, but it is not very sensitive to sequence or chain length (as expected for diffusion-limited reactions). The activation energy is sequence dependent. Molecules with G·C pairs have an activation energy in the range +5 to +10 kcal mol^-1. For molecules with no G·C pairs the activation energy is negative! The explanation is that a stable nucleus of two or three A·U base pairs must be formed (with a negative free energy) before a fast zippering of the remaining base pairs occurs.46 The dissociation rate constants and activation energies for forming single strands from the double helix are very sequence and length dependent as expected. The dissociation activation energies are closely related to the equilibrium enthalpies of dissociation. The double-helix to single-strand reactions for oligonucleotides have relaxation times in the range of milliseconds as measured by conventional temperature-jump methods using an electric discharge from a capacitor to raise the temperature. A much faster signal is also seen that is ascribed to single-strand conformational changes: stacking-unstacking of the bases.
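The millisecond relaxation times can be rationalized with the standard small-perturbation result for a two-state reaction A + B <-> AB: 1/τ = k_on([A]eq + [B]eq) + k_off. The Python sketch below uses the association rate constant of order 10^6 M^-1 s^-1 quoted above; the equilibrium constant and the strand concentration are assumed values, chosen near the melting midpoint where a temperature jump gives the largest signal, and the function names are illustrative.

```python
import math


def free_strand_concentration(k_eq, c_strand):
    """Equilibrium [A] = [B] (M) for A + B <-> AB when each strand has total
    concentration c_strand. Solves k_eq * a**2 + a - c_strand = 0 for a."""
    return (-1.0 + math.sqrt(1.0 + 4.0 * k_eq * c_strand)) / (2.0 * k_eq)


def relaxation_time(k_on, k_eq, c_strand):
    """Relaxation time (s) after a small perturbation of A + B <-> AB."""
    k_off = k_on / k_eq
    a_eq = free_strand_concentration(k_eq, c_strand)
    return 1.0 / (k_on * 2.0 * a_eq + k_off)


if __name__ == "__main__":
    k_on = 1.0e6        # M^-1 s^-1, order of magnitude quoted in the text
    k_eq = 1.0e4        # M^-1, assumed value near the melting temperature
    c_strand = 1.0e-4   # M of each strand, assumed
    tau = relaxation_time(k_on, k_eq, c_strand)
    print("relaxation time (ms):", 1.0e3 * tau)
```

With these assumed inputs the relaxation time comes out at a few milliseconds; the much faster stacking-unstacking signal is not part of this two-state picture.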
These dynamics in the nanosecond to microsecond range have been measured by laser temperature-jump methods.47 NMR measurements can also be used to measure the kinetics of double-helix association in the millisecond range and the nanosecond dynamics of individual nucleotides.48 The rate constants for formation and dissociation of helices are directly applicable to understanding the kinetics of ribozymes (RNA catalysts) that cleave RNA.49 The RNA ribozyme binds a substrate RNA, cleaves the RNA strand, and releases the products to bind a new substrate. The efficiency of the enzyme depends on the chemical steps of breaking phosphodiester bonds and on the physical steps of binding substrate and releasing products. Any of these steps can be rate limiting. Ribozymes are being investigated as antiviral agents that will cleave the RNA of a target virus. Specificity can be increased by using a larger binding sequence on the target RNA, but this clearly decreases the rate of product release after cleavage. Knowledge of the oligonucleotide rate constants allows a rational optimization of specificity and of kinetic efficiency.

Prediction of Base Pairing. The folded, three-dimensional conformations of RNA molecules have evolved to perform their many functions effectively. The first step in assessing the RNA conformations is to learn which bases form Watson-Crick base pairs. This is far from knowing a 3D structure, but it greatly limits the conformational space possible. The imino NMR spectrum can provide useful information about the base pairing in small nucleic acids (less than 50 nucleotides). Guanine and uracil (or thymine) each have one imino proton (see Figure 1) on the base. These protons resonate in a region (15 to 10 ppm) well separated from the nonexchangeable protons (below 10 ppm).
Furthermore, only the slowly exchanging protons are seen in the spectrum; protons exchanging rapidly (lifetimes less than about 1 ms) will not be seen.50 Thus, in H2O a proton spectrum in the imino region will show one peak for each base pair (or slowly exchanging imino proton). As the rate of exchange increases due to increasing temperature, or due to change of pH or added catalyst, the proton spectrum will broaden and eventually disappear.51 Base pairs at the ends of helices, or next to perturbations of the helix, broaden first.


TABLE 1: A-, B-, and Z-Form Double-Stranded Nucleic Acids4

                       A-form            B-form         Z-form
sense of helix         right-handed      right-handed   left-handed
sugar pucker (a)       C3′-endo          C2′-endo       alternating C3′-endo and C2′-endo
glycosidic angle (b)   anti              anti           alternating anti and syn
repeating unit         1 base pair       1 base pair    2 base pairs
base pairs per turn    11                10             12
major groove (c)       deep and narrow   wide           very shallow

(a) In nucleic acids the deoxyribose and ribose five-membered rings have either the 2′-carbon (C2′-endo), or the 3′-carbon (C3′-endo) above the approximate plane of the other four carbons. The base is attached above the 1′-carbon.
(b) The glycosidic angle is the torsion angle around the C-N bond between the sugar ring and the base. Syn and anti torsion angles differ by 180°.
(c) The major groove is the side of the double helix the bases face; the minor groove is the side the sugars are on. The main interactions between proteins and DNA occur through the major groove.

Sequence Correlation (Phylogenetic) Methods. The most common method for establishing base pairing is to compare the sequences of a large group of RNAs that all have the same function. For example, all cells contain ribosomes, part of the machinery that synthesizes proteins. Hundreds of 16S ribosomal RNAs have been sequenced; their sequences of about 1500 bases are similar but not identical. By finding covariation of complementary bases, one can identify base-paired regions.52 Possible helical regions of four or more consecutive base pairs are identified first. Then one looks, for example, for a change from an A·U to a G·C pair in a possible helix. A helix is assumed proven if there are at least two covariations of sequence that leave the number of base pairs unchanged in a possible double helix. Clearly, if the helix sequence is completely conserved, this method cannot identify it. However, the most highly conserved sequences seem to be in single-strand regions. This phylogenetic method has been very accurate when tested by chemical and physical methods.53

Thermodynamic Methods. A free energy can be estimated for any folded RNA molecule relative to the unfolded single strand by adding the free energies of its stems (double helices), loops, and junctions. The free energies of these secondary structural elements can be obtained experimentally. We realize, of course, that additivity is only a first approximation because interactions among the secondary structural elements may also be significant. That is, the folded RNA may contain tertiary structural elements such as pseudoknots,54 kissing hairpins,55,56 and so forth. We will ignore tertiary structure temporarily and attempt to predict secondary structure by finding the lowest free energy structure. We clearly need to search among a great many possible structures.
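One way to organize such a search is a dynamic program over subsequences. The following is a minimal sketch under loudly stated assumptions, not the published algorithm: it uses a toy energy of -1 for every Watson-Crick or G·U pair, assigns zero energy to all loops, and assumes a minimum hairpin loop of three unpaired bases. The names `fold`, `PAIRS`, and `MIN_LOOP` are illustrative only.

```python
# Toy energy model (an assumption for illustration): every allowed pair
# contributes -1 and loops contribute 0. Real programs use measured
# stacking and loop free energies instead.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def fold(seq):
    """Return (minimum toy energy, dot-bracket structure) for an RNA sequence."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # e[i][j] = minimum energy of the subsequence i..j (inclusive)
    e = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = e[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = e[i][k - 1] if k > i else 0
                    best = min(best, left + e[k + 1][j - 1] - 1)
            e[i][j] = best

    # Trace back one optimal structure, preferring to pair base j when possible.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        paired = False
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = e[i][k - 1] if k > i else 0
                if e[i][j] == left + e[k + 1][j - 1] - 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    paired = True
                    break
        if not paired:
            stack.append((i, j - 1))
    return e[0][n - 1], "".join(structure)


energy, structure = fold("GGACUUCGGUCC")
print(energy, structure)
```

Because the toy model has many ties, the traceback returns only one of several structures with the same energy; distinguishing among them requires experimental free energy parameters.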
A dynamic programming algorithm was devised to find the lowest free energy structure among all possible Watson-Crick paired secondary structures.57 The strategy is to calculate the minimum free energy for a secondary structure element that has base i paired to base j. The formation of this base pair separates the linear sequence into two parts: the sequence between i and j and all the rest. Adding base pairs to one part has no effect on the free energy of the other part. We first examine all base pairs separated by three unpaired bases (assuming that this is the minimum hairpin loop size). Other base pairs are added, and the minimum free energy is calculated. At the end of the process the global minimum free energy is obtained. The key to the algorithm is that by neglecting interactions between loops the global minimum is a sum of local minima: the free energies of the component optimal secondary structure elements.

Unfortunately, it quickly became evident that the calculated folded RNA structure was often inconsistent with experiment. Some of the calculated structural elements were correct, and some were not. It was important to calculate not just the lowest free energy, but also other low free energy structures. Experiments, such as chemical reactivity of certain bases or enzymatic cleavage of single-strand regions, could be used to select among the predicted structures. Even if the correct structure was obtained from the calculated global free energy minimum, alternate structures could be important during the biological functioning of the RNA. To obtain structures of low free energy, but not the lowest free energy, it was necessary to extend the original algorithm57 to save suboptimal structural elements.58,59 It was necessary to identify significantly different low free energy structures. Otherwise, too many structures would be obtained that only differ from the global minimum by breaking one base pair, for example. In the Zuker program59 the experimental thermodynamic parameters have been expanded over the years to include G·U base pairs, specific base-base mismatches, extrastable tetraloops, dangling bases at the ends of helices, coaxial stacking of helices, and so forth.60 The predictive accuracy has improved significantly, making the program very useful in identifying possible secondary structures. For example, when designing an oligonucleotide sequence to form a particular structure, it is important to avoid alternate structures. The Zuker program is very helpful in choosing a likely sequence. Similarly, in studies of natural RNAs the program can identify secondary structures to test by direct experiment.

Three-Dimensional Conformations. A-, B-, and Z-Form Antiparallel Double-Stranded Helices. DNA fibers were first found in two forms: a well-ordered A-form at low relative humidity and a more disordered B-form at humidities above 90%. In aqueous solution (and presumably in biological cells) DNA is B-form. Decreasing the water activity (by adding ethanol, for example) can induce a B to A transition. The conformation of the DNA double helix is sensitive to sequence, solvent, and temperature; therefore, it is best to think about families of structures: A-type and B-type.
However, the two families are distinct and can undergo a first-order transition from one family to the other. The structural differences among the right-handed A- and B-families and the left-handed Z-form are summarized in Table 1. All three forms have antiparallel strands.

Left-handed Z-DNA was first seen in 4 M NaCl by circular dichroism measurements on synthetic DNA polynucleotides containing alternating deoxyribo-C’s and deoxyribo-G’s on each strand (polydCdG·polydCdG).61 The left-handedness and the detailed structure were determined by X-ray crystallography of a six-base-pair helix.62 In solution the conformations of DNA are not identical to those seen in crystals. The number of base pairs per turn for B-DNA in solution is near 10.3 (depending on sequence, solvent, etc.) rather than 10.0. This number can be measured very accurately in solution by exploiting the topological constraints in a covalently closed circular DNA to count the number of strand crossings.63

RNA is A-form in fibers independent of relative humidity and is A-form in most aqueous solutions. In 6 M NaClO4 RNA polynucleotides of alternating ribo-C’s and ribo-G’s (polyrCrG·polyrCrG) undergo a transition to left-handed Z-form.64 NMR studies showed that the conformation was very close to that of Z-form DNA.65 Attempts to produce B-form RNA have failed so far; the A-form helix melts to single strands without passing through a B-form geometry.

In biological cells, the predominant form of DNA is B-form and that of RNA is A-form. However, Z-form DNA and Z-form RNA exist naturally as shown by in vivo binding of antibodies specific for these forms.66,67 There are strong indications that Z-DNA is involved in gene regulation; the function of Z-RNA, if any, is unknown. The structures and thermodynamics of the A-, B- and Z-families of nucleic acids and the statistical mechanics and kinetics of their interconversions provide a challenging arena for developing and testing theories of interactions in charged polymers.

Of course, the different conformations are not static entities. The double helices undergo bending and torsional vibrations; each nucleotide has its own dynamics. The global and local motions can be followed by such methods as dynamic light scattering, anisotropy of fluorescence, nuclear magnetic resonance, and electron paramagnetic resonance.68

Parallel Double-Stranded Helices.69 Watson-Crick double helices have the two strands with the 5′-end of one next to the 3′-end of the other (see Figure 1); they are antiparallel. However, by linking two strands with a loop containing a 5′-5′ link or a 3′-3′ link, a hairpin (stem-loop) can be synthesized in which the strands in the stem are parallel.70 Parallel double-stranded helices form containing A·T pairs and G·C pairs; the helices are less stable than the identical sequence with antiparallel strands. The hydrogen bonding between bases is reverse Watson-Crick pairing71 as shown in Figure 3.
Parallel double-stranded helices can also be held together by A·A pairs and G·G pairs.72,73 A parallel double-stranded helix of poly-A·polyA+ has been known for many years,74 and G-quartets can have either parallel or antiparallel strands.75 Single nucleotides that reverse the direction of the chain occur in natural RNA molecules.76,77

Triple-Stranded Helices.78,79 Synthetic double-stranded helices containing all A’s on one strand and all U’s or T’s on the other can undergo a disproportionation to form a triple-stranded helix and a single strand of U’s or T’s.80

polyA·polyU + polyA·polyU → (polyA·polyU)·polyU + polyA

The reaction is favored by molar concentrations of univalent cations or millimolar concentrations of divalent cations. Triple strands also form with (polyG·polyC)·polyC+. Although the pK of cytosine is about 4.2, stabilization of the protonated form by triple-strand formation allows the helix to form near neutral pH. The helix is made up of hydrogen-bonded base triples with the purine (A or G) forming one Watson-Crick pair and one Hoogsteen pair (see Figure 3). The Watson-Crick pair has the usual antiparallel strand directions; the third strand (containing the pyrimidines U or C) is in the major groove and is antiparallel to the pyrimidine strand. Triple-stranded helices with a Watson-Crick pair plus a homopurine strand, such as (polyA·polyT)·polyA and (polyG·polyC)·polyG, also form. Again, the Watson-Crick strands are antiparallel, and the third strand is in the major groove and antiparallel to the strand containing the same type of base. Mixed-sequence triple strands have been studied in many combinations to determine their structures and thermodynamic stabilities; a small sampling of references is given.81-85 Very little work has been done on the kinetics of the formation of triple helices.86

DNA triple-helix regions exist in biological cells. Sequences of homopurines-homopyrimidines in natural DNAs can disproportionate as above to form triple helices and unpaired single strands.78,87 There is great interest in using triple-helix formation in an antigene strategy to control gene function for medical use. The idea is to treat a patient with an oligonucleotide that forms a triple helix and prevents a specific DNA, such as a bacterial DNA or a cancer-causing gene, from being expressed.88,89 Understanding the thermodynamics of which base triples form and how mismatches decrease stabilities is obviously of crucial importance for the clinical applications of antigene therapy.

Figure 4. Two topoisomers of a double-stranded DNA. On the left is shown a molecule with linking number, Lk = 20; twist, Tw = 20; and writhe, Wr = 0. On the right is a topological isomer with Lk = 18, Tw = 20, and Wr = -2. Twist and writhe need not be integers, but their sum must be an integer. The Watson-Crick windings are right-handed (Tw is positive), so a negative writhe means that left-handed supercoiling is present. The topoisomers can be separated by gel electrophoresis. Reproduced by permission from ref 75. Copyright 1994 Academic Press.

Supercoiled DNA and Topological Isomers of DNA.90 The ends of a double-stranded DNA can be covalently linked to form a circle. Once this is done the number of times one strand crosses the other is permanently fixed as long as neither strand is broken. The number of strand crossings (counted by projecting the strands on a plane) is a topological constant called the linking number, an integer. Natural DNAs often occur as covalently closed circles. For example, plasmids that provide bacteria with antibiotic resistance are naturally occurring covalently closed circles. DNA in higher organisms is sometimes topologically constrained by proteins that hold two regions in a fixed relation. In either case this means topological isomers (topoisomers) exist. Topoisomers were first identified and characterized in polyoma viral DNA by ultracentrifugation and electron microscopy.91 Double-stranded DNA can have “sticky ends” with a 5′ single strand overhang on one end complementary to a 3′ single strand overhang on the other end. The enzyme DNA ligase can covalently close the sticky ends to form intramolecular circles or intermolecular dimers, trimers, etc. The circles formed are a mixture of topoisomers having linking numbers that differ by one (see Figure 4).
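The bookkeeping in Figure 4 can be sketched numerically. This is an illustrative sketch only: the function names are assumptions introduced here, the value of 10.3 base pairs per turn is the solution value for B-DNA quoted earlier in this article, and the spread of two linking numbers on either side of the mean is an arbitrary choice for display.

```python
BP_PER_TURN = 10.3  # B-DNA in solution; depends on sequence, solvent, etc.


def relaxed_linking_number(n_bp, bp_per_turn=BP_PER_TURN):
    """Mean number of strand crossings for a relaxed circle (need not be an integer)."""
    return n_bp / bp_per_turn


def writhe(lk, tw):
    """Writhe is what remains of the linking number after the twist is accounted for."""
    return lk - tw


def neighboring_topoisomers(n_bp, spread=2):
    """Integer linking numbers within `spread` of the relaxed mean (assumed spread)."""
    center = round(relaxed_linking_number(n_bp))
    return list(range(center - spread, center + spread + 1))


# The two molecules of Figure 4: same twist, different linking number.
print(writhe(20, 20), writhe(18, 20))

# Linking numbers expected near the mean for a 1000-base-pair circle.
print(neighboring_topoisomers(1000))
```

Only the linking number is constrained to be an integer; the mean returned by `relaxed_linking_number` is a real number, which is why a ligated population is a mixture of topoisomers rather than a single species.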
When the ends are ligated, the mean number of strand crossings depends on the number of base pairs (one strand crossing for every ≈10.3 base pairs), but there will be a distribution of linking numbers. For example, for 1000 base pairs there could be a linking number of 1000/10.3 ≈ 97 and also 95, 96, 98, and 99. The distribution is at equilibrium (the ligase action is slow compared to the fluctuations in conformations); therefore, the relative concentrations of topoisomers provide the free energy differences among topoisomers.92 The topological constraints characterizing the topoisomers are represented by90

Lk = Tw + Wr

The integer linking number, Lk, is a constant for each topoisomer as long as no bonds are broken. It is equal to the sum of twist, Tw, and writhe, Wr. Twist is the number of double helical turns in the DNA, and writhe is a measure of the superhelical turns as indicated in Figure 4. Neither twist nor writhe is necessarily an integer, but their sum is. The hydrodynamic properties of the DNA depend on its writhe. This means that molecules with different writhes can be separated by ultracentrifugation or, most conveniently, by gel electrophoresis. We now have an extremely sensitive method of studying twist (a direct measure of the number of base pairs per turn) in DNA. The number of 10.3 base pairs per turn used above came from studies of topoisomers.63 Of course, the exact number will depend on the sequence, the pH, the ionic strength, etc. Instead of base pairs per turn, we can consider the winding angle (360°/base pairs per turn ≈ 35°); it decreases about 0.01° per °C rise in temperature.92 The effect of any molecule that unwinds the DNA double strand, such as proteins that bind or antibiotics that intercalate, can be characterized precisely. Conformational transitions from B to A, or particularly to left-handed Z, can be easily monitored.93,94 Creative applications of the simple topological equation above have provided a wealth of information about structures and thermodynamics of DNA. Theories and calculations of the shapes and dynamics of supercoiled DNA (the spatial manifestation of the writhe) are numerous; we cite two.95,96

The correct topological state of DNA molecules in living cells is crucial for the replication of the DNA and the division of the cells. Enzymes called topoisomerases and gyrases are continually unwinding and winding the DNA double strands during replication. Topoisomerase inhibitors are being tested as anticancer agents and as antibiotics.97,98

Folded RNA Structures.
The first RNA structure determined at atomic resolution was the crystal structure of phenylalanine transfer RNA,99 the 76-nucleotide RNA that reads the three-letter codon on the messenger RNA and places the correct amino acid on the growing polypeptide chain. In 1973, this RNA already provided coordinates for many of the structural elements that are now being investigated in other RNAs: G·A non-Watson-Crick base pairs, base triples, hairpin loops, four-stem junctions, and so forth. The number of RNA X-ray structures published since then has not been large because of the difficulty of obtaining useful crystals. However, NMR studies of RNA oligonucleotides in solution are now providing coordinates and dynamics for all kinds of non-Watson-Crick base pairs, hairpin loops, internal loops, bulges, pseudoknots, junctions, and so forth. Several reviews are available.36,100-102

Tetraloops are hairpin loops containing four nucleotides in the loop. They occur often in natural RNA sequences, but among the 256 possible sequences a few tetraloop sequences are very common. These tetraloops are also more stable thermodynamically relative to the unfolded single strand than the less common loops. For example, the extrastable tetraloop 5′-GGAC(UUCG)GUCC-3′ has a standard free energy change at 37 °C of -6.3 kcal mol⁻¹ for forming four base pairs and a UUCG loop from the single strand in aqueous solution in 1 M NaCl.103 Changing the loop sequence to (UUUG) or (UUUU) increases the standard free energy to -4.2 kcal mol⁻¹. To understand the reason for the difference of 2.1 kcal mol⁻¹ in loop stability, we began an NMR study of the (UUCG) and (UUUG) tetraloops and furnished the oligonucleotides for crystallization for an X-ray study.

The NMR structures of the tetraloop 12-mer oligonucleotides were determined by a now standard method.104 The 1D imino spectrum measured in H2O reveals the hydrogen-bonded, or slowly exchanging, imino protons of guanine and uracil. There are seven imino protons in the (UUCG) 12-mer; they are all visible in the spectrum. The amino protons of adenine, cytosine, and guanine are more difficult to observe because of rapid rotation of the amino group around the C-N bond. Labeling with ¹⁵N allows their identification.105 The 2′-hydroxyl hydrogens are the most difficult to see; most of them exchange too fast with water. However, the 2′-hydroxyl of the first U of the (UUCG) tetraloop is visible. It is present in a H2O spectrum and vanishes in D2O, but ¹H-¹⁵N correlated spectra prove it is not bonded to nitrogen.106 Its identification and assignment were confirmed by a three-bond coupling to the C3′ of the ribose group of that U.107 The 92 nonexchangeable protons were assigned by a combination of two-dimensional correlated spectroscopy (COSY), which gives through-bond scalar couplings, and 2D nuclear Overhauser effect spectroscopy (NOESY), which gives through-space dipolar couplings.108 The scalar coupling (J-coupling) constants are related to torsion angles by Karplus-type equations; the nuclear Overhauser enhancements are proportional to the inverse sixth power of the interproton distances. These constraints are added to standard bond lengths and bond angles to obtain a conformation (or range of conformations) consistent with all the NMR data.

The results for the (UUCG) tetraloop are shown in Figure 5a.109 Reasons for the extra thermodynamic stability of the loop apparent in the structure are the U·G non-Watson-Crick base pair, which involves an unusual syn guanosine, and a hydrogen bond between the C amino group and a nearby phosphate oxygen. Changing this C to a U in the less stable (UUUG) tetraloop replaces the C amino by a U carbonyl oxygen that repels the phosphate. The conformation changes very slightly, but the stability markedly decreases.
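To see what such a stability change means for populations, the quoted free energies can be converted to equilibrium constants with the standard relation ΔG° = -RT ln K. The sketch below is a worked example, not part of the original analysis: the function name `equilibrium_constant` is an assumption, and the inputs are the -6.3 and -4.2 kcal mol⁻¹ values at 37 °C quoted above for hairpin formation from the single strand.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def equilibrium_constant(delta_g_kcal, temp_c=37.0):
    """Equilibrium constant K = exp(-dG/RT) for a unimolecular folding transition."""
    temp_k = temp_c + 273.15
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))


k_uucg = equilibrium_constant(-6.3)  # folded/unfolded ratio for the UUCG hairpin
k_uuug = equilibrium_constant(-4.2)  # folded/unfolded ratio for the UUUG hairpin
print(k_uucg / k_uuug)
```

The ratio is about 30, so a 2.1 kcal mol⁻¹ difference corresponds to roughly a thirtyfold shift in the folded-to-unfolded population ratio at 37 °C.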
There is an increase in ∆G° of +2.1 kcal mol⁻¹. Of course, the stabilities of the two tetraloops relative to the single strands depend on many interactions other than the hydrogen bonds mentioned. Stacking of the bases, hydrogen bonds to water, locations of counterions, and so forth are all important. It would be wonderful to be able to calculate all these effects and to be able to predict structures and thermodynamics of all 256 tetraloops; it would leave so much more NMR time for all 1024 pentaloops.

To assess the quality of the NMR structures, we wanted to compare NMR coordinates with coordinates from X-ray crystal structures. The crystal structures of both the UUCG110 and UUUG111 oligonucleotides were determined, but unfortunately neither forms hairpin loops in the crystal. They both form double-stranded helices with four non-Watson-Crick base pairs in the center of each helix. The crystals give high-resolution structures for G·U, U·C, and U·U base pairs and show that the mismatches do not disrupt standard A-form helices very much. Presumably, helices rather than hairpins form in the crystals because of the higher symmetry of the helices. We tried to force double-stranded helices to form in solution, but we only achieved aggregates and precipitates.

A more complex structure is a pseudoknot;112,113 it is a hairpin (a stem-loop) with base pairing of bases in the loop with a single-strand sequence that can extend from either side of the stem. Thus, two hairpins form in which bases in the loop of each hairpin form part of the stem of the other. If each stem had a complete turn (11 base pairs), a knot would be formed if the ends were linked. At Berkeley we investigated the effect of loop size on pseudoknot stability114 and used NMR to show that the stems are coaxially stacked and that the two loops are on the same side of the quasicontinuous helix.113 We have recently determined the structure of a type of pseudoknot necessary for the infectivity and replication of many retroviruses.115 The pseudoknot facilitates a programmed frameshift during the translation of the retroviral RNA that leads to the synthesis of such vital viral enzymes as reverse transcriptase and integrase.116 The structure115 of the 34-nucleotide molecule (Figure 5b) reveals a bent molecule with an intervening A between the two stems. The loops are too short to bridge the stems, so a bend occurs at the junction of the stems. The intervening A provides the hinge. Removing this A eliminates the frameshifting117 and produces a linear pseudoknot with coaxially stacked stems.118 We are continuing study of the structure-function relation of pseudoknots and retroviral frameshifting. We hope that it may suggest new methods to inhibit viruses.

Figure 5. (a, top) NMR structure109 of a 12-nucleotide extrastable hairpin tetraloop. There are four Watson-Crick base pairs in the stem; the loop sequence is UUCG. (b, bottom) NMR structure of a pseudoknot that causes frameshifting in MMTV retrovirus.115 There is an A nucleotide between the two stems that favors the bent structure shown. Deletion of the A leads to a linear pseudoknot and eliminates the frameshifting.

Prediction of Folded RNA Structures. Calculations of three-dimensional structures (conformations) of RNA can be done by the usual methods of molecular dynamics and free energy calculations.119-121 However, the potential functions for intra- and intermolecular interactions are not good enough to produce believable results in ionic solutions. If empirical data, such as distance and torsion angle constraints from NMR experiments, are used to find an approximate conformation, then these methods can be used to improve and refine the structure. The theoretical calculation provides the minimum free energy structure in the local minimum chosen by experiment. The NMR data determine the structure; the calculation just polishes the result. Other semiempirical methods use more qualitative experimental data. For example, thermodynamic and phylogenetic methods can provide the secondary structure: single-strand and double-strand regions. Chemical and enzymatic reactivity of each nucleotide and of different groups within each nucleotide show which groups are available to solvent and which groups are involved in other interactions. These data provide a starting conformation for calculation of three-dimensional structures. One method uses databases from crystal structures of mononucleotides and oligonucleotides to limit the possible calculated structures.122,123 The conformation of each nucleotide is specified by seven torsion angles; only torsion angle values found in known structures are allowed in the starting conformation. For simple, small hairpins, root-mean-square deviations of 1.5-2.0 Å between calculated and observed coordinates can be expected.124 Modeling larger RNA containing hundreds of nucleotides can give useful structures, although coordinates cannot be compared with experiment.125-127 The predicted structures are vital for designing experiments to test the predictions and to improve the models.

Studies of Single DNA Molecules.128 The observation of the mechanical properties and the viscoelastic behavior of a single DNA molecule provides a new level of information about its energetics and dynamics.
Micron-sized fluorescent beads have been attached to the ends of a single DNA molecule. The beads serve both to manipulate the molecule and to visualize its ends. Direct measurement of the elasticity of a DNA was obtained by tethering one end of the DNA to a glass slide and using flow and magnetic fields to apply forces in the range 10⁻¹⁴ to 10⁻¹⁰ N to a bead on the other end.129 The force vs extension curves can be explained reasonably well in terms of entropic elasticity.130 A receding meniscus has also been used to stretch a DNA attached to a surface at both ends.131 Optical tweezers can be used to manipulate beads attached to DNA to study stretching, relaxation, and motion of single molecules.132-134

The ability to visualize and study a single nucleic acid molecule in solution allows many new possibilities. Instead of averaging physical properties over many molecules, we can average one molecule over time. We trust the two methods will give the same answer. Spectroscopic studies, such as linear dichroism, circular dichroism, and hypochromism, of a single molecule at different extensions and with different ligands bound can be very informative. We can learn about structure and about our understanding of nucleic acid optical properties.

Functions

Catalytic RNA Molecules: Ribozymes. Thomas Cech was trying to isolate and purify the enzyme in the protozoan Tetrahymena that catalyzes the excision of a 414-nucleotide piece of its pre-ribosomal RNA. This RNA processing commonly occurs in pre-ribosomal RNAs, pre-transfer RNAs, and pre-messenger RNAs. Cech found that during purification, as the amount of protein decreased, the catalytic activity of the enzyme increased. In fact, catalysis occurred in the complete absence of protein. The RNA cut and spliced itself in the presence of Mg²⁺ and a guanosine cofactor.135 The biochemists were convinced at first that the result was an artifact. They thought that Cech did not know how to properly purify a sample to remove all traces of contaminating protein, the real catalyst. He was able to convince them that RNA catalysts did exist after he synthesized the Tetrahymena RNA in another organism and showed that it still was catalytic, although it had never been exposed to Tetrahymena proteins.136 Cech and Sidney Altman, who discovered a ribozyme that processes a pre-transfer RNA,137 shared the Nobel Prize for chemistry in 1989.

A wide range of catalytic RNA molecules (ribozymes) have been discovered and characterized since the original discovery in 1981.138 An extensively characterized class of ribozymes, called hammerhead or hairpin ribozymes, occurs in plant viroids that self-cleave in vivo.139 They can be tailored to cleave or ligate external substrates. Detailed kinetic studies of each step of their reactions have been done,49 and two crystal structures have been determined.140 NMR studies are underway, but no solution structure has appeared yet. It is clear that the RNA is flexible and that the structures obtained in the crystal or in solution will not necessarily reveal the catalytic mechanism. Specific metal ions are involved in the catalysis; their locations and interactions are critical. The field of ribozyme catalysis is new (approximately 1980) and very important in understanding chemical catalysis in solution, in providing new ideas about chemical evolution of life,37 and in devising new methods of treating viral diseases by sequence-specific cleavage. It is an active field as shown by the 300 references found with the title word “ribozyme” in the past six years of the Current Contents database.
Random Synthesis and Selection of RNAs To Perform Any Function.141 It is very easy to synthesize DNA or RNA with random sequences; at each cycle of a commercial synthesis machine all four nucleotides are introduced. The number of different sequences produced depends on the number of positions randomized and the total number of molecules present in the amount of material synthesized. A 1 µmol synthesis could produce about 10¹⁷ different sequences; this is the number of possible sequences for a randomized 28-mer. Of course, you can randomize more than 28 nucleotides; you will then get only a small sampling of the total sequence space available. The idea is to select the sequence among this large number that does the function that you choose. For example, you can isolate a sequence that binds a certain protein.142,143 The random mixture of DNAs or RNAs is passed through a column containing the target protein, and the tightest binding fraction is collected. This fraction is amplified by PCR (for RNAs the fraction is first reverse transcribed into DNA, amplified, and then transcribed back to RNA), and the selection process is repeated. After about 10 cycles of selection and amplification, only a few strong-binding sequences are present. Each different sequence can be cloned and characterized as to binding constant and structure. From 10¹⁷ potential binders a small number of “best” molecules have been selected.

The functions and properties that can be evolved are only limited by the originality of the experimenter. Examples include RNA cleaving ribozymes,144 DNA cleaving ribozymes,145 amide cleaving ribozymes,146 ligases,147 ATP binders,148 a ribozyme that isomerizes a hindered biphenyl,149 and so forth. The ability to evolve RNA or DNA molecules to have specific physical and chemical interactions provides a unique method to test theories of molecular interactions. Of course, it also can produce molecules of great medical and practical use.
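The sequence-space arithmetic behind these numbers is easy to check. The sketch below is illustrative: the function names are assumptions introduced here, and the 40-nucleotide case is a hypothetical example of a longer randomized region, not a value from the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def sequence_space(n_random):
    """Number of distinct sequences for n_random fully randomized positions."""
    return 4 ** n_random


def molecules_in_synthesis(micromoles):
    """Number of molecules in a synthesis of the given size."""
    return micromoles * 1e-6 * AVOGADRO


def max_fraction_sampled(n_random, micromoles=1.0):
    """Upper bound on the fraction of sequence space present in the pool."""
    return min(1.0, molecules_in_synthesis(micromoles) / sequence_space(n_random))


print(sequence_space(28))            # about 7 x 10^16, i.e. of order 10^17
print(molecules_in_synthesis(1.0))   # about 6 x 10^17 molecules in 1 micromole
print(max_fraction_sampled(40))      # hypothetical 40-mer: a tiny fraction
```

The bound is optimistic, since random synthesis produces duplicates; it only shows that a 1 µmol pool can in principle span a randomized 28-mer, whereas longer randomized regions are necessarily sparsely sampled.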
Conclusions

The Role of the Physical Chemist. Physical chemists understand thermodynamics and kinetics. They can apply quantum mechanics and statistical mechanics to increasingly complex systems. They are comfortable with all the structural methods including X-ray crystallography, nuclear magnetic resonance, and other spectroscopies. So what do we lack that prevents us from discovering and solving the important problems in biology? It is lack of knowledge, and therefore lack of interest, in the biological problems. We must take the time and trouble to learn what questions the neurobiologists, the developmental biologists, the physiologists, the geneticists, and even the physicians are asking. It may be that we can answer some of their questions immediately. We will certainly learn about the difficult, challenging, and vital problems that someone will solve in the next 100 years.

Acknowledgment. Professor Carlos Bustamante, University of Oregon, Dr. Ling X. Shen, University of California at Santa Cruz, and Professor Douglas Turner, University of Rochester, were kind enough to read the manuscript and to make very helpful suggestions. Professor James Williamson, MIT, and Professor Richard Sinden, Texas A&M, kindly supplied figures. My research has been supported by grants from NIH and from DOE; their support is gratefully acknowledged.

References and Notes

(1) Watson, J. D.; Crick, F. H. C. Nature 1953, 171, 737.
(2) Wilkins, M. H. F.; Stokes, A. R.; Wilson, H. R. Nature 1953, 171, 738. Franklin, R. E.; Gosling, R. G. Nature 1953, 171, 740.
(3) There is an apocryphal story that Sir Lawrence Bragg, the longtime head of the Cavendish Laboratory where the DNA double helix was discovered, at first thought that Watson-Crick was one person.
(4) Saenger, W. Principles of Nucleic Acid Structure; Springer-Verlag: New York, 1984. Bloomfield, V. A.; Crothers, D. M.; Tinoco, I., Jr. Nucleic Acids: Structures, Properties and Functions; University Science Books: Mill Valley, CA, 1997.
(5) Fleischmann, R. D.; 37 coauthors; Venter, J. C. Science (Washington, D.C.) 1995, 269, 496-512.
(6) Sanger, F.; Nicklen, S.; Coulson, A. R. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 5463-5467.
(7) Maxam, A. M.; Gilbert, W. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 560-564.
(8) Schwartz, D. C.; Cantor, C. R. Cell 1984, 37, 67.
(9) deGennes, P. G. J. Chem. Phys. 1971, 55, 572.
(10) Lumpkin, O. J.; Zimm, B. H. Biopolymers 1982, 21, 2315.
(11) Smith, S. B.; Aldridge, P. K.; Callis, J. B. Science (Washington, D.C.) 1989, 243, 203. Schwartz, D. C.; Koval, M. Nature 1989, 338, 520.
(12) Houseal, T. W.; Bustamante, C.; Stump, R. F.; Maestre, M. F. Biophys. J. 1989, 56, 507-516.
(13) Bustamante, C.; Gurrieri, S.; Smith, S. B. Trends Biotechnol. 1993, 11, 23-30.
(14) Smith, S. B.; Heller, C.; Bustamante, C. Biochemistry 1991, 30, 5264-5274.
(15) Bains, W.; Smith, G. C. J. Theor. Biol. 1988, 135, 303-307.
(16) Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Funkhouser, W. K.; Koop, B.; Hood, L.; Crkvenjakov, R. Science (Washington, D.C.) 1993, 260, 1649-1652.
(17) Pease, A. C.; Solas, D.; Sullivan, E. J.; Cronin, M. T.; Holmes, C. P.; Fodor, S. P. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 5022-5026.
(18) Erlich, H. A.; Gelfand, D.; Sninsky, J. J. Science (Washington, D.C.) 1991, 252, 1643-1650.
(19) Sinden, R. R.; Wells, R. D. Curr. Opin. Biotechnol. 1992, 3, 612-622.
(20) Smith, F. W.; Feigon, J. Nature 1992, 356, 164-168.
(21) Kang, C. H.; Zhang, X.; Ratliff, R.; Moyzis, R.; Rich, A. Nature 1992, 356, 126-131.
(22) Williamson, J. R. Annu. Rev. Biophys. Biomol. Struct. 1994, 23, 703-786.
(23) Gushlbauer, W.; Chantot, J. F.; Thiele, D. J. Biomol. Struct. Dyn. 1990, 8, 491-511.
(24) Wyatt, J. R.; Vickers, T. A.; Roberson, J. L.; Buckheit, R. J.; Klimkait, T.; DeBaets, E.; Davis, P. W.; Rayner, B.; Imbach, J. L.; Ecker, D. J. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 1356-60.
(25) Kyoguku, Y.; Lord, R. C.; Rich, A. Proc. Natl. Acad. Sci. U.S.A. 1967, 57, 250-257.
(26) Sukhodub, L. F. Chem. Rev. 1987, 87, 589-606.
(27) Hoogsteen, K. Acta Crystallogr. 1963, 16, 907-916.
(28) Tinoco, I., Jr. J. Am. Chem. Soc. 1960, 82, 4785-4790.
(29) Tinoco, I., Jr. Adv. Chem. Phys. 1962, 4, 113-160.

(30) Gray, D. M.; Hung, S.; Johnson, K. H. In Biochemical Spectroscopy; Methods in Enzymology Vol. 246; Sauer, K., Ed.; Academic Press: San Diego, CA, 1995; pp 19-34.
(31) Woody, R. W. In Biochemical Spectroscopy; Methods in Enzymology Vol. 246; Sauer, K., Ed.; Academic Press: San Diego, CA, 1995.
(32) Sauer, K. Biochemical Spectroscopy; Methods in Enzymology Vol. 246; Academic Press: San Diego, CA, 1995.
(33) Borer, P. N.; Dengler, B.; Tinoco, I., Jr.; Uhlenbeck, O. C. J. Mol. Biol. 1974, 86, 843-853.
(34) Freier, S. M.; Kierzek, R.; Jaeger, J. A.; Sugimoto, N.; Caruthers, M. H.; Neilson, T.; Turner, D. H. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 9373-9377.
(35) Turner, D. H.; Sugimoto, N.; Freier, S. M. Annu. Rev. Biophys. Biophys. Chem. 1988, 17, 167-192.
(36) Chastain, M.; Tinoco, I., Jr. In Progress in Nucleic Acid Research and Molecular Biology; Cohn, W. E., Moldave, K., Eds.; Academic Press: Orlando, FL, 1991; Vol. 41, pp 131-177.
(37) Gesteland, R. F.; Atkins, J. F. The RNA World; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1993.
(38) Breslauer, K. J.; Frank, R.; Blocker, H.; Marky, L. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 3746-3750.
(39) Aboul-ela, F.; Koh, D.; Tinoco, I., Jr.; Martin, F. H. Nucl. Acids Res. 1985, 13, 4811-4824.
(40) Gray, D. M.; Liu, J.; Ratliff, R. L.; Allen, F. S. Biopolymers 1981, 20, 1337-1382.
(41) Gray, D. M.; Hamilton, F. D.; Vaughan, M. R. Biopolymers 1978, 17, 85-106.
(42) Koo, H. S.; Drak, J.; Rice, J. A.; Crothers, D. M. Biochemistry 1990, 29, 4227-4234.
(43) Poland, D.; Scheraga, H. A. Theory of Helix-Coil Transition in Biopolymers; Academic Press: New York, 1970.
(44) Pörschke, D.; Uhlenbeck, O. C.; Martin, F. H. Biopolymers 1973, 12, 1313-1335.
(45) Pörschke, D.; Eigen, M. J. Mol. Biol. 1971, 62, 361-381.
(46) Pörschke, D. Biophys. Chem. 1974, 2, 97-101.
(47) Dewey, T. G.; Turner, D. H. Biochemistry 1979, 18, 5757-5762.
(48) van de Ven, F. J. M.; Hilbers, C. W. Eur. J. Biochem. 1988, 178, 1.
(49) Hertel, K. J.; Herschlag, D.; Uhlenbeck, O. C. Biochemistry 1994, 33, 3374-3385.
(50) Leroy, J. L.; Kochoyan, M.; Huynh-Dinh, T.; Gueron, M. J. Mol. Biol. 1988, 200, 223-238.
(51) Leroy, J. L.; Broseta, D.; Gueron, M. J. Mol. Biol. 1985, 184, 165-178.
(52) Woese, C. R.; Magrum, J.; Gupta, R.; Siegel, R. B.; Stahl, D. A.; Kop, J.; Crawford, N.; Brosius, J.; Gutell, R.; Hogan, J. J.; Noller, H. F. Nucl. Acids Res. 1980, 8, 2275-2293.
(53) Noller, H. F. Annu. Rev. Biochem. 1984, 53, 119-162.
(54) Puglisi, J. D.; Wyatt, J. R.; Tinoco, I., Jr. Acc. Chem. Res. 1991, 24, 152-158.
(55) Chang, K.; Tinoco, I., Jr. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 8705-8709.
(56) Marino, J. P.; Gregorian, R. S., Jr.; Csankovszki, G.; Crothers, D. M. Science (Washington, D.C.) 1995, 268, 1448-1454.
(57) Nussinov, R.; Piecznik, G.; Grigg, J. R.; Kleitman, D. J. SIAM J. Appl. Math. 1978, 35, 68-82.
(58) Williams, A. L., Jr.; Tinoco, I., Jr. Nucl. Acids Res. 1986, 14, 299-314.
(59) Zuker, M. Science (Washington, D.C.) 1989, 244, 48-52.
(60) Walter, A. E.; Turner, D. H.; Kim, J.; Lyttle, M.; Müller, P.; Mathews, D. H.; Zuker, M. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 9218-9222. Walter, A. E.; Turner, D. H. Biochemistry 1994, 33, 12715-12719.
(61) Pohl, F. M.; Jovin, T. M. J. Mol. Biol. 1972, 67, 375-396.
(62) Wang, A. H.; Quigley, G. J.; Kolpak, F. J.; Crawford, J. L.; van Boom, J. H.; van der Marel, G.; Rich, A. Nature 1979, 282, 680-686.
(63) Peck, L. J.; Wang, J. C. Nature 1981, 292, 375-378.
(64) Hall, K.; Cruz, P.; Tinoco, I., Jr.; Jovin, T. M.; van de Sande, J. H. Nature 1984, 311, 584-586.
(65) Davis, P. W.; Adamiak, R. W.; Tinoco, I., Jr. Biopolymers 1990, 29, 109-122.
(66) Nature 1981, 294, 417-422.
(67) Zarling, D. A.; Calhoun, C. J.; Hardin, C. C.; Zarling, A. H. Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 6117-6121.
(68) Robinson, B.; Drobny, G. P. Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 523-610.
(69) Rippe, K.; Jovin, T. M. Methods Enzymol. 1992, 211, 199-220.
(70) van de Sande, J. H.; Ramsing, N. B.; Germann, M. W.; Elhorst, W.; Kalisch, B. W.; von Kitzing, E.; Pon, R. T.; Clegg, R. C.; Jovin, T. M. Science (Washington, D.C.) 1988, 241, 551-557.
(71) Otto, C.; Thomas, G. A.; Rippe, K.; Jovin, T. M.; Peticolas, W. L. Biochemistry 1991, 30, 3062-3069.
(72) Rippe, K.; Fritsch, V.; Westhof, E.; Jovin, T. M. EMBO J. 1992, 11, 3777-3786.
(73) Evertsz, E. M.; Rippe, K.; Jovin, T. M. Nucl. Acids Res. 1994, 22, 3293-3303.

(74) Rich, A.; Davies, D. R.; Crick, F. H. C.; Watson, J. D. J. Mol. Biol. 1961, 3, 71-86.
(75) Sinden, R. R. DNA Structure and Function; Academic Press: San Diego, 1994.
(76) Wimberly, B.; Varani, G.; Tinoco, I., Jr. Biochemistry 1993, 32, 1078-1087.
(77) Szewczak, A. A.; Moore, P. B.; Chan, Y.-L.; Wool, I. G. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 9581-9585.
(78) Frank-Kamenetskii, M. D.; Mirkin, S. M. Annu. Rev. Biochem. 1995, 64, 65-95.
(79) Plum, G. E.; Pilch, D. S.; Singleton, S. F.; Breslauer, K. J. Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 319-462.
(80) Felsenfeld, G.; Davies, D. R.; Rich, A. J. Am. Chem. Soc. 1957, 79, 2023-2024.
(81) Moser, H. E.; Dervan, P. B. Science (Washington, D.C.) 1987, 238, 645-650.
(82) Sklenar, V.; Feigon, J. Nature 1990, 345, 836-838.
(83) Macaya, R.; Wang, E.; Schultze, P.; Sklenar, V.; Feigon, J. J. Mol. Biol. 1992, 225, 755-773.
(84) Han, H.; Dervan, P. B. Nucl. Acids Res. 1994, 22, 2837-2844.
(85) Radhakrishnan, I.; Patel, D. J. Biochemistry 1994, 33, 11405-11416.
(86) Rougee, M.; Faucon, B.; Mergny, J. L.; Barcelo, F.; Giovannangeli, C.; Garestier, T.; Helene, C. Biochemistry 1992, 31, 9269-9278.
(87) Mirkin, S. M.; Frank-Kamenetskii, M. Annu. Rev. Biophys. Biomol. Struct. 1994, 23, 541-576.
(88) Helene, C.; Thuong, N. T.; Harel-Bellan, A. Ann. N.Y. Acad. Sci. 1992, 660, 27-36.
(89) Grigoriev, M.; Praseuth, D.; Guieysse, A. L.; Robin, P.; Thuong, N. T.; Helene, C.; Harel-Bellan, A. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 3501-3505.
(90) Vologodskii, A. V.; Cozzarelli, N. R. Annu. Rev. Biophys. Biomol. Struct. 1994, 23, 609-643.
(91) Vinograd, J.; Lebowitz, J.; Radloff, R.; Watson, R.; Laipis, P. Proc. Natl. Acad. Sci. U.S.A. 1965, 53, 1104-1111.
(92) Depew, D. E.; Wang, J. C. Proc. Natl. Acad. Sci. U.S.A. 1975, 72, 4275-4279.
(93) Wang, J. C. Proc. Natl. Acad. Sci. U.S.A. 1979, 76, 200-203.
(94) Peck, L. J.; Wang, J. C. Proc. Natl. Acad. Sci. U.S.A. 1983, 80, 6206-6210.
(95) Schlick, T.; Olson, W. K. J. Mol. Biol. 1992, 223, 1089-1119.
(96) Shi, Y.; Hearst, J. E. J. Chem. Phys. 1994, 101, 5186-5200.
(97) Chen, A. Y.; Yu, C.; Gatto, B.; Liu, L. F. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 8131-8135.
(98) Chen, A. Y.; Liu, L. F. Annu. Rev. Pharmacol. Toxicol. 1994, 34, 191-218.
(99) Kim, S.-H.; Quigley, G. J.; Suddath, F. L.; McPherson, A.; Sneden, D.; Kim, J. J.; Weinzierl, J.; Rich, A. Science (Washington, D.C.) 1973, 179, 285-288.
(100) Jaeger, J.; SantaLucia, J., Jr.; Tinoco, I., Jr. Annu. Rev. Biochem. 1993, 62, 255-287. Shen, L. X.; Cai, Z.; Tinoco, I., Jr. FASEB J. 1995, 9, 1023-1033.
(101) Moore, P. B. Acc. Chem. Res. 1995, 28, 251-256.
(102) Varani, G. Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 379-404.
(103) Antao, V. P.; Tinoco, I., Jr. Nucl. Acids Res. 1992, 20, 819-824.
(104) Varani, G.; Tinoco, I., Jr. Q. Rev. Biophys. 1991, 24, 479-532.
(105) Mueller, L.; Legault, P.; Pardi, A. J. Am. Chem. Soc. 1995, 117, 11043-11048.
(106) Allain, F. H.; Varani, G. J. Mol. Biol. 1995, 250, 333-353.
(107) Lynch, S.; Pelton, J.; Tinoco, I., Jr. Magnetic Resonance in Chemistry, 1996, in press.
(108) Ernst, R. R.; Bodenhausen, G.; Wokaun, A. Principles of Nuclear Magnetic Resonance in One and Two Dimensions; Clarendon Press: Oxford, 1987.
(109) Varani, G.; Cheong, C.; Tinoco, I., Jr. Biochemistry 1991, 30, 3280-3289.
(110) Holbrook, S. R.; Cheong, C.; Tinoco, I., Jr.; Kim, S.-H. Nature 1991, 353, 579-581.
(111) Baeyens, K. J.; De Bondt, H. L.; Holbrook, S. R. Nature Struct. Biol. 1995, 2, 56-62.
(112) Pleij, C. W.; Rietveld, K.; Bosch, L. Nucl. Acids Res. 1985, 13, 1717-1731.
(113) Puglisi, J. D.; Wyatt, J. R.; Tinoco, I., Jr. J. Mol. Biol. 1990, 214, 437-453.
(114) Wyatt, J. R.; Puglisi, J. D.; Tinoco, I., Jr. J. Mol. Biol. 1990, 214, 455-470.
(115) Shen, L. X.; Tinoco, I., Jr. J. Mol. Biol. 1995, 247, 963-978.
(116) Chamorro, M.; Parkin, N.; Varmus, H. E. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 713-717.
(117) Chen, X.; Chamorro, M.; Lee, S.; Shen, L. X.; Hines, J. V.; Tinoco, I., Jr.; Varmus, H. E. EMBO J. 1995, 14, 842-852.
(118) Chen, X.; Kang, H.; Shen, L. X.; Chamorro, M.; Varmus, H. E.; Tinoco, I., Jr. J. Mol. Biol., 1996, in press.
(119) Jorgensen, W. L. Acc. Chem. Res. 1989, 22, 184-189.

(120) Karplus, M.; Petsko, G. A. Nature 1990, 347, 631-638.
(121) Kollman, P. A.; Merz, K. M., Jr. Acc. Chem. Res. 1990, 23, 246-252.
(122) Major, F.; Turcotte, M.; Gautheret, D.; Lapalme, G.; Fillion, E.; Cedergren, R. Science (Washington, D.C.) 1991, 253, 1255-1260.
(123) Major, F.; Gautheret, D.; Cedergren, R. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 9408-9412.
(124) Gautheret, D.; Major, F.; Cedergren, R. J. Mol. Biol. 1993, 229, 1049-64.
(125) Michel, F.; Westhof, E. J. Mol. Biol. 1990, 216, 585-610.
(126) Tanner, N. K.; Schaff, S.; Thill, G.; Petit-Koskas, E.; Crain-Denoyelle, A. M.; Westhof, E. Curr. Biol. 1994, 4, 488-498.
(127) Westhof, E.; Altman, S. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 5133-5137.
(128) Bustamante, C. Annu. Rev. Biophys. Biophys. Chem. 1991, 20, 415-446.
(129) Smith, S. B.; Finzi, L.; Bustamante, C. Science (Washington, D.C.) 1992, 258, 1122-1126.
(130) Bustamante, C.; Marko, J. F.; Siggia, E. D.; Smith, S. Science (Washington, D.C.) 1994, 265, 1599-1600.
(131) Bensimon, D.; Simon, A. J.; Croquette, V.; Bensimon, A. Phys. Rev. Lett. 1995, 74, 4754-4757.
(132) Perkins, T. T.; Smith, D. E.; Chu, S. Science (Washington, D.C.) 1994, 264, 819-822.
(133) Perkins, T. T.; Quake, S. R.; Smith, D. E.; Chu, S. Science (Washington, D.C.) 1994, 264, 822-826.
(134) Perkins, T. T.; Smith, D. E.; Larson, R. G.; Chu, S. Science (Washington, D.C.) 1995, 268, 83-87.
(135) Cech, T. R.; Zaug, A. J.; Grabowski, P. J. Cell 1981, 27, 487-496.

(136) Kruger, K.; Grabowski, P. J.; Zaug, A. J.; Sands, J.; Gottschling, D. E.; Cech, T. R. Cell 1982, 31, 147-157.
(137) Guerrier-Takada, C.; Gardiner, K.; Marsh, T. P.; Altman, S. Cell 1983, 35, 849-857.
(138) Cech, T. R.; Bass, B. L. Annu. Rev. Biochem. 1986, 55, 599-629.
(139) Symons, R. H. Annu. Rev. Biochem. 1992, 61, 641-671.
(140) Pley, H. W.; Flaherty, K. M.; McKay, D. B. Nature 1994, 372, 68-74. Scott, W. G.; Finch, J. T.; Klug, A. Cell 1995, 81, 991-1002.
(141) Gold, L.; Polisky, B.; Uhlenbeck, O.; Yarus, M. Annu. Rev. Biochem. 1995, 64, 763-797.
(142) Tuerk, C.; Gold, L. Science (Washington, D.C.) 1990, 249, 505-510.
(143) Ellington, A. D.; Szostak, J. W. Nature 1990, 346, 818-822.
(144) Pan, T.; Uhlenbeck, O. C. Biochemistry 1992, 31, 3887-3895.
(145) Beaudry, A. A.; Joyce, G. F. Science (Washington, D.C.) 1992, 257, 635-641.
(146) Dai, X.; De Mesmaeker, A.; Joyce, G. F. Science (Washington, D.C.) 1995, 267, 237-240.
(147) Ekland, E. H.; Szostak, J. W.; Bartel, D. P. Science (Washington, D.C.) 1995, 269, 364-370. Bartel, D. P.; Szostak, J. W. Science (Washington, D.C.) 1993, 261, 1411-1418.
(148) Huizenga, D. E.; Szostak, J. W. Biochemistry 1995, 34, 656-665. Sassanfar, M.; Szostak, J. W. Nature 1993, 364, 550-553.
(149) Prudent, J. R.; Uno, T.; Schultz, P. G. Science (Washington, D.C.) 1994, 264, 1924-1927.

JP953053P