DNA Triplexes: Solution Structures, Hydration Sites, Energetics

Juan Luis Asensio, Reuben Carr, Tom Brown, and Andrew N. Lane .... Karen M. Vasquez, Theodore G. Wensel, Michael E. Hogan, and John H. Wilson...
0 downloads 0 Views 5MB Size
Volume 33, Number 38

0 Copyright 1994 by the American Chemical Society

September 27, 1994

Perspectives in Biochemistry DNA Triplexes: Solution Structures, Hydration Sites, Energetics, Interactions, and FunctiontYA Ishwar Radhakrishnan' and Dinshaw J. Patel'JsS Department of Biochemistry and Molecular Biophysics, College of Physicians and Surgeons, Columbia University, New York,New York 10032,and Cellular Biochemistry and Biophysics Program, Memorial Sloan- Kettering Cancer Center, New York,New York 10021 Received July 6, 1994

Historical Perspective. The formation of a specific complex between two strands of poly(uridy1ic acid) with one strand of poly(adeny1ic acid) in the presence of divalent cations was first described by Felsenfeld et a f . (1957). Subsequently, triple-stranded complexes containing other combinations of polynucleotide strands were described including those belonging to the deoxyribose series [reviewed in Felsenfeld and Miles (1967), Wells et a f . (1988), Cheng and Pettitt (1992b), and Sun and H6lbne (1993)l. These studies culminated in lowresolution, X-ray fiber-diffraction models for the (A),-2(U),, (A),.2(1),,, and (A),.2(T), combinations (Arnott & Selsing, 1974, and references cited therein). Further, a potential biological function for these structures was identified when they were found to inhibit RNA1 polymerase-mediated transcription in vitro (Morgan & Wells, 1968). Interest in triplexes revived followingthe discoveryof singlestrand-specific S1 endonuclease hypersensitive sites in the upstream regions of several eukaryotic genes (Larsen & Weintraub, 1982). These sites were mapped to (R),.(Y), sequences which, when subcloned into supercoiled plasmids, exhibited the same sensitivity toward S1 nuclease [reviewed This work was supported under NIH Grant GM 34504 to D.J.P. Coordinates for the Y.RY triplexes containing G-TA (Accession No. 149D) and T-CG (Accession No. 177D) triples and the R-RY triplex (AccessionNos. 134D, 13SD, 136D) havebeendepositedwith the Protein Data Bank, Brookhaven National Laboratory, Upton, NY 11973, from whom copies can be obtained. * Address correspondenceto this author at Memorial Sloan-Kettering Cancer Center. t Columbia University. 8 Memorial Sloan-Kettering Cancer Center. I Abbreviations: DNA, deoxyribonucleicacid;RNA, ribonucleicacid; G, guanine;A, adenine;T,thymine;C, cytosine;I, inosine;NMR, nuclear magnetic resonance;NOE, nuclear Overhauser effect; RMD, restrained molecular dynamics. A

0006-2960/94/0433-11405%04.50/0

in Wells et a f .(1988) and Htun and Dahlberg (1989)l. After a brief controversy, the observed hypersensitivity was assigned to the single-stranded regions arising from a duplex intramolecular triplex (+ single strand) transition (Lee et af., 1984; Christophe et a f . , 1985; Lyamichev et al., 1986). Site-specific binding of an oligonucleotide to the major groove of a DNA duplex via intermolecular triple helix formation was demonstrated by Moser and Dervan (1987). Simultaneously, Helene and co-workers reported the formation of a triple-stranded complex between an a-oligonucleotide and its cognate DNA duplex sequence (Le Doan et al., 1987). Both studies demonstrated a new strategy for recognition of specific double-stranded DNA sequences with far-reaching implications in the field of genetics, biochemistry, and medicine. Triplex-Based Applications. The power of triplex-based approaches lies mainly in the extreme specificity of the interaction between third strand oligonucleotides and their cognate duplex DNA. Equipping third strand oligonucleotides with cleaving agents such as Fe(I1)-EDTA (Moser & Dervan, 1987) or copper(I1) phenanthroline (Franpois et al., 1989), for instance, allows them to be used as "rare-cutting" artificial endonucleases. A vivid demonstration of this approach is provided by the cleavage (with more than 80% yield) of a unique site 16 base pairs in length within 10 gigabase pairs of genomic DNA (Strobe1 & Dervan, 1991; Strobe1 et af., 1991). This result has implications in genome mapping because it may facilitate the dissection of chromosomes into megabase-sized fragments which can then be analyzed separately. The therapeutic implications of site-specific triplex formation are also of interest. Several studies have shown that triplex-forming oligonucleotideswhen targeted to the promoter

-

0 1994 American Chemical Society

11406 Biochemistry, Vol. 33, No. 38, 1994 (*) TL4#TI5 \ C l - y - :

T

G14-Gl3-Al2

qnl’.

0

3’&21-CZO-T19 t

x

T11

AlO-A9

G18

T17 -T16

0

0

#Tu%

5 --T6

x

X

TL3



I

Perspectives in Biochemistry

0

\

x -G8 .

T5-T6

-5’

x

TL3

TL8

A10-A9

I

0

-C15

t

x .

-C?

*Tu%

TL7

\

x -G8

-5’

I

0

T17 -T16 -C15 t

t +

TL8 TL.9

%lo’

FIGURE1: Oligonucleotide sequences of intramolecular Y-RY DNA triplexes containing (A) a G.TA triple and (B) a T-CG triple that were employed for NMR studies. Pairing alignments of (C) C+.GC, (D) TeAT, (E) G-TA, and (F) T-CG triples were taken from the solution structures of Y.RY triplexes (Radhakrishnan & Patel, 1994a,c). The (+) and (-) indicate the relative strand orientations. and coding regions of specific genes can alter the expression patterns presumably by inhibiting the initiation and elongation of mRNA synthesis by RNA polymerase (Cooney et al., 1988; Maher et al., 1990, 1992; Durland et al., 1991; Grigoriev et al., 1992; Young et al., 1991; Duval-Valentin et al., 1992). This opens the possibility of selectively controlling the expression of specific genes, which could be of great value in the treatment of genetic disorders. Considerable progress in understanding the structure and energetics of triple-stranded structures has been achieved in recent years. In this review, we shall endeavor to reconcile the conclusions drawn from a variety of biochemical and biophysical studies, within a broad structural framework. Solution Structure Determination Structural Motif Definitions. The current definition of a triple helix in nucleic acids is similar in spirit to what was originally proposed (Felsenfeld et al., 1957). It results from specific, Hoogsteen or reversed-Hoogsteen-type hydrogenbonding interactions (Hoogsteen, 1963) between bases in a homopurine strand of a Watson-Crick duplex and an additional oligonucleotide strand (henceforth designated strand 111; for brevity, the Watson-Crick paired homopyrimidine and homopurine strands will be designated strands I and 11, respectively). At least two structural motifs in DNA triplexes have been characterized. In the so-called “pyrimidine motif“ or pyri-

midine-purine-pyrimidine (Y-RY) triplexes, a pyrimidinerich third strand is aligned in parallel with the purine strand of the Watson-Crick duplex to form T-AT and C+.GC base triple combinations.2 In the “purine motif“ or purine-purinepyrimidine (R-RY) triplexes, a purine-rich third strand is aligned antiparallel to the purine strand, yielding G-GC, AaAT, and T-AT base triple combinations. Experimental N M R Design. In recent years, solution-state nuclear magnetic resonance (NMR) spectroscopyhas provided structural information on nucleic acids at levels comparable to those obtained routinely from X-ray crystallographic analyses. This inspired analogous N M R approaches on singlestranded oligonucleotide sequences that form intramolecular triple-stranded structures with defined strand orientations (Haner & Dervan, 1990;Sklenhr & Feigon, 1990; Radhakrishnan et al., 1991a). In order to optimize the thermal stability of these triplexes without substantially increasing the complexity of N M R spectra, the stem length and loop sizes were generally restricted to seven base triples and five thymine residues, respectively (Radhakrishnan et al., 1991a,b). Y-RY triplexes containing either a G-TA or a T C G triple (see below) in addition to canonical T-AT and C+-GC triples (Figure 1) were characterized with Na+ as the counterion at high resolution (Radhakrishnan & Patel, 1994a,c). The incorIn this notation, the third strand base is written first and the dot (-) signifies the Hoogsteen-type interaction.

Perspectives in Biochemistry

5' -T1x

Biochemistry, Vol. 33, No. 38, 1994

C2 - C 3 -T4 -C5 x

x

x

x

-C6 -T7 x

~~TL6'A14-G13-G12-~l--G10-G9 TLa 0 0 . 0 0 0 \ Tl5-G16-G17-Tl8-G19420 TL9nlO'

x

,n1-w

- A8.me'd -721 - 3'

\

Tw

I

(B)

5' -T1-

C2

.

- C3

C5 -C6

11407

- T7.TLI-Ty

I

FIGURE2: Oligonucleotide sequences of intramolecular R.RY DNA triplexes containing (A) G.GC and T-AT triples and (B) G-GC,T-AT, and ASATtriples that were employed for NMR studies. Pairing alignments of ( C ) G.GC and (D) T-AT triples were taken from the solution structures of an R-RY triplex (Radhakrishnan & Patel, 1993). (E) Pairing alignment of A.AT triples in R-RY triplexes was taken from a hand-built model that is consistent with experimental data (Radhakrishnan et al., 1993b). The (+) and (-) indicate the relative strand orientations.

poration of noncanonical triples resulted in increased dispersion of the N M R signals, allowing a detailed exploration of the conformational features of Y-RY triplexes in general and the unusual triples in particular. Likewise, the nonisomorphic nature of G-GC and T.AT triples (see below) permitted the structure of an R-RY triplex (Figure 2) to be determined at high resolution (Radhakrishnan & Patel, 1993). The tendency of G-rich sequences to form higher order aggregates in the presence of Na+ counterions warranted the use of Li+ as the counterion for R-RY triplexes. The high molecular weight (ca. 10 000) of such oligonucleotide sequences sometimes necessitated homonuclear three-dimensional N M R approaches in D20 solution to resolve ambiguities and maximize the spectral information content involving nonexchangeable protons (Radhakrishnan et al., 1992; Radhakrishnan & Patel, 1994a,c). Structure Determinations. Structures were determined by incorporating the N M R data initially in the form of distance restraints (- 16-19 exchangeable and nonexchangeable proton restraints per residue per structure), with the distancerestraints involving nonexchangeable protons subsequently replaced by intensity restraints (- 12-14 nonexchangeable proton restraints per residue per mixing time per structure) in restrained molecular dynamics (RMD) simulations. These calculations were conducted either in the presence of explicit solvent and counterions as in the case of Y-RY triplexes (Radhakrishnan

& Patel, 1994a,c) or in vacuo with a distance-dependent dielectric to mimic the effect of solvent as in the case of the R-RY triplex (Radhakrishnan & Patel, 1993). Conformational searches were conducted at high temperature (400 K) typically for 10 or 20 ps and at room temperature for about 10ps. Convergence to morphologically similar structures with atomic root-mean-squaredeviations of less than 1 A (measure of precision) was usually achieved from dissimilar starting structures and initial conditions. The low R-factors (measure of accuracy) and consistency with nonquantifiable N M R data (e.g., chemical shifts) confirmed the validity of the final structures.

Y-RY Triplexes Base Triple Combinations, Pairing Alignments, and Strand Orientation. Analogous to U-AU and C+.GC combinations in ribonucleotides (Felsenfeld et al., 1957; Lipsett, 1964), T-AT and C+.GC combinations in deoxyribonucleotides (Riley et al., 1966; Lee et al., 1979) were identified by early studies on triplex polymers. Recent studies have revealed at least two additional base triple combinations within this motif, including G-TA (Griffin & Dervan, 1989) and T-CG (Yoon et al., 1992). These triples differ from canonical T-AT and C+.GC triples in that they result from specific interactions between third strand bases and pyrimidine residues in a purinerich strand.

Perspectives in Biochemistry

11408 Biochemistry, Vol. 33, No. 38, 1994

I

,r

. H -

III

a

d

n FIGURE3: (A) Triple-stranded region in the solution structure of a YoRY triplex containing a G-TA triple (Figure 1A). Labels I, 11, and I11 identify the 5'-ends of strands I, 11, and 111, respectively. (B) olecular surface view of the same triplex emphasizing the nature of the Crick-Hoogsteen groove. The molecular surface was mapped usi a spherical probe of radius 1.4 A in the GRASP program. The most convex, concave, and planar regions of the surface are coded green, gray, and white, respectively;colors are linearly interpolated A the third strand relative to t puri exes was established by affinity cleavage and chemi lecular (Moser & Derv triplexes [reviewed in We same result was obtained for a-oligodeoxyribonucleotide evidence for the pairing align was first provided by high-resolution NMR spectra of intermolecular triplexes & Feigon, 1989). Both to be stabilized by two bases on strands I1 and I11 without affecting the WatsonCrick interactions between bases on strands I and I1 (Figure lC,D). Global Structural Features. Detailed insight into t conformational features of YoRY triplexes has been provide tion structures (Radhakrishnan & Patel, 1994a,c) that re determined using restraints derived from N M R data adhakrishnan et al., 1991a, 1992) for the sequences shown panels A and B of Figure 1. A representative structure etermined for the triplex containing a noncanonical G-TA is shown in Figure 3A. The structure calculations, taken using RMD methods in the presence of explicit solvent and counterions,were greatly aided by the high quality R spectra and the large number of interproton icularly those between residues in strands I1 and 111. Analyses of the helicoidal structures indicate that the binding of the third strand induces global conformational changes in the Watson-Crick-paired x region of the triplex. This is most dramatically ted by the x-displacement values (N-2 A) for the region in the two solution structures (Radhakrishnan 1, 1994a,c), which is significantly different from what mally observed for canonical B-form helices in crystal structures (