Bioconjugate Chemistry, JANUARY/FEBRUARY 1991, Volume 2, Number 1
© Copyright 1991 by the American Chemical Society

REVIEW

Sequence-Selective DNA Recognition by Synthetic Ligands

Peter E. Nielsen

Research Center for Medical Biotechnology, Department of Biochemistry B, The Panum Institute, Blegdamsvej 3c, DK-2200 Copenhagen N, Denmark. Received December 13, 1990

Table of Contents
1. Introduction
2. Protein-DNA Recognition
3. Drug-DNA Recognition
4. Minor-Groove Interaction
4.1 Lexitropsins
4.2 Oligodistamycins
4.3 Intercalator-Peptide Conjugates
5. Major-Groove Recognition
5.1 Peptides
5.2 Triple Helix
6. DNA-Modifying Ligands
7. Future Prospects

1. INTRODUCTION

For more than a decade it has been a dream and a goal for organic chemists interested in DNA recognition to design reagents that bind to DNA selectively at desired sequences. Such reagents, eventually equipped with a chemical or photochemical tag capable of modifying or cleaving DNA at the site of binding, should have great prospects as “synthetic restriction enzymes” and not least as gene-selective drugs. Giant steps toward this goal are being taken at present, and reaching it within the not-so-distant future no longer appears to be wishful thinking. In the present paper I shall briefly review the principles exploited by nature for the sequence-selective recognition of DNA by proteins (mostly involved in gene regulation) and by drugs. Next I describe how organic chemists have followed the paths devised by nature to construct synthetic DNA-sequence-selective ligands. Finally, I shall try to foresee and suggest new approaches to reach the ultimate goal, i.e., the ability to construct ligands designed to bind specifically to any chosen DNA sequence (up to ~20 bp).
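The ~20 bp figure can be motivated by the statistical argument that Section 4.2 invokes again. What follows is a minimal sketch, assuming a random sequence with equal base frequencies. Under that assumption a given n-bp site is expected to occur about 2G/4ⁿ times in a genome of G bp (both strands counted), and the site is likely to be unique once this expectation falls below one. The genome sizes used here (about 4.6 × 10⁶ bp for E. coli and 3.1 × 10⁹ bp for a haploid human genome) are assumptions and do not come from the review. Real genomes deviate from randomness, which is one reason published estimates are given as a range.

```python
def min_unique_site_length(genome_bp: float, both_strands: bool = True) -> int:
    """Smallest site length n (bp) whose expected number of chance occurrences
    in a random genome of `genome_bp` base pairs is below one.

    Assumes independent, equiprobable bases (probability 1/4 per position),
    which real genomes only approximate.
    """
    positions = genome_bp * (2 if both_strands else 1)
    n = 1
    while 4 ** n <= positions:  # expected occurrences = positions / 4**n
        n += 1
    return n


# Hypothetical, rounded genome sizes (assumptions, not values from the review).
GENOMES = {"E. coli": 4.6e6, "human (haploid)": 3.1e9}

for name, size in GENOMES.items():
    n = min_unique_site_length(size)
    print(f"{name}: n = {n} bp, expected chance hits = {2 * size / 4 ** n:.2f}")
```

With these inputs the sketch returns 12 bp for E. coli and 17 bp for the human genome. Both values are consistent with the 12 bp and 15-19 bp quoted in Section 4.2.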

2. PROTEIN-DNA RECOGNITION

The DNA double helix (I shall limit the discussion to the B-form, although similar arguments apply to other forms such as A- or Z-DNA) presents three areas of recognition: the phosphate backbone and the two grooves (major and minor). All three features are exploited by nature in protein-DNA recognition (see refs 1-6 for recent reviews on DNA structure and protein-DNA interaction). The most detailed sequence information is available on the “floor” of the major groove. Four recognition sites are present in the major groove of AT base pairs (Figure 1): adenine N7 (hydrogen acceptor), the adenine amino group (N6) (hydrogen donor), thymine O4 (hydrogen acceptor), and the thymine methyl group (van der Waals interaction). Similarly, the recognition sites available at GC base pairs are two hydrogen acceptors (guanine N7 and guanine O6) and one hydrogen donor (the cytosine amino group at C4). Thus the DNA sequence can be read unambiguously from the major groove as a “two-dimensional grid” of hydrogen-donor, hydrogen-acceptor, and van der Waals contact sites (Figure 2). In particular, ApT versus TpA (and GpC versus CpG) base pairs are easily distinguishable. Because the floor of the groove is not planar, the corresponding sites of the DNA-binding ligand must also be positioned correctly in space, much as in the lock-and-key principle of substrate-enzyme recognition. Considerably less sequence information is present in the DNA helix when read from the minor groove. The GC base pair presents two hydrogen-acceptor sites (guanine N3 and cytosine O2) and one hydrogen-donor site (the guanine C2-NH₂), while only two hydrogen-acceptor sites (adenine N3 and thymine O2) are available at AT base pairs. Distinguishing GpC from CpG sites, and especially TpA from ApT sites, is also much more difficult from the minor groove than from the major groove (cf. Figure 2).
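The readout argument can be made concrete with a deliberately coarse encoding of the sites listed above. This is a sketch built on a hypothetical simplification: each base pair is reduced to the order of its acceptor (a), donor (d), and thymine-methyl (m) sites across a groove. Exact positions within the groove are ignored, so the sketch overstates how alike GC and CG look from the minor groove. The text describes that distinction as difficult, not impossible. The names `MAJOR`, `MINOR`, `pattern`, and `distinguishable` are illustrative.

```python
# Coarse groove "readout" codes for Watson-Crick base pairs, following the
# recognition sites listed in the text. Each string is read across the groove
# from the first-named base to its partner:
#   "a" = hydrogen-bond acceptor, "d" = hydrogen-bond donor,
#   "m" = thymine methyl group (van der Waals contact).
# Hypothetical simplification: only the order of sites is kept; their exact
# positions in the groove (e.g. the off-centre guanine amino group) are ignored.
MAJOR = {
    "AT": "adam",  # adenine N7, adenine N6-H, thymine O4, thymine CH3
    "GC": "aad",   # guanine N7, guanine O6, cytosine N4-H
}
MINOR = {
    "AT": "aa",    # adenine N3, thymine O2
    "GC": "ada",   # guanine N3, guanine N2-H, cytosine O2
}
GROOVES = {"major": MAJOR, "minor": MINOR}


def pattern(pair: str, groove: str) -> str:
    """Site pattern for a base pair such as "AT" or "TA" seen from one groove."""
    table = GROOVES[groove]
    pair = pair.upper()
    if pair in table:
        return table[pair]
    # A flipped pair (TA, CG) presents the same sites in reverse order.
    return table[pair[::-1]][::-1]


def distinguishable(pair1: str, pair2: str, groove: str) -> bool:
    """True if the two base pairs present different site patterns."""
    return pattern(pair1, groove) != pattern(pair2, groove)


for groove in ("major", "minor"):
    print(groove,
          "AT/TA:", distinguishable("AT", "TA", groove),
          "GC/CG:", distinguishable("GC", "CG", groove))
```

In this encoding the major-groove patterns of AT and TA differ ("adam" versus "mada"), and so do those of GC and CG. The minor-groove patterns of each flipped pair coincide because they are palindromic.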
Indeed, efficient discrimination between AT and TA base pairs may not be possible. In both cases the recognition elements are two hydrogen-acceptor sites positioned quite symmetrically in the groove.

Figure 1. Chemical structures of GC and AT base pairs.

It is still not clear how much sequence information the phosphate backbone carries in terms of interphosphate distances (groove widths), deoxyribose conformation, and electronegative potential, or to what extent nature exploits it. There is no doubt, however, that the DNA conformation is strongly influenced by the DNA base sequence (7-10). This holds for the parameters just mentioned and for the twist, roll, and other base-pair angles. Likewise, the biological function of DNA is also to some degree governed by its conformation (11-13). Furthermore, the electrostatic potential of the DNA grooves is strongly dependent on the DNA conformation, in particular the groove width. Specifically, it has been calculated that the deepest electronegative potential is in the minor groove of AT sequences (14). The electronegative potential of DNA most likely also serves as a “biological-recognition parameter”. The majority of the sequence-selective DNA-binding proteins studied so far recognize DNA from the major groove (3, 5). Four types of structural oligopeptide motifs are primarily responsible for this recognition: the helix-turn-helix, the Zn fingers, the Leu zipper, and the antiparallel β-sheets. Of these, only members of the helix-turn-helix and β-sheet groups have been studied at atomic resolution. The results show that the recognition exploits hydrogen bonding, hydrophobic interaction, and ionic interactions. In other words, the recognition is based on a three-dimensional grid more than on a “digital readout” of the bases by the protein. Thus a direct base pair-amino acid residue code for recognition does not seem to exist, although certain “base pair-amino acid rules” can be visualized, at least in the case of lac repressor (15).
The amino acid residues that determine DNA-sequence recognition by the “helix-turn-helix proteins” reside in the α3-helix of this motif. They are found on the side of the helix facing the floor of the major groove. The second helix of the motif, together with a third α-helix, helps position the α3-helix in the groove and interacts with the back side of the α3-helix, which is very lipophilic. X-ray structures are not yet available for Zn fingers and Leu zippers. Nevertheless, it is almost certain that DNA recognition by Leu-zipper proteins is also mediated by α-helices. In this case two helices, turning away from each other and held together by a Leu zipper in the middle, are placed in the DNA major groove in a scissorlike grip (16). An α-helix motif has also been implicated in DNA recognition by the glucocorticoid-type Zn fingers (5). Recently it has been found that some prokaryotic repressors (Met and Arc repressors) recognize DNA by way of two antiparallel β-strands positioned in the DNA major groove (17, 18). Recognition of DNA via the minor groove has been suggested for a few proteins, of which IHF (integration host factor) binds sequence specifically to DNA (19). The mechanism of this recognition is still not known in detail. An X-ray crystallographic study of the Trp repressor-DNA complex (20) indicated that in this case the recognition was predominantly due to contacts with the DNA backbone and was thus determined by the conformation of the DNA. It appears, however, that the DNA used in this study was not the optimal operator sequence (21). Thus an example where the DNA conformation is determinant for protein recognition is still lacking.

3. DRUG-DNA RECOGNITION

Several drugs bind to DNA with remarkable sequence selectivity (22). On the basis of their mode of binding, they can be divided into intercalators and groove binders. However, most intercalators exploit groove interaction for DNA binding as well, and some groove binders bind at certain DNA sequences in an intercalation-like fashion (23). It is widely agreed that DNA intercalation per se does not contribute much sequence discrimination. Thus, the sequence selectivity exhibited by certain intercalators, such as echinomycin and actinomycin, is governed primarily by hydrogen bonding and hydrophobic interactions in the minor groove (24). However, the sequence preference is very “soft”. Echinomycin (Figure 3), for example, binds preferentially to GC-rich DNA sequences and especially to the GpC step. The ratio of binding constants between high-affinity and low-affinity sites is of the order of 10-20 (25). For comparison, lac repressor (a helix-turn-helix protein) binds to its operator sequence with Ka = 10¹³ M⁻¹, while its affinity for random DNA is 10⁸ M⁻¹ (26), i.e., a ratio of 10⁵. Most minor-groove DNA-binding drugs bind with significant preference (10-10² times) to AT-rich sequences, and their preferred binding sites contain a run of at least four A/T's (27-30). Examples include distamycin, netropsin, Hoechst 33258 (Figure 4), and DAPI (4',6-diamidino-2-phenylindole).
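The selectivity ratios quoted in this section span four orders of magnitude. One convenient way to compare them is to convert a ratio of binding constants K₁/K₂ into a difference in binding free energy, ΔΔG = RT ln(K₁/K₂). This conversion is an addition for illustration and does not appear in the review. The sketch assumes 25 °C and uses the upper ends of the ranges given above. The function name `ddg_kj_per_mol` is illustrative.

```python
import math

R = 8.314      # gas constant, J mol^-1 K^-1
T = 298.15     # assumed temperature (25 °C); not specified in the text


def ddg_kj_per_mol(k_high: float, k_low: float, temp: float = T) -> float:
    """Free-energy difference (kJ/mol) implied by two association constants."""
    return R * temp * math.log(k_high / k_low) / 1000.0


# Selectivity ratios taken from the text (upper ends of the quoted ranges).
cases = {
    "echinomycin, best vs poor sites": (20.0, 1.0),
    "AT-selective minor-groove drugs": (1e2, 1.0),
    "lac repressor, operator vs random DNA": (1e13, 1e8),
}

for name, (k_hi, k_lo) in cases.items():
    print(f"{name}: ratio {k_hi / k_lo:.0e}, "
          f"ddG = {ddg_kj_per_mol(k_hi, k_lo):.1f} kJ/mol")
```

For these three cases the sketch gives roughly 7.4, 11.4, and 28.5 kJ/mol, respectively. The repressor's discrimination (about 6.8 kcal/mol) is thus several times larger than what the drugs achieve.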
Several models explaining the preference of these drugs for runs of A/T's have been proposed (28, 29, 31). Most researchers now agree that a major contribution stems from the large electronegative potential of the narrow minor groove of A/T sequences. This potential attracts the positively charged drugs (31), and the narrow groove also snugly accommodates these flat, curved molecules (Figure 5). One family of drugs, the chromomycins, has also been identified as binding selectively to GC-rich sequences via minor-groove interactions (32). NMR results (33) indicate that drug dimer formation and significant widening of the minor groove are involved.
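Two rules can be turned into a simple site scanner. The first is the preference of netropsin- and distamycin-type drugs for runs of at least four A/T base pairs. The second is the empirical rule of Section 4.2 that an oligomer with n peptide (amide) bonds covers an (A/T)ₙ₊₁ site. The sketch looks only at base composition. It treats any run of A or T as a candidate regardless of order, because, as the text notes, these ligands do not discriminate the positions of the A's and T's. It also ignores flanking-sequence effects. The names `at_runs`, `candidate_sites`, and `n_amides` are illustrative.

```python
import re
from typing import List, Tuple


def at_runs(seq: str, min_len: int = 4) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals of maximal A/T runs of length >= min_len."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[AT]+", seq.upper())
        if m.end() - m.start() >= min_len
    ]


def candidate_sites(seq: str, n_amides: int) -> List[Tuple[int, int]]:
    """A/T runs long enough for a ligand with n amide bonds, using the
    empirical (A/T)_(n+1) site-size rule quoted in Section 4.2."""
    return at_runs(seq, min_len=n_amides + 1)


dodecamer = "CGCGAATTCGCG"  # Drew-Dickerson dodecamer (Figure 2)
print(at_runs(dodecamer))             # [(4, 8)]  -> the central AATT
print(candidate_sites(dodecamer, 4))  # []        -> a 5-bp run would be needed
```

The central AATT of the dodecamer is the only run that satisfies the "at least four A/T's" criterion. A ligand with four amide bonds would need a longer run than this sequence offers.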


4. MINOR-GROOVE INTERACTION

4.1 Lexitropsins. Early X-ray studies on the binding of netropsin to AT-rich DNA sequences led to a model based on specific hydrogen bonding between the amide protons of netropsin and the hydrogen-acceptor sites in the AT minor groove. In this model the amide protons of netropsin form bifurcated hydrogen bonds to the adenine N3 and thymine O2 atoms of two adjacent AT base pairs (34). On the basis of this model it was suggested that the sequence preference could be switched from AT to GC. This would be done by designing analogues of netropsin with hydrogen-bond donor/acceptor properties complementary to GC base-pair steps (35, 36). Specifically, the pyrrole moieties of netropsin were replaced by imidazole (1, Figure 5). Because its nitrogen lone pair is positioned in the minor groove, the imidazole was predicted to hydrogen bond to the 2-amino (H2N) group of guanine. The experiments with these new reagents, termed lexitropsins, were rather disappointing, however. Relative binding to GC-containing sequences did increase, but this was achieved by decreased affinity for AT sequences rather than by increased affinity for GC sequences (34, 35). It then became clear that the cationic character of the ligand was a significant factor in conferring AT specificity. Lown and co-workers therefore synthesized lexitropsin derivatives with only one positive charge, containing imidazole, furan, thiophene, or thiazole moieties (36-50) (e.g., 2-5, Figure 5) (38-40). The immediate aim was GC acceptance by the lexitropsin, and the ultimate goal was control of the sequence specificity of these reagents. The results so far have shown that introducing hydrogen-bond-acceptor groups (=N- or -O-) pointing into the minor groove (compounds 2, 3) changes the binding specificity. The resulting lexitropsins no longer exhibit the pronounced (A/T)₄ specificity typical of netropsin (and distamycin), but bind to mixed AT-GC sequences, as probed by DNase I footprinting. Proton NMR studies of complexes of these lexitropsins with self-complementary oligonucleotides indicate that hydrogen bonding between the guanine NH₂ and the lexitropsin acceptor moiety (imidazole or furan) may indeed take place. Somewhat unexpectedly, the analogous thiazoles (e.g., compound 5) exhibit an AT preference that is even stronger than that of netropsin (49). This finding (termed GC avoidance) has been rationalized by a steric clash between the thiazole sulfur atom and the guanine NH₂. Model studies reveal that phasing between the hydrogen-bond donor/acceptor sites in the minor groove of DNA and those of the lexitropsins becomes a problem for longer lexitropsins.
It may therefore be necessary to connect such “oligolexitropsins” with linkers that both bring the individual lexitropsin units into phase with the DNA helix and make them follow the helical path of the groove (51). In an analogous approach, Wade and Dervan have made distamycin derivatives containing a hydrogen-bond-accepting moiety (a pyridine) (52). They found that one of these (6: X, Y, Z = N, CH, CH), in contrast to distamycin, binds preferentially to a mixed GC-AT sequence. Several other analogues of distamycin have also been prepared (53-56), but the detailed sequence preference of these reagents has not been studied yet. Finally, a few analogues of Hoechst 33258 with altered sequence preference have been described (57). Like the hydrogen-bond-acceptor lexitropsins, these drugs (e.g., 7) also bind to GC-containing sequences, as analyzed by footprinting and ¹H NMR studies (57). It must be emphasized, however, that the binding mode of these new lexitropsins is far from fully understood. It is not yet possible to design reagents specifically targeted to desired sequences by this approach. (For recent reviews on lexitropsins, see refs 37 and 128.)

4.2 Oligodistamycins. From statistical considerations of base content and genome size, it can be calculated that a unique sequence in Escherichia coli consists of 12 base pairs, while the corresponding number in a human cell is 15-19 base pairs (58). One goal is therefore to be able to target ligands selectively to sequences of this size. Targeting (A/T)ₙ sequences is a relatively simple task, although without discrimination of the positions of the A's and T's, and it has been accomplished by distamycin-type oligomers (59-63). The DNA binding sites of these oligomers usually obey an empirical/theoretical rule: an oligomer with n pyrrole units, i.e., n peptide bonds, binds to an (A/T)ₙ₊₁ sequence. This behavior is consistent with a binding mode in which each amide proton forms bifurcated hydrogen bonds with the two hydrogen-acceptor sites of two adjacent AT base pairs. The largest reagent constructed so far is a “distamycin trimer” connected head to tail by β-alanine linkers, and it recognizes ~16 bp (64). Some of these reagents were equipped with an EDTA ligand, which functions as a DNA-cleaving reagent upon complexing with Fe²⁺. Finally, it has been very elegantly demonstrated that metal-ion-inducible DNA-binding ligands can be designed. Individual distamycin units are tethered with metal ion (e.g., Mg²⁺) chelators in such a way that only the chelated form has the right conformation for DNA binding. These reagents therefore bind much more strongly to their target oligo-TA sequence in the presence of chelated divalent metal ions (65). Likewise, it is also possible to use optically active linkers. In the one case examined (tartaric acid), identical binding specificities were observed for the two forms (66). The R,R form, however, binds 10 times less efficiently to DNA than the S,S form, showing that steric fit is an important parameter for DNA recognition by these compounds. For recent reviews on oligodistamycins, see refs 58 and 67.

4.3 Intercalator-Peptide Conjugates. Echinomycin is a naturally occurring intercalator-peptide conjugate in which two quinoxaline ligands are connected via a cyclic depsipeptide (Figure 3). Removing the N-methyl groups (and exchanging the thioacetal bridge for a disulfide) gives the analogue TANDEM (68). TANDEM binds preferentially to AT-rich sequences, in contrast to echinomycin, which prefers the GpC step. An echinomycin analogue in which the quinoxaline moieties are replaced with acridines has also been synthesized (69), but the sequence-binding preference of this compound has not been reported.
These results show that the sequence recognition by echinomycin is determined by the depsipeptide moiety. It could therefore be interesting to explore the construction of sequence-selective DNA-recognizing ligands by tethering peptides to intercalators in general, and in particular by using peptides as linkers for bis-intercalators. Finally, acridine-distamycin-like conjugates have been described (70, 71) that retain the AT preference of the distamycin ligand. This is in accord with the lack of sequence-selective DNA binding usually exhibited by simple intercalators like 9-aminoacridine.

5. MAJOR-GROOVE RECOGNITION

5.1 Peptides. It is not yet possible to design peptide α-helices for recognition of predetermined DNA sequences, analogous to those found in, for instance, λ repressor, Cro, lac repressor, and CRP (72). It is, however, feasible to exploit the peptide-DNA recognition exhibited by these proteins. A pertinent question is how much of the protein is required for sequence-selective DNA binding, i.e., how large the DNA-binding domain is. Recent results using the Hin recombinase indicate that three α-helices (α1, α2, and the DNA-recognizing α3), corresponding to 50 amino acids, are sufficient for efficient DNA recognition. Specifically, the 51-mer oligopeptide (residues 139-190) was chemically synthesized and conjugated to various DNA-cleaving ligands, and thus converted to a sequence-selective DNA-cleaving reagent. The ligands used include EDTA (73) and the Cu(II)-binding tripeptide Gly-Gly-His (74, 75). Analogously, phenanthroline has been conjugated to the catabolite gene regulatory protein (CRP) (76), and staphylococcal nuclease to a truncated λ repressor (77), converting these proteins to sequence-selective DNA-cleaving reagents. Recently, the putative DNA-binding domain (50 amino acids) of a Leu-zipper DNA-binding protein (GCN4) was synthesized with an EDTA ligand and shown to recognize and cleave the GCN4 recognition sequence (78). Peptide-DNA cleaver conjugates can also be used to study the DNA-binding mode of the peptide, and thus of the protein from which the DNA-binding domain originates. This was elegantly exploited in studies of the DNA binding of the GCN4 transcription factor (78) and Hin recombinase (79, 80) using the appropriate peptide-EDTA-Fe(II) conjugates.

Figure 2. Three-dimensional structure of the Drew-Dickerson dodecamer (5'-CGCGAATTCGCG)₂ (refs 9 and 10). Part a shows a molecular graphics model colored according to atoms. Parts b-e are the same as a, but colored according to recognition sites: hydrogen acceptor (cyan), hydrogen donor (red), and van der Waals (thymine methyl groups) (yellow). The dodecamer is shown from four angles at 90° intervals. Part f shows surface models of the four base pairs (viewed from above; AT, TA, CG, and GC) colored as in b-e. The major groove is facing upward. Models were constructed on a "Personal Iris" using "Insight/Discover" software.

Figure 3. Chemical structure of echinomycin.

Figure 4. Chemical structures of the minor-groove binders netropsin, distamycin, and Hoechst 33258.

All available genetic and biochemical data indicate that mainly the α3-helix of the helix-turn-helix proteins is responsible for DNA recognition (81, 82). It should therefore be a major goal to synthesize such modified "α3-helices", probably in a "helix-stabilized" form, that bind sequence specifically to DNA.

Figure 5. Chemical structures of various lexitropsins and analogues of distamycin and Hoechst 33258.

Figure 6. Chemical structures of base triplets with Watson-Crick/Hoogsteen base pairing.

5.2 Triple Helix. The exploitation of DNA triple-helix formation is for the time being the most versatile and rational approach to sequence-selective DNA recognition, although severe limitations for a general application still exist. RNA triple helices were discovered 3 decades ago (83) in the 2:1 poly(U)-poly(A) complex, and several triple-helix structures have been described since then (e.g., refs 84, 85). Triple-helix formation relies in most cases on Hoogsteen hydrogen bonds in the major groove. These form between adenines or guanines of the purine strand of the duplex and thymines or protonated cytosines of the third strand (Figure 6). Stable triple helices are thus formed between oligopyrimidine oligonucleotides and "Hoogsteen-complementary" homopurine-homopyrimidine DNA sequences, with the two pyrimidine strands in antiparallel orientation (Figure 7). Triple-helix formation is promoted by high cation concentrations (e.g., 1 M Na⁺), which are necessary to shield the negatively charged phosphate backbones. It is also promoted by acidic pH if cytosines are involved. However, triple helices can be formed at physiological salt concentrations and neutral pH if spermidine (1 mM) is included in the medium. They are furthermore stabilized by moderate concentrations of organic solvents (86). Triple helices can be stabilized further if an intercalator is conjugated to the third-strand oligonucleotide (87-92). For example, the Tm of an 11-mer (6 T, 5 C) triple helix was measured as 19.3 °C (at pH 6.0, 1 mM spermidine). The corresponding 5'-acridine conjugate triple helix melted at 37.4 °C, while the 3'-acridine conjugate melted at 25.5 °C. (Oligonucleotides conjugated with ethidium, ellipticine, or daunorubicin showed similar stabilization (90-92).) The results also indicate that triple-helix formation induces a conformational change in the DNA double helix (e.g., to an A-like conformation), which creates a strong intercalation site at the 5'-junction. Independent experiments have confirmed this (92).

A large variety of DNA-cleaving ligands have been conjugated to "triple-helix oligonucleotides". The aims were to study triple-helix formation by affinity cleavage and to make sequence-targeted DNA-modifying and -cleaving reagents (artificial restriction enzymes). These ligands include EDTA-Fe(II) (86, 89), azidoproflavine (87), phenanthroline-Cu(I) (93-95), and porphyrins (96-98). Analogously, oligonucleotides capable of cross-linking to the target DNA have been designed by conjugation to psoralens (99) or alkylating agents (100).

The major and far from trivial challenge in the triple-helix approach is to target mixed purine/pyrimidine DNA sequences. Molecular modeling reveals that the third strand is asymmetrically positioned in the major groove, since it reads the information on only one side of the groove, the purine side. Thus, in order to read mixed sequences, two possibilities appear. Either the reading base (which in principle could be any aromatic ligand) must switch from one side of the groove to the other when encountering a purine-pyrimidine step, or it must read the information of the pyrimidine half of the base pair. Both approaches have been tried without yet reaching a general solution. It has been shown that 5'-PuₘPyₙ sequences (Pu = purine, Py = pyrimidine) may be recognized by third-strand oligonucleotides of the type 5'-Pyₘ-linker-3'-Pyₙ₋₁ (101). Here the linker couples the two Py oligonucleotides in a tail-to-tail fashion, which allows a "groove switch" over one base pair (which is not read) at the Pu-Py junction of the double helix. In this way both the Pu sequence and the Py sequence are read by conventional Hoogsteen base pairing. Finally, it has been shown that a single TA base pair in a purine sequence can be read by a guanine in the third strand (G·TA) (102). Whether this finding can be extended to additional TA interruptions or interrupting A runs is not yet known and seems doubtful. The "Hoogsteen-type" triple helix may be only one of a family of mutually incompatible triple helices. It has been reported that G·GC triplets combined with occasional T·AT and T·TA triplets may form triple-helix structures in which the two G strands are antiparallel (103, 104). It is thus pertinent that the basic physicochemical properties of oligonucleotide triple helices be explored. This includes their possible structures, their stabilities and rates of formation, and the relative stability of individual triplets in various environments.

Figure 7. Space-filling molecular graphics model of a DNA triple helix (top). The A-form double helix is shown in gray while the third strand is in black; the sequence of a typical "Hoogsteen" triple helix is shown (bottom).

Figure 8. Schematic models of the B-DNA helix showing attack from the major (large arrows) and the minor (small arrows) groove.

The biological and potential clinical prospects for triple-helix-based ligands (drugs) seem very bright. Recent reports describe efficient blockage of protein-DNA interaction and of restriction enzyme DNA cleavage by sequence-directed triple-helix formation (105, 106). Even more interesting, preliminary results indicate that oligonucleotides targeted to regulatory sequences via triple-helix formation exhibit the predicted biological effect in vivo. Specifically, it was shown that a (dT)8-acridine conjugate targeted to the origin of replication of SV40 inhibits virus replication in CV-1 cells (107). One of the arguments against the feasibility of using oligonucleotides as drugs has been their presumed inability to cross cell membranes due to their polyanionic character. This does not seem to be a problem. Acridine-conjugated oligonucleotides are taken up by cells quite efficiently (108), and recent results have indicated that even natural oligonucleotides are taken up by cells by receptor-mediated transport (109). The possible use of "triple-helix DNA cutters" as "rare-cutter restriction enzymes", e.g., in connection with the human genome sequencing project, is also close to reality. It has been demonstrated that a single sequence-specific cut can be introduced in DNA molecules of more than 10⁵ base pairs in length, although the cleavage efficiency is still only 25-70% at best (94, 110-112). See refs 67, 112, and 113 for more specific reviews on the triple-helix approach.

6. DNA-MODIFYING LIGANDS

A DNA-cleaving ligand is often included in the design of sequence-directed reagents. This serves two purposes.
Primarily, these reagents are designed to cleave DNA sequences selectively. The aim is either to construct "synthetic restriction enzymes" or to deactivate specific DNA sequences in vivo in the course of "gene-directed chemotherapy". Additionally, the DNA-cleaving ligand provides a convenient means of analyzing the sequence selectivity and DNA-binding mode of the reagent by "affinity cleavage" or "autofootprinting". The cleavage pattern on the two DNA strands even reveals whether the cleaving ligand is bound in the minor or in the major groove. Owing to the asymmetry of the B-DNA helix, cleavage from the minor groove results in a 2 or 3 base pair stagger in the 3'-direction between the two DNA strands. Major-groove cleavage results in a 4 or 5 base pair stagger in the 5'-direction (Figure 8). A large variety of DNA-cleaving ligands have been used for this purpose. Virtually all of them have been chosen from the broad selection of footprinting reagents that are now at hand (116) (Figures 9 and 10). Dervan and co-workers have used the versatile EDTA-Fe(II) complex


also proven very versatile (76, 94, 118, 119), and a DNA double-strand cleavage yield of 70% has been reported for this ligand conjugated to an oligonucleotide (112). Photo-cross-linking of the reagent to the DNA target has been obtained with the above-mentioned azidoproflavine and azidophenacyl ligands, as well as with psoralens (ref 99 and refs cited in 120) and quinones (121). Chemical cross-linking was accomplished with various alkylating ligands, such as alkyl halides (123-125) and nitrogen mustards (122). Ideally, one would require a DNA-cleaving ligand with catalytic activity (one ligand cleaving more than one target). This is probably to some extent the case with the EDTA-Fe(II) and the phenanthroline-Cu(I) complexes. Furthermore, the mechanism should preferably be hydrolytic, at least when “synthetic restriction enzymes” are the goal, so that the products can be processed by conventional enzymes. In other words, a synthetic analogue of the catalytic site of a DNase such as DNase I or micrococcal nuclease would be “handy”.
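In affinity-cleavage experiments the groove assignment rests on the direction of the stagger between the cleavage maxima on the two strands. Minor-groove cleavage gives a stagger of about 2-3 bp in the 3'-direction, and major-groove cleavage a stagger of about 4-5 bp in the 5'-direction (Figure 8). The sketch below encodes that rule under an assumed coordinate convention. Both maxima are given as positions in the top strand's 5'→3' numbering, with the bottom-strand maximum mapped to the top-strand base it faces. A positive offset then means that each strand is cut toward its own 3' end. The function name, the convention, and the example positions are hypothetical. Real cleavage patterns are broad distributions whose maxima must first be located.

```python
from typing import Tuple


def assign_groove(top_max: float, bottom_max: float) -> Tuple[str, float]:
    """Guess the groove from which a tethered cleaving ligand attacks DNA.

    Assumed convention (hypothetical): both arguments are positions of the
    cleavage maxima in top-strand coordinates, numbered 5'->3' along the top
    strand; the bottom-strand maximum is given as the index of the top-strand
    base it faces. offset = top_max - bottom_max.
      offset > 0: each strand is cut toward its own 3' end (3'-stagger),
                  expected for minor-groove cleavage (about 2-3 bp)
      offset < 0: 5'-stagger, expected for major-groove cleavage (about 4-5 bp)
    """
    offset = top_max - bottom_max
    if offset > 0:
        return "minor", offset
    if offset < 0:
        return "major", offset
    return "ambiguous", offset


# Hypothetical cleavage maxima, e.g. read off two densitometer traces.
print(assign_groove(12.0, 9.5))   # ('minor', 2.5)
print(assign_groove(10.0, 14.5))  # ('major', -4.5)
```

The sketch uses only the sign of the offset. The expected magnitudes (2-3 bp versus 4-5 bp) can serve as an additional consistency check when the sign alone is not convincing.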
