VOLUME 30
NUMBER 5
®
MAY 1997 Registered in U.S. Patent and Trademark Office; Copyright 1997 by the American Chemical Society
RNA-Protein Intermolecular Recognition GABRIELE VARANI* MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K. Received June 24, 1996
Background RNA molecules play a central role in all the main functions of living molecules: storage of genetic information, propagation of the genetic material, and enzymatic activity. RNA molecules do not perform those functions alone, but in tight association with RNA-binding proteins. Thus, RNAprotein recognition is central to understanding a wide range of biological processes. RNA-protein recognition cannot be undertstood without recognizing that the diversity and complexity of RNA structures are comparable to those of proteins. This structural diversity defines an enormous variety of sites and shapes for intermolecular recognition. In this Account, I will describe our current knowledge of the molecular basis of RNA-protein recognition. Since detailed thermodynamic and atomic-level structural information on RNA-protein complexes is very scarce, this Account will inevitably raise more questions than it will give answers. Thus, it will provide at least one good explanation for the ever increasing interest in this area of biochemistry.
Affinity and Specificity Proteins bind RNA with the affinity required for complex formation at the concentrations of reagents present in Gabriele Varani was born in Carrara, Italy, in 1959. He graduated from the University of Milano in physics and completed a Ph.D. in biophysics from the University of Milano under the supervision of Professor Giancarlo Baldini in 1987. After postdoctoral work with Professor Ignacio Tinoco at the University of California in Berkeley, Dr. Varani joined the MRC Laboratory of Molecular Biology in Cambridge in 1992 as a group leader and is now a senior member of the staff. Dr. Varani’s interests are the determination of RNA structure by NMR spectroscopy and the study of RNA recognition by proteins and small molecular weight ligands. S0001-4842(96)00035-0 CCC: $14.00
1997 American Chemical Society
living cells and for regulation of biological function. For example, transfer RNA (tRNA) synthetase enzymes bind tRNA with modest affinity (the bimolecular dissociation constant Kd ≈ 10-6 M): these enzymes would be ineffective if the substrate was not released after catalysis. In contrast, components of RNA-protein complexes that are permanently assembled (such as the human U1A protein) bind cognate RNAs much more tightly (Kd < 10-9 M). The analysis of the few existing structures of RNA-protein complexes determined at atomic resolution reveals that affinity is not simply related to parameters such as protein charge or the size of intermolecular interface area. For example, tRNA-synthetase complexes have very large interface areas but bind RNA weakly, while the human U1A protein, with a small interface area, binds RNA very tightly. Understanding the molecular determinants of the binding energy of an RNA-protein interaction is a very important but insufficient goal: specificity is equally important. Many RNA-binding proteins bind any RNA weakly, regardless of its sequence or structure. However, biological function requires discrimination of cognate RNAs (the correct, relevant targets) from noncognate RNAs, which are present in very large excess in the cell. The difference in binding energy between cognate and noncognate RNAs defines how specific an RNA-protein interaction is. Understanding this specificity adds a further, intriguing dimension to the characterization of these intermolecular recognition events.
RNA Recognition Differs from DNA Recognition The many existing structures of DNA-protein complexes (>150) define an important paradigm in intermolecular recognition. Very often, DNA-protein recognition occurs by insertion of an R-helix into the major groove of doublestranded DNA.1 A specific DNA sequence is then recognized through the formation of extensive hydrogen bonding and van der Waals interactions with the bases (“direct readout”) and by recognition of sequence-dependent conformational features through electrostatic interactions with the negatively charged phosphodiester backbone * E-mail:
[email protected]. FAX: ++44-1223-213556. Phone: ++44-1223-402417. (1) Steitz, T. A. Q. Rev. Biophys. 1990, 23, 205-280. VOL. 30, NO. 5, 1997 / ACCOUNTS OF CHEMICAL RESEARCH
189
RNA-Protein Intermolecular Recognition Varani
FIGURE 1. DNA and RNA double helices present different structures. The wide major groove of B-form DNA (right) sharply contrasts the narrow and deep major groove of A-form RNA (left). (“indirect readout”). This paradigm cannot be applied to RNA recognition. As shown in Figure 1, the major groove of double-helical RNA is too narrow to allow the insertion of a protein R-helix or β-strand. Some DNA-binding proteins2-5 bind in the minor groove, and the RNA minor groove is well accessible (Figure 1). However, the chemical groups exposed in the minor groove of nucleic acid bases are not diverse enough among different nucleotides to allow effective discrimination.6 Consequently, all known sequence-specific RNA-binding proteins recognize singlestranded regions and hairpin loops, where the functional groups on the bases become accessible (Figure 2). RNA double-helical regions are recognized only when structural distortions in the double helix generated by internal loops or bulges (Figure 2) allow access to the major groove. Further differences exist between DNA-protein and RNA-protein recognition. All DNA double-helical structures are very similar (at least to to a first approximation). Thus, sequence-specific DNA recognition requires a very precise reading of the identity of individual nucleotides within the DNA double helix. The diversity of RNA structures (Figure 2) favors the recognition of unique shapes and charge distributions of different RNAs. Thus, despite the chemical similarities between RNA and DNA, the many important lessons of DNA-protein recognition are only partially applicable to RNA recognition.
rβ Protein Domains Many RNA-binding proteins contain modules of 60-90 amino acids that are responsible for RNA recognition and auxiliary domains that perform additional functions.7-9 (2) Werner, M. H.; Huth, J. R.; Gronenborn, A. M.; Clore, G. M. Cell 1995, 81, 705-714. (3) Love, J. J.; Li, X.; Case, D. A.; Giese, K.; Grosschedl, R.; Wright, P. E. Nature 1995, 376, 791-795. (4) Kim, J. L.; Nikolov, D. B.; Burley, S. K. Nature 1993, 365, 520-527. (5) Kim, Y.; Geiger, J. H.; Hahn, S.; Sigler, P. B. Nature 1993, 365, 512520. (6) Seeman, N. C.; Rosenberg, J. M.; Rich, A. Proc. Natl. Acad. Sci. U.S.A. 1976, 73, 804-808. (7) Burd, C. G.; Dreyfuss, G. Science 1994, 265, 615-621. 190 ACCOUNTS OF CHEMICAL RESEARCH / VOL. 30, NO. 5, 1997
FIGURE 2. RNA-binding proteins do not target double-stranded RNA in a sequence-specific manner, but recognize instead single-stranded regions (hnRNP C) or sites of local distortions induced in double-helical regions by RNA hairpins (U1A, U1 70K, EIAV Tat, N, ...), bulges (HIV Tat), or internal loops (U1A, HIV Rev, TFIIIA, ...).
The three most common RNA-binding modules, ribonucleoprotein (RNP), K-homology (KH), and doublestranded RNA-binding (dsRBD) domains, have compact, globular structures (Figure 3) and constitute independent structural domains and RNA-binding units.8 The dsRBD domain is a general double-stranded RNAbinding module.10-12 Isolated domains bind doublestranded RNA of any sequence with little or no specificity,13 but multiple dsRBD domains may specifically recognize certain RNA structures.13,14 KH domains appear to be non-sequence-specific single-stranded RNA-binding proteins. KH proteins have been associated with important biological functions; a single amino acid substitution (8) Nagai, K. Curr. Opin. Struct. Biol. 1996, 6, 53-61. (9) Biamonti, G.; Riva, S. FEBS Lett. 1994, 340, 1-8. (10) St Johnston, D.; Brown, N. H.; Gall, J. G.; Jantsch, M. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 10979-10983. (11) Bycroft, M.; Gru ¨ nert, S.; Murzin, A. G.; Proctor, M.; St Johnston, D. EMBO J. 1995, 14, 3563-3571. (12) Kharrat, A.; Macias, M. J.; Gibson, T. J.; Nilges, M.; Pastore, A. EMBO J. 1995, 14, 3572-3584. (13) Clarke, P. A.; Mathews, M. B. RNA 1995, 1, 7-20. (14) Ferrandon, D.; Elphick, L.; Nu¨sslein-Volhard, C.; St Johnston, D. Cell 1994, 79, 1221-1232.
RNA-Protein Intermolecular Recognition Varani
convergent solutions to a common structure for RNA binding. These considerations strongly suggest that the Rβ structure represents a particularly favorable platform for RNA recognition.
Arginine-Rich Motif
FIGURE 3. Three-dimensional structure of the three most common RNAbinding protein structures: RNP domain (left),18,63 KH domain (center),15 and dsRBD domain (right).11,12 that unfolds a human KH domain15 leads to fragile-X syndrome,16 the most common cause of inherited mental retardation. The RNP domain is one of the most common protein structures in higher organisms, comprising over 600 sequences. It is identified by two highly conserved amino acid sequences (RNP-1 and RNP-2)17 located in the central strands of an antiparallel β-sheet.18,19 RNP proteins specifically recognize both single-stranded and highly structured RNAs. RNP,18 KH,15 and dsRBD11,12 are Rβ proteins with an antiparallel β-sheet on one face of the protein packed by a hydrophobic core against an R-helical face (Figure 3). Although a number of all-helical RNA-binding proteins have been recently identified,20-22 the Rβ structural theme is conserved in many RNA-binding proteins that do not share sequence homology with these three motifs, including ribosomal proteins23 and other factors involved in protein synthesis.23-26 In some cases (for example, ribosomal proteins L12 and L30 and RNP proteins, or dsRBD proteins and ribosomal protein S5) a similar arrangement of secondary structure elements indicates that these proteins originate from a common ancestor. In general, different Rβ RNA-binding proteins have different topology of the secondary structure elements: the RNP domain has a repeated βRβ arrangement,18,19 dsRBDs have RβββR topology,11,12 and KH proteins have βRRββR fold.15 These structural differences and the low sequence homology indicate that the different domains represent distinct, (15) Musco, G.; Stier, G.; Joseph, C.; Castiglione Morelli, M. A.; Nilges, M.; Gibson, T. J.; Pastore, A. Cell 1996, 85, 237-245. (16) Siomi, H.; Siomi, M. C.; Nussbaum, R. L.; Dreyfuss, G. Cell 1993, 74, 291-298. (17) Query, C. C.; Bentley, R. C.; Keene, J. D. Cell 1989, 57, 89-101. (18) Nagai, K.; Oubridge, C.; Jessen, T. H.; Li, J.; Evans, P. R. Nature 1990, 348, 515-520. (19) Hoffman, D. W.; Query, C. C.; Golden, B. L.; White, S. W.; Keene, J. D. Proc. Natl. Acad. Sci. U.S.A. 1991, 83, 2495-2499. (20) Markus, M. A.; Hinck, A. P.; Huang, S.; Draper, D. E.; Torchia, D. A. Nat. Struct. Biol. 1997, 4, 70-77. (21) Xing, Y.; GuhaThakurta, D.; Draper, D. E. Nat. Struct. Biol. 1997, 4, 24-27. (22) Berglund, H.; Rak, A.; Serganov, A.; Garber, M.; Ha¨rd, T. Nat. Struct. Biol. 1997, 4, 20-23. (23) Liljas, A.; Garber, M. Curr. Opin. Struct. Biol. 1995, 5, 721-727. (24) Biou, V.; Shu, F.; Ramakrishnan, V. EMBO J. 1995, 14, 4056-4064. (25) Garcia, C.; Fortier, P.-L.; Blanquet, S.; Lallemand, J.-Y.; Dardel, F. J. Mol. Biol. 1995, 254, 247-259. (26) Kang, C.-H.; Chan, R.; Berger, I.; Lockshin, C.; Green, L.; Gold, L.; Rich, A. Science 1995, 268, 1170-1173.
A sequence of 10-15 amino acids rich in arginines and lysines mediates RNA recognition in the so-called “basicdomain” class of RNA-binding proteins.27 This feature is often referred to as a domain, but does not constitute an independent structural domain or RNA recognition unit. Structural analysis at atomic resolution has so far been unsuccesful for this protein family, but extensive data exist on short peptide models of the arginine-rich region. Remarkably, peptides as short as 10-15 amino acids bind RNA with comparable affinity to the corresponding protein (Kd ≈ 10-9 M in many cases28), but fail to discriminate cognate binding sites from noncognate RNAs. For example, a basic peptide mimic of the human immunodeficiency virus (HIV-1) Rev protein binds its cognate RNA only 2-fold better than noncognate RNAs, while the full Rev protein binds its cognate RNA 1000-fold better than other substrates.29 Peptide models have provided very valuable insight into fundamental mechanisms of RNA recognition by basicdomain proteins, but have not addressed satisfactorily the origin of specificity. Arginine-rich peptides are unstructured (with a single exception30), but display a stable conformation upon RNA binding.31-34 Each RNA structure presents a distinct array of hydrogen bond donors and acceptors and distribution of negative charges on the phosphates. Binding to either specific or nonspecific sites orders the unstructured peptides. Adaptation of the flexible peptide structures to the shape and charge distribution of different RNA targets would maximize favorable electrostatic interactions between the side chains of these very basic peptides and the negatively charged RNA backbone. Short peptides unconstrained by a protein scaffold may be unable to discriminate distinct RNA sites because they can adapt their conformation equally well to the shape of different RNAs. Many basic-domain proteins recognize structural distortions in regular RNA double helices induced by internal loops or bulges (Figure 2). For example, HIV-1 Rev protein binds an internal loop containing non-Watson-Crick base pairs,33-38 whereas HIV-1 Tat protein binds a tri-nucleotide (27) Lazinski, D.; Grzadienska, E.; Das, A. Cell 1989, 59, 207-218. (28) Weeks, K. M.; Ampe, C.; Schultz, S. C.; Steitz, T. A.; Crothers, D. M. Science 1990, 249, 1281-1285. (29) Daly, T. J.; Doten, R. C.; Rusche, J. R.; Auer, M. J. Mol. Biol. 1995, 253, 243-258. (30) Mujeeb, A.; Bishop, K.; Peterlin, B. M.; Turck, C.; Parslow, T. G.; James, T. L. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 8248-8252. (31) Puglisi, J. D.; Chen, L.; Blanchard, S.; Frankel, A. D. Science 1995, 270, 1200-1203. (32) Ye, X.; Kumar, R. A.; Patel, D. J. Chem. Biol. 1995, 2, 827-840. (33) Battiste, J. L.; Mao, H.; Rao, N. S.; Tan, R.; Muhandiram, D. R.; Kay, L. E.; Frankel, A. D.; Williamson, J. R. Science 1996, 273, 1547-1551. (34) Ye, X.; Gorin, A.; Ellington, A. D.; Patel, D. J. Nat. Struct. Biol. 1996, 3, 1026-1033. (35) Heaphy, S.; Finch, J. T.; Gait, M. J.; Karn, J.; Singh, M. Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 7366-7370. (36) Iwai, S.; Pritchard, C.; Mann, D. A.; Karn, J.; Gait, M. J. Nucleic Acids Res. 1992, 20, 6465-6472. (37) Battiste, J. L.; Tan, R.; Frankel, A. D.; Williamson, J. R. Biochemistry 1994, 33, 2741-2747.
VOL. 30, NO. 5, 1997 / ACCOUNTS OF CHEMICAL RESEARCH 191
RNA-Protein Intermolecular Recognition Varani
tRNA Recognition by Synthetase Enzymes
FIGURE 4. Structures of RNA-peptide complexes. The open major groove generated by an RNA bulge (left) or internal loop (right) is recognized by an antiparallel β-hairpin in the BIV Tat-TAR structure on the left32 or an extended r-helix in the HIV Rev-RRE structure shown on the right.33 bulge.39 Rev-derived peptides are superficially similar to DNA-binding proteins, since binding occurs in the RNA major groove and the peptide becomes R-helical upon binding33,34 (Figure 4, right). However, considerations of the RNA structure highlight fundamental differences. The structural distortion generated by non-Watson-Crick base pairs opens up the major groove of the RNA to allow the “direct” recognition of the base functionality in the vicinity of the structural distortion (Figure 4). Furthermore, these intramolecular (as well as some intermolecular) interactions stabilize a unique RNA structure that may facilitate the indirect readout of the electrostatic potential of the phosphodiester backbone. The complex between the bovine immunodeficienty virus (BIV) Tat-derived peptide and its cognate RNA31,32 shows a remarkable similarity with DNA recognition by ribbon-helix-helix proteins.40,41 In this complex, two unpaired nucleotides within an otherwise regular RNA helix open up the major groove to allow insertion of a β-ribbon peptide structure (Figure 4, left). The shape of two-stranded antiparallel β-ribbons closely matches that of double-stranded nucleic acids to easily fit DNA4,5,42 and RNA43 minor grooves, DNA major grooves,40,41 and distorted RNA major grooves.31,32 2′-OH groups in the RNA minor groove are also regularly spaced to donate hydrogen bonds to backbone carbonyl groups from antiparallel β-sheets.43 More examples of β-ribbon-RNA recognition will probably emerge in the future. (38) Peterson, R. D.; Bartel, D. P.; Szostak, J. W.; Horwath, S. J.; Feigon, J. Biochemistry 1994, 33, 5357-5366. (39) Gait, M. J.; Karn, J. TIBS 1993, 18, 255-259. (40) Raumann, B. E.; Rould, M. A.; Pabo, C. O.; Sauer, R. T. Nature 1994, 367, 754-757. (41) Somers, W. S.; Phillips, S. E. V. Nature 1992, 359, 387-393. (42) Church, G. M.; Sussman, J. L.; Kim, S.-H. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 1458-1462. (43) Carter, C. W. J.; Kraut, J. Proc. Natl. Acad. Sci. U.S.A. 1974, 71, 283287. 192 ACCOUNTS OF CHEMICAL RESEARCH / VOL. 30, NO. 5, 1997
Aminoacyl tRNA synthetases (aaRS) enzymatically activate tRNA for protein synthesis by covalently joining an amino acid to the 3′-end of each of 20 different tRNAs.44 The fidelity of translation of the genetic code depends on accurate aminoacylation. aaRS enzymes must recognize three-dimensional structural features common to all tRNAs, while discriminating different tRNAs by utilizing a limited set of identity elements (nucleotides or chemical modifications specific to each tRNA).45,46 Striking features of tRNA-synthetase complexes are very large interface areas, ∼3000 Å2 or ∼20% of free tRNA accessible surface.47-50 By contrast, the nonspecific complex between elongation factor TU (EF-Tu) and tRNAPhe has a smaller interface area and few direct contacts to the bases,51 although EF-Tu binds tRNAs much more tightly (Kd ≈ 10-9 M) than RS enzymes (Kd ≈ 10-6 M). Differences in binding free energy are relatively small for aaRS enzymes binding different tRNAs: substrate discrimination occurs primarily at the level of the efficiency of aminoacylation. tRNAinduced protein conformational rearrangements appear to be a primary mechanism of substrate discrimination.47,49 The structural complementarity expressed by very large interface areas allows RS enzymes to differentiate tRNAs from other cellular RNAs and may provide the free energy necessary for these conformational rearrangements.49,52 tRNA-synthetase complexes provide many examples of how different intermolecular interactions determine binding energy and contribute to intermolecular discrimination (i.e., specificity). Hydrogen-bonding interactions are clearly important. Direct and water-mediated hydrogen bonds between protein side chains and tRNA bases are formed in the typically wide minor groove of acceptor and anticodon stems of tRNAGln.47,50 The RNA bases are also recognized by hydrogen bonding within the anticodon loop of tRNAGln 47,50 and tRNAAsp 49 and in the unusually accessible major grooves at the end of the acceptor and anticodon stems of tRNAAsp.49,52 Electrostatic interactions are a second source of intermolecular contacts. Interactions with unique phosphodiester backbone geometries not only provide binding energy but also facilitate intermolecular discrimination. For example, a critical G‚U base pair in tRNAAsp is recognized only through the indirect readout of the backbone conformation.49 Shape complementarity allows extensive van der Waals interactions, and the ability of tRNA to form unique structural features is critical for anticodon recognition in tRNAGln 47,50 and tRNAAsp.49 Conformational changes and (44) Cusack, S. Nat. Struct. Biol. 1995, 2, 824-831. (45) McClain, W. H. J. Mol. Biol. 1993, 234, 257-280. (46) McClain, W. H. In tRNA: Structure, Biosynthesis and Function; So¨ll, D., RajBhandary, U., Eds.; American Society for Microbiology: Washington, DC, 1995. (47) Rould, M. A.; Perona, J. J.; So¨ll, D.; Steitz, T. A. Science 1989, 246, 1135-1142. (48) Biou, V.; Yaremchuk, A.; Tukalo, M.; Cusack, S. Science 1994, 263, 1404-1410. (49) Caverelli, J.; Rees, B.; Ruff, M.; Thierry, J.-C.; Moras, D. Nature 1993, 362, 181-184. (50) Rould, M. A.; Perona, J. J.; Steitz, T. A. Nature 1991, 352, 213-218. (51) Nilsen, P.; Kjeldgaard, M.; Thirup, S.; Polekhina, G.; Reshetnikova, L.; Clark, B. F. C.; Nyborg, J. Science 1995, 270, 1464-1472. (52) Ruff, M.; Krishnaswamy, S.; Boeglin, M.; Poterszman, A.; Mitschler, A.; Podjarny, A.; Rees, B.; Thierry, J. C.; Moras, D. Science 1991, 252, 1682-1689.
RNA-Protein Intermolecular Recognition Varani
ordering of flexible nucleotides occur in tRNAAsp upon synthetase binding, while two non-Watson-Crick base pairs stabilized by protein contacts extend the anticodon stem in the aaRSGln-tRNAGln complex. Similarly, adaptation of the seryl-tRNA synthetase conformation to the unique shape of the extended tRNASer variable arm through ordering of a protein helical arm maximizes surface complementarity and facilitates discrimination.48 A fourth category of interactions is intermolecular stacking between protein heteroaromatic side chains and RNA bases. Anticodon nucleotides are splayed-out against the surface of β-barrel domains in the tRNAAsp 49 and tRNAGln 47 complexes allowing intermolecular stacking interactions to occur.
RNP Domain Paradigm of RNA Recognition Hundreds of proteins from higher organisms recognize RNA substrates widely diverse in sequence and structure via RNP domains. Proteins which bind single-stranded RNA require multiple RNP domains,53-55 while single RNP domains can recognize 5-10 single-stranded RNA bases with high affinity and specificity when the nucleotides are presented in a defined RNA structural context. For example, the human U1A protein recognizes seven singlestranded nucleotides in the context of a hairpin or internal loop (Figure 2) with very high affinity (Kd ≈ 10-11 M)18,56,57 and specificity. When those seven nucleotides are presented in the absence of RNA secondary structure, the binding constant is reduced 100000-fold, i.e., almost to the level observed for binding to RNA of random sequence.58,59 Crystallographic60 and NMR61 structures of the complexes of the human U1A protein with two different RNA substrates have revealed important aspects of the molecular basis of RNP-RNA recognition. The N-terminal RNP domain of U1A binds two distinct RNA substrates that present the same single-stranded nucleotide sequence in a completely different structural context. In both the crystal structure of the U1A-hairpin loop60 and in the NMR structure of the U1A-internal loop complex,61 bases within the single-stranded loops are exposed to the surface of the β-sheet of U1A. Most hydrogen bond donors and acceptors from the single-stranded bases are recognized by an extensive network of interactions with protein residues, but an unusually large number of intermolecular contacts involve protein main chain donors and acceptors. In these complexes, all RNA bases are involved in intra(53) Tacke, R.; Manley, J. L. EMBO J. 1995, 14, 3540-3551. (54) Kanaar, R.; Lee, A. L.; Rudner, D. Z.; Wemmer, D. E.; Rio, D. C. EMBO J. 1995, 14, 4530-4539. (55) Shamoo, Y.; Abdul-Manam, N.; Patten, A. M.; Crawford, J. K.; Pellegrini, M. C.; Williams, K. R. Biochemistry 1994, 33, 8272-8281. (56) van Gelder, C. W. G.; Gunderson, S. I.; Jansen, E. J. R.; Boelens, W. C.; Polycarpou-Schwartz, M.; Mattaj, I. W.; van Venrooij, W. J. EMBO J. 1993, 12, 5191-5200. (57) Hall, K. B.; Stump, W. T. Nucleic Acids Res. 1992, 20, 4283-4290. (58) Tsai, D. E.; Harper, D. S.; Keene, J. D. Nucleic Acids Res. 1991, 19, 4931-4936. (59) Hall, K. B. Biochemistry 1994, 33, 10076-10088. (60) Oubridge, C.; Ito, N.; Evans, P. R.; Teo, C.-H.; Nagai, K. Nature 1994, 372, 432-438. (61) Allain, F.-H. T.; Gubser, C. C.; Howe, P. W. A.; Nagai, K.; Neuhaus, D.; Varani, G. Nature 1996, 380, 646-650.
or intermolecular stacking interactions and the RNA phosphodiester backbone contacts a set of basic amino acids. The comparison with the unbound RNA internal loop62 and protein63 structures (Figure 5a) reveals a complex recognition mechanism. A first set of intermolecular contacts involve rigid body docking between the protein and the well-ordered RNA double-helical regions and three single-stranded nucleotides. The remaining five single-stranded nucleotides are recognized instead by induced fit. The rigid interaction involves two loops within U1A that have unique sequences in different RNP proteins. Thus, recognition of a well-defined RNA structure allows U1A to binds its target and prevents other RNP proteins (lacking these unique amino acid sequences) from binding the same RNA. Complex formation reorients an R-helix at the end of the protein domain that is essential for binding.59,64,65 At the same time, protein binding orders five single-stranded nucleotides against the β-sheet surface through intermolecular stacking interactions with three highly conserved aromatic amino acids. Intermolecular stacking interactions with exposed aromatic side chains are common to RNP proteins,60,61 tRNA synthetases,49,50 and viral coat proteins.66 Although rare in DNA-protein complexes, intermolecular stacking interactions appear to be very common in RNA-protein recognition, perhaps because they provide large amounts (∼3 kcal/mol) of binding energy.67
Conformational Flexibility at RNA-Protein Interfaces Structures of RNA-protein complexes have provided important insight into the mechanisms of intermolecular recognition, but have also raised intriguing questions concerning the molecular origin of binding energy and sequence discrimination. Surprisingly, binding energy is not related to interface area, presumably a measure of the van der Waals interaction energy and of the entropic contribution to binding energy from solvent and ion release. The very tight binding of the U1A protein (Kd ≈ 10-11 M) is remarkable, since the interface area is small and binding involves an entropically costly disorder-order transition in the single-stranded RNA loop.61 If binding ordered exposed protein side chains at intermolecular interfaces to a high degree, changes in the distribution of molecular vibrations would contribute 15-25 kcal/mol (an amount comparable to the overall binding energy!) to increase the free energy of protein-protein and proteinDNA interactions.68 However, the NMR relaxation properties of amino acid chains in DNA-protein and peptideprotein complexes indicate that motion is much less restricted at the intermolecular interface than in the highly (62) Gubser, C. C. G.; Varani, G. Biochemistry 1996, 35, 2253-2267. (63) Avis, J.; Allain, F. H.-T.; Howe, P. W. A.; Varani, G.; Neuhaus, D.; Nagai, K. J. Mol. Biol. 1996, 257, 398-411. (64) Jessen, T. H.; Oubridge, C.; Teo, C. H.; Pritchard, C.; Nagai, K. EMBO J. 1991, 10, 3447-3456. (65) Scherly, D.; Kambach, C.; Boelens, W.; van Venrooij, W. J.; Mattaj, I. W. J. Mol. Biol. 1991, 219, 577-584. (66) Valega´rd, K.; Murray, J. B.; Stockley, P. G.; Stonehouse, N. J.; Liljas, L. Nature 1994, 371, 623-626. (67) LeCuyer, K. A.; Behlen, L. S.; Uhlenbeck, O. C. EMBO J. 1996, 15, 6847-6853. (68) Spolar, R. S.; Record, M. T. J. Science 1994, 263, 777-784.
VOL. 30, NO. 5, 1997 / ACCOUNTS OF CHEMICAL RESEARCH 193
RNA-Protein Intermolecular Recognition Varani
FIGURE 5. Examples of conformational adaptation in RNA-protein recognition. (a) NMR structures of the free U1A protein (right), the free RNA internal loop substrate (center), and the protein-RNA complex (left).61 Protein binding induces a dramatic change in the RNA conformation, and requires a sharp reorientation of an r-helix at the carboxy end of the U1A domain. (b) Crystallographic structures of tRNAAsp free80 (right) and in complex with aspartyl-tRNA synthetase (left).49,52 In this orientation, the anticodon is at the bottom while the acceptor end is at the top left corner of the tRNA. Although the L-shaped tRNA structure is preserved, synthetase binding changes the relative orientation of the two major helical domains of tRNAAsp. In contrast to the situation observed with the portion ordered hydrophobic core.69,70 Protein-nucleic acid interfaces may not be rigidly ordered. Rather, a fine balance of the U1A-RNA interface determined by induced fit, it between rigidity and flexibility may provide a compromise is more difficult to reorganize the intermolecular interface between complete specificity (at large entropic cost) and when the interaction occurs by rigid interlocking of complete lack of selectivity. preordered regions of the protein and RNA. For example, Several observations derived from the U1A complexes it is impossible for U1A to bind tightly if leucine 49 is suggest that protein-RNA interfaces are not rigid. Firstly, mutated; this residue snugly fits a binding pocket at the the arginine 52 side chain in U1A, involved in four junction between RNA helices and loops.61,71 Arginine 52 hydrogen bonds in the crystalline state,60 can be mutated nearby can only be mutated to lysine, which has similar to lysine (which cannot form those hydrogen bonds) size and positive charge,18,64 but not to a much smaller without significant increase in the free energy of binding glutamine. It has already been mentioned that disruption (