Biochemistry 2007, 46, 10102-10112
Comparison of mRNA-Display-Based Selections Using Synthetic Peptide and Natural Protein Libraries†
Bao-cheng Huang and Rihe Liu*
School of Pharmacy and Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
Received February 1, 2007; Revised Manuscript Received May 22, 2007
ABSTRACT: mRNA display is a genotype-phenotype conjugation method that allows the amplification-based, iterative rounds of in vitro selection to be applied to peptides and proteins. Compared to prior protein selection techniques, mRNA display can be used to select functional sequences from both long natural protein and short combinatorial peptide libraries with much higher complexities. To investigate the basic features and problems of using mRNA display in studying conditional protein-protein interactions, we compared the target-binding selections against calmodulin (CaM) using both a natural protein library and a combinatorial peptide library. The selections were efficient in both cases and required only two rounds to isolate numerous Ca2+/CaM-binding natural proteins and synthetic peptides with a wide range of affinities. Many known and novel CaM-binding proteins were identified from the natural human protein library. More than 2000 CaM-binding peptides were selected from the combinatorial peptide library. Unlike sequences from prior CaM-binding selections that correlated poorly with naturally occurring proteins, synthetic peptides homologous to the Ca2+/CaM-binding motifs in natural proteins were isolated. Interestingly, a large number of synthetic peptides that lack the conventional CaM-binding secondary structures bound to CaM tightly and specifically, suggesting the presence of other interaction modes between CaM and its downstream binding targets. Our results indicate that mRNA display is an ideal approach to the identification of Ca2+-dependent protein-protein interactions, which are important in the regulation of numerous signaling pathways.
† This work was supported by startup funds from the Carolina Center for Genome Sciences and School of Pharmacy at the University of North Carolina at Chapel Hill (to R.L.) and by a grant from the American Cancer Society (RSG-06-073-01-TBE) (to R.L.).
* To whom correspondence should be addressed. Tel: (919) 843-3635. Fax: (919) 968-0081. E-mail: [email protected].
10.1021/bi700220x CCC: $37.00 © 2007 American Chemical Society. Published on Web 08/09/2007

One of the most effective strategies for gaining a thorough understanding of a protein of interest is to map its conditional interactions with other proteins on a proteome-wide scale while simultaneously searching peptide sequence space for the binding specificity. The ideal method should allow the identification of target-interacting sequences from both natural protein and combinatorial peptide libraries under conditions that are specific for such interactions. A number of approaches have been developed to address this problem. The yeast two-hybrid method has been widely used to isolate interacting sequences of a target protein from natural cDNA libraries (1, 2). Phage display is another approach to the selection of sequences with desired functions, often from a short combinatorial peptide library (3). Unfortunately, most peptide or protein selection methods have limitations in addressing conditional protein-protein interactions.

mRNA display is a relatively new in vitro selection technique that allows the identification of conditional protein-protein interactions from both a natural protein library and a combinatorial peptide library (4-6). The central feature of this method is that the polypeptide chain is covalently linked to the 3′ end of its own mRNA. This is accomplished by synthesis and in vitro translation of an mRNA template with puromycin attached to its 3′ end via a short DNA linker. During in vitro translation, when the ribosome reaches the RNA-DNA junction and translation pauses, puromycin, an antibiotic that mimics the aminoacyl moiety of tRNA, enters the ribosome “A” site and accepts the nascent polypeptide by forming a peptide bond. This results in tethering of the nascent peptide to its own mRNA. Since the genotype coding sequence and the phenotype polypeptide sequence are united in a single molecule, mRNA display provides a powerful means for reading and amplifying a peptide or protein sequence after it has been selected on the basis of its function. Multiple rounds of selection and amplification can be performed, enabling enrichment of rare sequences with desired properties.

Compared to prior peptide or protein selection methods, mRNA display has several major advantages. First, the genotype is covalently linked to and is always present with the phenotype. This stable linkage makes it possible to use arbitrary and stringent conditions in the functional selection. Second, the complexity of the peptide or protein library can be close to that of the RNA or DNA pools. Peptide or protein libraries containing as many as 10^12-10^13 unique sequences can be readily generated and selected, a few orders of magnitude more than can be achieved using phage display. Therefore, both the likelihood of isolating rare sequences and the diversity of the sequences isolated in a given selection are significantly increased.

It is of great interest to investigate the basic features and potential problems of using mRNA display in studying conditional protein-protein interactions. Ideally, this question should be addressed by comparing two selections against the same target using both a natural protein library and a combinatorial peptide library, so that the interacting partners in both the proteome of interest and the sequence space can be examined. The model target we chose here is CaM,¹ which is considered the primary transducer of Ca2+ signals in eukaryotes. Although numerous CaM-binding proteins have been revealed, novel signaling pathways mediated by CaM continue to be identified (6, 7). Like many other Ca2+-sensing proteins, CaM contains EF-hand motifs as Ca2+ binding sites and undergoes distinct conformational changes upon binding Ca2+. CaM is involved in a wide variety of biological processes through its interaction with many different protein targets, which come in various shapes, sizes, and sequences (8-14). Many Ca2+-dependent CaM-binding proteins have one or more CaM-binding motifs that are characterized by a basic amphipathic helix approximately 20 amino acids in length (15-17). On the basis of the positions of the conserved hydrophobic residues (Φ), three related Ca2+/CaM-binding motifs have been grouped, namely, the 1-10, 1-14, and 1-16 motifs, which refer to groups of sequences whose key bulky hydrophobic residues are spaced 8, 12, and 14 residues apart, respectively. Another well-known CaM-binding motif is the IQ or IQ-like motif, which has a consensus sequence of (FILV)QX₃(RK)GX₃(RK)X₂Φ or (FILV)QX₃(RK), respectively (18, 19). Because there are numerous CaM-binding motifs, a robust selection strategy should allow the identification of peptides or proteins with all of the subclasses of CaM-binding motifs. Phage display has been used to identify CaM-binding peptides from combinatorial peptide libraries.
Dedman and co-workers reported the first CaM-binding screen using a phage library displaying random peptides 15 amino acids in length (20). Twenty-eight independent CaM-binding sequences were isolated in this screen, and all contained a tryptophan residue within the 15 amino acid insert. Kay and co-workers used a phage library displaying random peptides 26 amino acids in length to screen for CaM-binding sequences (21). Twenty CaM-binding peptides were identified in this work, and 17 contained one of the three consensus sequence motifs. Interestingly, the sequences of the selected peptides from these two selections had little in common, which was attributed to the different types of combinatorial peptide libraries used in the selections. It is now clear that phage display can preferentially select high-affinity interacting peptides at the expense of low-affinity binders, presumably due to biases against certain sequences at different steps (22). In the work reported by Nairn and co-workers, for example, two CaM-binding sequences (AWDTVRISFG and AWPSLQAIRG) comprised 80% of the pool after two rounds of affinity selection using a phage-displayed combinatorial peptide library (23). Another problem in phage-display-based selection is that the selected peptide sequences usually correlate poorly with naturally occurring proteins in GenBank. Indeed, none of the selected CaM-binding peptides was found to be homologous to natural CaM-binding proteins. We hypothesized that the observed poor correlation could be due to the low complexity of the initial libraries. Accordingly, screening a synthetic peptide library with much higher complexity could result in better correlation. However, most in vitro selection techniques only allow screening of peptide libraries with fewer than 10^9 sequences. Therefore, a different strategy is needed to address the problem.

In this work, we describe the Ca2+/CaM-binding selection using an mRNA-displayed combinatorial peptide library with about 10^12 unique sequences. More than 2000 Ca2+/CaM-binding peptides were identified, including several that were highly homologous to the CaM-binding motifs in naturally occurring proteins. We also compared the CaM-binding selection from a human protein domain library with that from the combinatorial peptide library. The selection was very efficient in both cases, and CaM-binding natural proteins and synthetic peptides with a wide range of affinities were enriched. Some unique features and potential problems of mRNA-display-based selection were revealed. Our results indicate that mRNA display is an ideal approach to the identification of Ca2+-dependent protein-protein interactions, which are important in the regulation of numerous signaling pathways.

¹ Abbreviations: CaM, calmodulin; CAMK2D, calcium/calmodulin-dependent protein kinase II δ isoform; CBSPP, CaM-binding site prediction program; CDC5-L, Schizosaccharomyces pombe cell division cycle 5 homologue-like; GLUT4, glucose transporter member 4; Kd, binding affinity; MLCK, myosin light chain kinase; MYBL1, v-myb myeloblastosis viral oncogene homologue (avian)-like 1; Ni-NTA, Ni2+-nitrilotriacetic acid; PPP3C, catalytic subunit of protein phosphatase 3 α isoform; snRNA, small nuclear RNA; TBC1D4, TBC (Tre-2, BUB2, CDC16) domain-containing family member 4; TMV, tobacco mosaic virus; TNT, coupled in vitro transcription and translation.

EXPERIMENTAL PROCEDURES

Construction and Generation of mRNA-Displayed Protein and Peptide Libraries. The mRNA-displayed natural protein domain library was generated with a mixture of poly(A)+ mRNAs from human brain, heart, spleen, thymus, and muscle as reported recently (6).
For the combinatorial peptide library, a preselected, high-quality DNA library that codes for peptides with 20 consecutive random amino acids was used as the template for in vitro peptide synthesis (24). Each sequence in the DNA library contains a T7 RNA polymerase promoter, a TMV translation enhancer sequence, an N-terminal FLAG tag coding sequence, a random cassette encoding 20 consecutive random codons, and a His6 tag coding sequence. The library was preselected using mRNA display to remove sequences with stop codons and frame shifts. Unlike prior CaM-binding selections in which all of the random residues were coded by an NNK codon (N = G, A, T, or C; K = G or T) (20, 21, 23), we started from a library that encoded short peptides with more balanced amino acid compositions (Supporting Information, Table 1). This was realized by using three phosphoramidite mixtures with appropriate proportions of each of the four phosphoramidites in the DNA synthesis, corresponding to each of the three positions in the random codons (24, 25). The DNA library was transcribed in vitro using T7 RNA polymerase (Ambion, Austin, TX) to generate mRNAs. The mRNA templates with puromycin at the 3′ ends were generated by cross-linking with an oligonucleotide containing a psoralen moiety and a puromycin residue at its 5′ and 3′ ends, respectively (26). In vitro translation was performed using rabbit reticulocyte lysate (Novagen, Madison, WI), and mRNA-protein fusion formation was accomplished under optimized conditions (5).
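To make the contrast with the NNK scheme concrete, the following Python sketch enumerates the 32 NNK codons under the standard genetic code and tallies how many encode each amino acid or a stop. It is only an illustration: it assumes equimolar bases at every N and K position, and the function names are hypothetical rather than taken from this work.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered T, C, A, G at each position;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return all codons of the NNK scheme (N = G, A, T, or C; K = G or T)."""
    return [a + b + k for a in BASES for b in BASES for k in "GT"]


def nnk_composition():
    """Count how many NNK codons encode each amino acid (or a stop, '*')."""
    return Counter(CODON_TABLE[codon] for codon in nnk_codons())


def stop_free_fraction(n_codons=20):
    """Fraction of NNK cassettes of n_codons expected to contain no stop codon.

    Assumption of this sketch: every NNK codon is equally likely.
    """
    counts = nnk_composition()
    p_stop = counts["*"] / sum(counts.values())
    return (1 - p_stop) ** n_codons


if __name__ == "__main__":
    counts = nnk_composition()
    total = sum(counts.values())
    for aa, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{aa}: {n}/{total}")
    print(f"stop-free 20-codon cassettes: {stop_free_fraction(20):.2f}")
```

Under these assumptions, Leu, Ser, and Arg are each encoded by three NNK codons while twelve amino acids are encoded by only one, which is the kind of compositional imbalance a library built from tuned phosphoramidite mixtures is meant to reduce.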
mRNA templates and mRNA-protein fusions were then isolated from the lysate using an oligo(dT) column, taking advantage of the oligo(dA) residues in the puromycin-containing DNA linker [psoralen-(ATAGCCGGTG)2′-OMe-dA15-CC-puromycin]. To remove secondary RNA structures that might interfere with the selection step, the fusion molecules were converted to a DNA/RNA hybrid by reverse transcription. The resulting mRNA-displayed peptide library was successively purified on the basis of each of the affinity tags at the N- and C-termini. More than 85% of the purified sequences had a contiguous ORF without stop codons or frame shifts (24). Among the large number of clones we picked from the initial library for sequencing analysis, none was found to be identical to another.

Selection of Ca2+-Dependent CaM-Binding Sequences from mRNA-Displayed Peptide or Protein Libraries. The purified combinatorial peptide or natural protein library displayed on its own mRNAs (∼1.5-3 pmol each round) was first diluted in a selection buffer (50 mM Tris-HCl, pH 7.5, 150 mM KCl, 0.05% Tween-20, 1 mg/mL BSA, 5 mM 2-mercaptoethanol, and 0.5 mM Ca2+) and passed through a precolumn with 100 µL of streptavidin beads (Pierce, Rockford, IL) to minimize the enrichment of matrix-binding sequences. The flow-through was incubated with 25 µg of biotinylated CaM (6) for 1 h at 4 °C. After binding, the mixture was incubated with 100 µL of streptavidin beads for another 0.5 h and then loaded onto an empty column. Unbound and nonspecifically bound molecules were washed off the column using 18 and 40 column volumes of selection buffer for the natural protein library and combinatorial peptide library, respectively. Molecules that bound to CaM in a Ca2+-dependent manner were eluted using the same buffer containing 2 mM EGTA.
The selected molecules were PCR amplified for the next round of selection or cloned into a TOPO vector (Invitrogen, La Jolla, CA) for sequencing and analysis.

Synthesis and Purification of Selected Peptides or Proteins. To examine the CaM-binding properties of a large number of selected peptide sequences, a parallel CaM-binding assay was used. First, individually cloned sequences selected from round 1 or round 2 were PCR amplified and used as templates for a coupled in vitro transcription and translation (TNT) reaction in the presence of 10 µCi of [35S]methionine (Perkin-Elmer, Wellesley, MA) in a total volume of 25 µL for 90 min at 30 °C. Expressed peptides or proteins were purified using Ni-NTA agarose (Qiagen, Valencia, CA), either individually or in a 96-well microplate format. To purify peptides or proteins using 96-well plates, 25 µL of the TNT reaction mixture was diluted into 100 µL of Ni-NTA binding buffer (50 mM HEPES, pH 7.4, 300 mM NaCl, 0.05% Tween 20) and incubated with 25 µL of Ni-NTA agarose at room temperature for 1 h with gentle shaking. The mixtures were then transferred to a 96-well filter plate. The supernatant was removed by applying vacuum in a manifold for 3 min. The Ni-NTA resin in each well was washed four times with 800 µL of binding buffer. Bound peptides or proteins were then eluted twice, each time with 50 µL of elution buffer (50 mM HEPES, pH 7.4, 300 mM NaCl, 250 mM imidazole, 0.05% Tween 20). Peptides or proteins thus purified were passed through a Zeba desalt spin column (Pierce) to further remove residual free [35S]methionine before being used in the binding assay.
FIGURE 1: Selection of Ca2+-dependent CaM-binding sequences using an mRNA-displayed combinatorial peptide library: mRNA, green; DNA, red; protein, yellow circles; puromycin, blue circle labeled with P; streptavidin beads, orange cross. The 20 random amino acids (raa) are shown as a black bar at the DNA level, flanked with two affinity tags at both sides.
In Vitro CaM-Binding Assays. An aliquot of purified peptide or protein was incubated with an appropriate amount of biotinylated CaM in a well of a 96-well microplate in a CaM-binding buffer (25 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1 mg/mL BSA, 5 mM 2-mercaptoethanol) at room temperature for 60 min. The binding was performed in the presence of 1 mM Ca2+ or 2 mM EGTA. The CaM-peptide or CaM-protein complexes were then captured by adding 20 µL of a streptavidin-agarose slurry on the same plate, and the mixture was incubated at room temperature for 30 min with gentle mixing. After removal of the supernatant, the beads were washed four times, each time with 100 µL of the corresponding binding buffer, to remove unbound molecules. Peptides or proteins captured by CaM were then gently released from the beads by chelating Ca2+ using 100 µL of elution buffer (CaM-binding buffer supplemented with 2 mM EGTA). Each fraction, including flow-through, washes, eluates, and agarose beads, was recovered. The binding percentage was determined from the scintillation counts of each fraction.

Binding Affinity of Selected CaM-Binding Peptides or Proteins. To determine the binding affinity, biotinylated CaM at different final concentrations (from 5 nM to 1.0 µM) was allowed to interact with an appropriate amount of purified, [35S]methionine-labeled peptide or protein in a polypropylene 96-well microplate. The binding assay was performed in the presence of 1 mM Ca2+. The radioactivity in the flow-through, washes, eluates, and agarose beads was counted. The binding assay was performed twice, and the averaged data were fitted to a binding curve using GraphPad Prism software.

RESULTS

CaM-Binding Sequences Were Selected Very Efficiently from both Natural Protein and Combinatorial Peptide Libraries. The length of the protein portion from the ORFs of the natural protein library was in the range of 80-200 amino acids.
Typically, 1.5 pmol (∼10^12 molecules) of mRNA-displayed protein sequences was generated and used in the binding selection. Therefore, each protein and its fragments should have numerous copies. The combinatorial peptide library has a complexity of approximately 5 × 10^12, which is more than 3 orders of magnitude higher than those used in most phage-display-based selections (24). We then performed Ca2+-dependent CaM-binding selection using either an mRNA-displayed natural protein domain library or a combinatorial peptide library. Figure 1 illustrates the
Table 1: Sequences and CaM-Binding Properties of Several CaM-Binding Protein Fragments Isolated from the mRNA-Displayed Human Protein Domain Library

| no. | isolated protein fragment and its mapped CaM-binding motif (a) | % binding with Ca2+ (elution/on beads) | % binding with EGTA (elution/on beads) | Kd (nM) |
| P1 | CaMK2D; D248-R390 from BAD92525; 327 RRKLKGAILTTMLATR | 27/13 | 0.7/2.6 | 32 ± 6 |
| P2 | PPP3CA; E381-R478 from NP_000935; 393 KEVIRNKIRAIGKMARVFSV | 54/9.5 | 0.1/0.1 | 19 ± 3 |
| P3 | MLCK; M1736-G1823 from NP_444253; 1742 RRKWQKTGNAVRAIGRLSSM | 42/12.7 | 1.8/1.4 | 26 ± 4 |
| P4 | TBC1D4; D826-G925 from NP_055647; 838 ELRSLWRKAIHQQILLLR | 9.8/32.8 | 2.9/15.5 | 16 ± 3 |
| P5 | CDC5-L; A22-E115 from NP_001244; 31 WSRIASLLHRKSAKQCKAR | 31.7/14.6 | 0.9/2.2 | 37 ± 6 |
| P6 | MYBL1; K30-E191 from XP_034274; 151 YEAHKRLGNRWAEIAKLLP | 25.8/26.4 | 3.4/13.5 | 32 ± 5 |

Potential CaM-binding motif predicted by CBSPP (b): contains a database basic 1-5-10 motif; contains a database 1-5-8-14 motif; contains a database basic 1-8-14 motif.
a This column lists the name of the protein, the region of the fragment isolated from the selection, and the CaM-binding motif included in this region predicted using a CaM-binding site prediction program (CBSPP). The underlined residues are the key bulky hydrophobic residues as reported in the literature. b This column lists the type of the CaM-binding motifs predicted by CBSPP. Each of these motifs refers to a group of sequences whose key bulky hydrophobic residues are spaced 8, 12, and 14 residues apart, respectively, based on the classification detailed in http://calcium.uhnres.utoronto.ca/ctdb/ctdb/3cln-big.html.
selection scheme. In both selections, the signal was close to background in round 1 but rapidly increased to approximately 8% in round 2 (Supporting Information, Figure 1). Among the 269 annotated or hypothetical proteins isolated in round 1 from the natural protein library, 28 known and 99 novel CaM-binding proteins were confirmed using the CaM-binding assay (6). Interestingly, almost all of the sequences from the round 2 pool bound to CaM in a Ca2+-dependent manner. Of the 120 sequences we thoroughly analyzed from round 2, 72 were from 12 classes of well-known CaM-binding proteins, including different isoforms (α, β, γ, and δ) of Ca2+/CaM-dependent protein kinases, different isoforms (α, β, and γ) of Ca2+/CaM-dependent protein phosphatases, Ca2+ transporting ATPase, α II spectrin, phosphoinositide 3-kinase, myosin heavy polypeptides, voltage-dependent sodium channel II, DEAD-box protein, cyclic nucleotide phosphodiesterase, myosin light chain kinase, dystrophin, and nitric oxide synthase. The remaining 48 sequences were from potential novel CaM-binding proteins, including TBC1 family members. Other well-known CaM-binding proteins were found when more clones were picked for analysis (data not shown).

We chose several selected sequences to characterize their CaM-binding affinity and specificity in detail. These sequences include selected protein fragments originating from well-known CaM-binding proteins (CAMK2D, PPP3C, and MLCK), together with a few originating from potential novel CaM-binding proteins (TBC1D4, CDC5-L, and MYBL1). As shown in Table 1 and Figure 2A (also Supporting Information, Figure 2A), these protein fragments bound to CaM tightly and specifically. The binding was highly dependent on Ca2+, as it was significantly reduced when Ca2+ was chelated. The binding affinities were in the range of 16-37 nM, and those of known CaM-binding proteins were close to those reported in the literature (27, 28).
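The Table 1 values can be summarized numerically with a short Python sketch. It is illustrative only and rests on two assumptions that are not stated in the text: that the eluted and bead-retained percentages can be summed into a total bound fraction, and that binding follows a simple one-site model with CaM in large excess over the labeled fragment. The dictionary and function names are hypothetical.

```python
# Values transcribed from Table 1:
# ((% eluted, % on beads) with Ca2+, (% eluted, % on beads) with EGTA, Kd in nM)
TABLE_1 = {
    "CaMK2D": ((27.0, 13.0), (0.7, 2.6), 32.0),
    "PPP3CA": ((54.0, 9.5), (0.1, 0.1), 19.0),
    "MLCK": ((42.0, 12.7), (1.8, 1.4), 26.0),
    "TBC1D4": ((9.8, 32.8), (2.9, 15.5), 16.0),
    "CDC5-L": ((31.7, 14.6), (0.9, 2.2), 37.0),
    "MYBL1": ((25.8, 26.4), (3.4, 13.5), 32.0),
}


def ca_dependence(name):
    """Ratio of total % bound with Ca2+ to total % bound with EGTA.

    Assumption of this sketch: total bound = eluted + retained on beads.
    """
    with_ca, with_egta, _ = TABLE_1[name]
    return sum(with_ca) / sum(with_egta)


def fraction_bound(cam_nm, kd_nm):
    """One-site binding model: fraction of fragment bound at a given free [CaM].

    Assumption of this sketch: CaM is in excess, so free CaM ~ total CaM.
    """
    return cam_nm / (kd_nm + cam_nm)


if __name__ == "__main__":
    for name, (_, _, kd) in TABLE_1.items():
        ratio = ca_dependence(name)
        # 5 nM and 1000 nM are the ends of the CaM titration range in the assay.
        low = fraction_bound(5.0, kd)
        high = fraction_bound(1000.0, kd)
        print(f"{name}: Ca2+/EGTA ratio {ratio:.1f}, "
              f"bound at 5 nM {low:.2f}, at 1 uM {high:.2f}")
```

Under the one-site assumption, a fragment is half-bound when the CaM concentration equals its Kd, so the 5 nM to 1.0 µM titration range brackets all six measured affinities.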
We attributed the efficient enrichment of CaM-binding proteins from the natural protein library to its relatively lower complexity (