High-Throughput, Fluorescence-Based Screening for Soluble Protein Expression

M. A. Coleman, V. H. Lao, B. W. Segelke, and P. T. Beernink*

Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, 7000 East Ave., L-448, Livermore, California 94550

Received May 7, 2004

Protein expression screening methods are essential for proteomic-scale characterization of gene and cDNA expression libraries. Screening methods are also important for the identification of highly expressed protein targets, for example, in quantities suitable for high-throughput screening and protein structural studies. To address these needs, we describe the implementation of several rapid, fluorescence-based protein expression screening strategies using Escherichia coli or E. coli-based in vitro transcription/translation (IVT) systems. In vitro expression screening is fast, convenient and, as we show, correlates well with in vivo expression. For screening, expressed proteins are labeled either as fusions with green fluorescent protein (GFP) or through translational incorporation of a fluorescent amino acid derivative, BODIPY-FL-Lysine. Fluorescence-based detection of GFP fusions or BODIPY-labeled proteins is considerably faster than other common expression screening methods, such as immunological detection of gels or dot blots. Furthermore, in vitro and in vivo screening used together yield a larger set of expressed proteins than either method alone. Specifically labeled proteins in cellular lysates are detected in one of three formats: a microplate using a fluorescence plate reader, a dot-blot using a fluorescence scanner or a microarray using a laser scanner. We have established a correlation among the various detection formats, which validates the use of protein microarrays for expression screening. Production of expressed proteins detected through screening can be scaled up either using IVT reactions or with in vivo expression systems in the absence of a fluorophore for subsequent characterization of protein function or interactions.

Keywords: protein expression • high-throughput screening • fluorescence • green fluorescent protein (GFP) • BODIPY-FL • protein microarray

* To whom correspondence may be addressed: Tel: (925) 422-5793. Fax: (925) 424-3130. E-mail: [email protected].

Journal of Proteome Research 2004, 3, 1024-1032
Published on Web 09/17/2004
10.1021/pr049912g CCC: $27.50 © 2004 American Chemical Society

Introduction

Recently, large quantities of genomic sequencing data have become publicly available, which allows structural and functional relationships within the encoded proteomes of interest to be queried. The challenge of studying proteins on a genome-wide scale requires the development of high-throughput approaches for identifying open reading frames (ORFs), cloning, protein expression, and purification. The resulting proteins can be used for studies of biochemical function, macromolecular interactions, generation of affinity reagents, or protein structural analyses. However, technical hurdles still remain in the processes of high-throughput cloning, protein expression and purification, with protein expression being arguably the biggest bottleneck. To establish an efficient protein production pipeline, more rapid and efficient methods in these areas are needed, in particular protein expression screening approaches.

Coupled in vitro transcription/translation (IVT) expression systems are well-suited for analytical purposes such as expression screening. One application of high-throughput protein expression is the identification of expressed proteins from hypothetical genes or expression cDNA libraries. One such collection, the I.M.A.G.E. consortium,1 which is based at Lawrence Livermore National Laboratory (LLNL), contains >25 000 full-length human and mouse cDNAs. An efficient screening process that could be used to identify highly expressed proteins or variants of a single protein is also particularly valuable in the identification of suitable targets for structural studies. Obtaining soluble proteins in sufficient yield is the major bottleneck in structural genomics projects,2 which usually require milligram (or greater) quantities of highly purified proteins. To carry out these projects, a high-throughput screening system is required that can generate recombinant plasmids rapidly and identify clones that express soluble proteins or protein domains.

IVT is sometimes suitable for preparative uses to provide proteins for biochemical studies including enzyme kinetics assays and protein interaction analyses, when microgram quantities of protein or less are needed. IVT expression is particularly useful for producing proteins for protein microarrays, for which sub-microgram quantities are sufficient. Protein arrays allow multiplexed protein detection along with sensitive quantification in a small format.3,4 In addition, the microarray format has been adapted for studying protein-protein interactions.5-7 These developments enable extremely small-scale expression screening either directly using labeled proteins or indirectly by immunological detection. Protein arrays have potential applications in basic biological research, genomic annotation, identification of disease markers and the diagnosis of disease.4,8,9

IVT is well suited for proteomic studies because it circumvents slow, cumbersome steps such as bacterial transformation and growth, and IVT can easily be performed in high-throughput formats. An additional advantage of IVT is that it may be able to overcome specific types of expression problems. For example, proteins that are cytotoxic when overexpressed in vivo will not usually interfere with transcription or translation in vitro. Other expression problems may be ameliorated through the use of additives since IVT is an open system. Proteins that are susceptible to intracellular proteolysis, for example, may be recovered through IVT expression in the presence of protease inhibitors (G. A. Martin, personal communication). Other additives including chaperonins,10,11 lipids12 and redox factors13 have also been successfully employed in IVT expression methods. For these reasons, IVT has value in the expression of large numbers of proteins for analytical or preparative uses.

To develop more rapid protein expression screening approaches, we combine the attributes of high-throughput protein expression with several rapid and sensitive fluorescence labeling and detection systems. A fluorescence-based protein expression screen is implemented in several formats: a microplate, a dot blot, and a microarray. We compare these screening methods on a subset of proteins and examine the potential correlation between cell-free and bacterial expression. The development of high-throughput expression screening methods that utilize fluorescent labeling enables rapid, reproducible detection of labeled proteins with high sensitivity and a wide dynamic range. High-throughput protein expression identifies highly expressed proteins for structural studies and produces sufficient quantities of proteins for biochemical assays and interaction analyses.

Experimental Methods

Cloning and Bacterial Expression. For the construction of carboxyl (C)-terminal GFP fusion proteins,1,14 a set of microbial genes and human cDNAs was amplified by polymerase chain reaction (PCR) with primers containing restriction site adapters for NdeI and BamHI restriction endonucleases and a high-fidelity polymerase, Pfu DNA polymerase (Stratagene). (We use the terminology “C-terminal GFP fusion protein” to indicate a fusion of GFP to the carboxyl-terminal end of a protein of interest, i.e., protein-GFP.) The PCR products were digested with NdeI and BamHI restriction enzymes (New England Biolabs) and ligated into the pET28-derived plasmid GFPfolder,14 which was digested with the same enzymes. Genes were expressed as C-terminal GFP fusion proteins in 2 mL Escherichia coli cultures, which were grown at 37 °C with vigorous shaking until mid-exponential phase (OD600 = 0.6) was reached, at which point expression was induced with 1 mM IPTG. Cultures were grown for an additional 2 h and were harvested by centrifugation.

Template Preparation and IVT Reactions. Sequential PCR and IVT reactions were performed in 25 µL volumes in 96-well PCR plates (Corning). DNA amplification was performed using Taq DNA polymerase (Roche Diagnostics) and primers specific to the T7 promoter and terminator without (5′-TAATACGACTCACTATAGGG and 5′-GCTAGTTATTGCTCAGCGG) or with 7 nucleotide GC clamp15 sequences (5′-GCGCGCGAGATCTCGATCCCGCGAAATTAATACGAC and 5′-GCGCGCGTATCCGGATATAGTTCCTCCTTTCAG). PCR amplification conditions consisted of 5 cycles with a 50 °C annealing temperature followed by 20 cycles with a 60 °C annealing temperature. Denaturation (94 °C) and annealing (50 or 60 °C) steps were 30 s, and the extension step (72 °C) was 2 min. IVT reactions contained 1 µL of unpurified PCR template and 24 µL of a master mix containing the RTS-100 (Roche Diagnostics) kit components and 0.13 µL of a BODIPY-FL-Lys-tRNALys conjugate, FluoroTect GreenLys (Promega). The reactions were incubated at 30 °C for 4 h and analyzed immediately or stored at -20 °C. Plasmid templates for PCR included expression clones in pIVEX2.4b (Roche Diagnostics), pETBlue-2 (Novagen) and the pET28 derivative GFPfolder.14

Fluorescence Detection Using a Plate Reader. GFP fluorescence was determined from the soluble fraction of bacterially or IVT-expressed proteins following centrifugation for 10 min at 3250 × g. GFP fluorescence was quantified on a Tecan Genios plate reader with a 485 nm excitation filter and a 510 nm emission filter using a gain setting of 1.0. The background signal from cells transformed with the parent expression plasmid (pIVEX 2.3) was subtracted. From a standard curve of known concentrations of purified GFP, the fluorescence signal was 555 RFU per microgram of protein.

Fluorescence Detection Using a Dot-Blot Method. Detection of IVT products with an incorporated BODIPY-FL-Lys utilized a protein dot-blot procedure. IVT products were fractionated by centrifugation in a microplate rotor at 3250 × g for 10 min. A 10 µL aliquot of the soluble (supernatant) fraction was diluted 1:20 in Urea Lysis Buffer [8 M urea, 20 mM Tris-HCl, 10 mM NaPO4, pH 8.0]. This mixture was applied to a PVDF membrane (Immobilon-P, Millipore) using a Bio-Dot vacuum blotting apparatus (Bio-Rad). The membrane was washed three times by vacuum applications, once with Urea Lysis Buffer and twice with 1× PBS [80 mM NaPO4, 20 mM NaPO4, 100 mM NaCl, pH 7.5] to remove unincorporated BODIPY-FL-Lys-tRNALys. The apparatus was disassembled and the membrane was further washed in 1× PBST [PBS containing 0.1% (v/v) Tween-20] for 30 min. The fluorescence signal was acquired on a FluorImager 595 flatbed fluorescence scanner (Molecular Dynamics) using a 488 nm excitation filter and quantified with ImageQuant software (Molecular Dynamics).

Protein Array Printing and Imaging. Barcoded glass slides coated with γ-aminopropylsilane (GAPS II; Corning) were spotted with protein samples using a robotic arrayer (Norgren Systems). Crude, IVT-expressed GFP fusion proteins were diluted 1:2 in Spotting Buffer (50 mM HEPES, pH 7.5, 5% glycerol, 50 mM KCl). Purified GFP was diluted to concentrations ranging from 5 ng/mL to 1 mg/mL in Spotting Buffer. A single print pin (Telechem International) was used to deposit approximately 1 nL of diluted protein solution onto the glass slides. Proteins were spotted in replicates of 2-5, generating ∼300 µm diameter spots with spot-to-spot distances of ∼350 µm. Bovine serum albumin (BSA) served as a nonfluorescent control, while Cy3/Cy5-labeled DNA was used as position controls to mark the four corners of the array. After spotting, the slides were stored in the dark at 4 °C. Protein arrays were imaged with a laser-based confocal scanner (ScanArray 5000 XL; Packard Bioscience) using the 488 nm Ar laser to detect GFP and GFP fusion proteins and the 543 nm GHeNe laser for Cy3-labeled DNA and Rhodamine-conjugated antibodies, while the Cy5-labeled DNA was detected using a 594 nm VheNe laser. Images were collected and analyzed using mean pixel intensities with QuantArray software (Packard Bioscience).

Microarray-Based GFP Immunoassays. For immunological detection of fusion proteins, spotted arrays were covered with a hybridization chamber (Schleicher & Schuell) and filled to a total volume of 300 µL with Blocking Buffer [1× PBS, 1% Tween 20, 100 µg/mL BSA]. The arrays were incubated for 30 min at 25 °C with gentle shaking. Wash Buffer [50 mM Tris-HCl, pH 7.5, 50 mM NaCl, 2 mM DTT, 0.5% NP-40] was applied to the slides three times for 5 min each at 25 °C. After the final wash, mouse anti-GFP 1° antibody (Ab) (Living Colors A.v. Peptide Ab; BD Clontech) was diluted 1:500 in 400 µL of PBST and was incubated for 30 min at 25 °C with gentle shaking, followed by three 5 min washes at 25 °C. A Rhodamine-labeled goat anti-mouse 2° Ab (Santa Cruz Biotechnology) was diluted 1:250 in 400 µL of PBST and was incubated for 30 min at 25 °C with gentle shaking. The incubation with the 2° Ab was followed by three additional 5 min washes with gentle shaking. Imaging and analysis of the arrays were performed as described above.

Figure 1. In vitro expression screening of GFP fusion proteins. A, Fluorescence of GFP fusion proteins synthesized by IVT in a 96-well plate was detected on a UV transilluminator at 365 nm. The source of template DNA (plasmid or PCR product) for the IVT reaction is shown on the right. The top two rows represent a comparison of the same genes expressed using alternative sources of reaction template. The bottom row depicts eight additional genes expressed from plasmid templates. Controls include GFP with no fusion partner (A8, B8) and E. coli lysate alone (A9, B9). B, Quantification of the fluorescence signal using a plate reader. The background fluorescence signal of the lysate was subtracted and the three rows from top left to bottom right are plotted sequentially.
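The plate-reader calibration described in Experimental Methods (555 RFU per microgram of purified GFP, after background subtraction) lends itself to a small conversion helper. The following is a minimal sketch and not the authors' analysis code: the function name, the assumption of a linear response through the origin, and the example readings are all illustrative.

```python
# Hypothetical helper: convert plate-reader fluorescence to GFP mass.
# Assumes the linear calibration quoted in the text (555 RFU per microgram)
# and that the vector-only background is subtracted before conversion.

RFU_PER_UG = 555.0  # calibration factor from the purified-GFP standard curve


def rfu_to_ug(raw_rfu: float, background_rfu: float = 0.0) -> float:
    """Return micrograms of GFP for a background-corrected signal."""
    corrected = max(raw_rfu - background_rfu, 0.0)  # clamp negatives to zero
    return corrected / RFU_PER_UG


if __name__ == "__main__":
    # Illustrative, already background-corrected readings.
    for rfu in (100, 2000, 42000):
        print(f"{rfu:>6} RFU -> {rfu_to_ug(rfu):.1f} ug GFP")
```

Under these assumptions, signals of about 100 and 42 000 RFU map to roughly 0.2 and 76 µg of GFP, which matches the range reported for the in vivo screen.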

Results

Rapid Detection of Soluble GFP Fusion Proteins in Vitro. We pursued the development of cell-free protein expression screening methods because of their suitability for high-throughput approaches, potential for automation and flexibility of reaction volumes (ca. 10 µL to 10 mL). To identify a suitable cell-free expression system, we first tested E. coli-based extracts from several suppliers and found that the Rapid Translation System (RTS-100; Roche Diagnostics) provided the most favorable expression yields for several different proteins (data not shown). Using RTS-100 extracts, we examined cell-free expression of GFP fusion proteins from the pET28-derived plasmid GFPfolder14 and from linear templates that were PCR amplified from the same plasmids. The soluble products were visualized on a UV transilluminator at 365 nm (Figure 1A) and the relative fluorescence was quantified on a plate reader using a 488 nm excitation filter (Figure 1B). The absolute yields, derived from a standard curve of purified GFP, were 10-40 µg in 25 µL IVT reactions. Two- to five-fold higher yields of soluble protein were obtained from plasmid templates (Figure 1B, 1-9) than from PCR templates (Figure 1B, 10-18).

Rapid Detection of Soluble Fusion Proteins in Vivo. Next, we examined the ability to screen for the soluble expression of GFP fusion proteins from E. coli cultures. The relative fluorescence of total fusion protein (soluble and insoluble) could be visualized from intact E. coli on agar plates and judged qualitatively (Figure 2A). Twenty expression constructs including prokaryotic genes and eukaryotic cDNAs (Table 1) were examined for inducible, soluble expression of a C-terminal GFP fusion protein in clarified, crude cellular extracts. The fluorescence signal of 100 µL of soluble GFP fusion proteins was obtained using a fluorescence plate reader (Figure 2B). Following subtraction of a control containing vector alone, the signals ranged from slightly higher than background, ∼100 relative fluorescence units (RFU), to ∼42 000 RFU. The microplate detection format exhibited a dynamic range of about 2.5 orders of magnitude. On the basis of a standard curve of purified GFP, the fluorescence signals corresponded to 0.2 to 76 µg GFP per 100 µL extract. The quantities of protein expressed in this screening experiment were consistent with our previous experience with several proteins expressed as native or His6-tagged constructs. One of our test proteins, Ape1 R237C (Figure 2B, Clone 57-1), was used as a benchmark because it is moderately expressed as a native protein at approximately 1% of the soluble protein in E. coli. The yield of purified R237C protein following two purification steps using ion exchange chromatography16 is approximately 1 mg per liter of culture. This is near the practical lower limit of expression yield for structural studies; therefore, this signal (2000 RFU; ∼4 µg/mL) served as an approximate threshold for pursuing structural studies.

Figure 2. Fluorescence detection of proteins expressed in E. coli. A, In vivo fluorescence of GFP fusion proteins is shown on an agar plate on a transilluminator (365 nm) following overnight growth in the presence of 1 mM IPTG. The proteins shown are (clockwise from brightest clone): LcrV-GFP, Ape1-GFP, plasmid negative control, Rad23A, Tn1331, SFN5-GFP. B, Twenty E. coli cultures expressing GFP fusion proteins and two negative control cultures (no plasmid, pIVEX 2.4b) were grown and induced as described in Methods. The soluble fraction of the lysates was quantified using a fluorescence plate reader and plotted as a histogram in increasing order of fluorescence. Clone identities are given in Table 1. One of the proteins, Ape1 R237C (Clone 57-1; arrow), which is known to yield ∼1 mg purified protein per L culture, was identified as a threshold for high-level expression in quantities suitable for structural studies.

Table 1. Expressed Proteins and Their Source

clone no.   protein name    Mr (kDa)c   organism
1-1         RepO            74.0        E. coli
2-1         XRCC9           68.4        H. sapiens
3-2         XPB             86.0        H. sapiens
4-4         XRCC1 NTDa      21.4        H. sapiens
5-3         YopD            33.7        Y. pestis
6-1         Gene II         45.1        M13
7-1         Rad23A          39.9        H. sapiens
8-1         Tn1331          21.0        K. pneumoniae
9-1         PA833           64.8        P. aerophilum
11-1        XRCC1           71.4        H. sapiens
15-1        LcrV            35.9        Y. pestis
16-1        Rad54B          32.9        H. sapiens
17-1        Rad51C          14.9        H. sapiens
18-1        LcrH            18.5        Y. pestis
19-1        SFN5            22.6        H. sapiens
20-1        LcrG            10.5        Y. pestis
23-1        Orf38           37.4        Y. pestis
32-2        XPF             99.1        H. sapiens
56-1        Ape1            34.5        H. sapiens
57-1        Ape1 R237Cb     34.5        H. sapiens

a N-terminal domain, residues 1-195. b Destabilized Ape1 variant. c Sequence-deduced relative protein mass (subunit).

Dot Blot Expression Screening of BODIPY-FL-Labeled Proteins. IVT-based expression testing was performed on 45 different clones, including prokaryotic genes and eukaryotic cDNAs that were expressed from several different plasmids, pIVEX2.4b (Roche Diagnostics), pET28 (Novagen) and pETBlue-2 (Novagen). PCR and IVT reactions were performed in duplicate and vacuum blotted onto a polyvinylidene fluoride (PVDF) membrane (Immobilon-P; Millipore). The blot was washed and visualized based on the fluorescence of incorporated BODIPY-FL-Lys residues (Figure 3A). The control strip (Figure 3A, top) containing 2-fold serial dilutions of the positive control reaction (pIVEX-GFP) showed that the fluorescence signal was linear with respect to the quantity of BODIPY-FL-labeled protein above 2000 RFU. The fluorescence signal of duplicate measurements (Figure 3A, left and right halves) was averaged, the background fluorescence (wells H1, H7) was subtracted and the data were plotted as a histogram (Figure 3B). This experiment demonstrated that the BODIPY-FL-Lys conjugate was incorporated by the ribosome, that there was a large dynamic range in the signal among different proteins and that duplicate reactions gave comparable signals. SDS-PAGE analysis confirmed that only the protein expressed under control of the T7 promoter contained the fluorescent Lys derivative (data not shown). The pIVEX2.4b plasmid (Figure 3, A1-E1), which was optimized for in vitro expression, showed good expression yields (3000-40 000 RFU). The pET28 clones (F1-H1, A2-D3, A5-H6) showed variable expression that ranged from background (corrected to 0 RFU) to very high (70 000 RFU). The pETBlue-2 clones (E3-H4) exhibited low to moderate expression (1000-24 000 RFU). The 13 prokaryotic genes yielded an average fluorescence signal of 19 800 RFU and the 32 eukaryotic genes gave an average signal of 7900 RFU. From four independent replicates of the pIVEX-GFP positive control, the average signal was 39 708 RFU with a standard deviation (1σ) of 6352 RFU (±16%), and an average signal-to-noise ratio of 6.3.

Figure 3. Fluorescence detection of in vitro expressed, BODIPY-FL-labeled proteins. A, Forty-five experimental expression constructs and three controls were expressed in vitro and spotted in duplicate (left and right halves). The 45 expression clones included 4 pIVEX2.3-based clones (B1-E1), 31 pET28 clones (F1-H1, A2-D3, A5-H6) and 10 pETBlue-2 clones (E3-H4). The positive control pIVEX-GFP (A1, A7) and the negative lysate alone control (A2, A8) are also shown. B, Quantification of data from Panel A. The fluorescence data were quantified using ImageQuant software, averaged and the background fluorescence from the lysate only control was subtracted. The data are plotted sequentially, column by column, and are named according to the left half of the blot.
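Two of the summary figures quoted in this section, the dynamic range of the plate-reader screen and the ±16% spread of the pIVEX-GFP control on the dot blot, can be re-derived from the reported numbers. The sketch below is illustrative only; the function names are hypothetical, and the inputs are the rounded values stated in the text.

```python
import math


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


def dynamic_range_decades(low: float, high: float) -> float:
    """Orders of magnitude spanned between the lowest and highest signal."""
    return math.log10(high / low)


if __name__ == "__main__":
    # pIVEX-GFP positive control on the dot blot: mean 39 708 RFU, 1 sigma 6352 RFU.
    print(f"CV = {coefficient_of_variation(39708, 6352):.0f}%")
    # In vivo plate-reader signals: about 100 RFU to about 42 000 RFU.
    print(f"range = {dynamic_range_decades(100, 42000):.1f} decades")
```

With the rounded endpoints, the range comes out at roughly 2.6 decades, in line with the "about 2.5 orders of magnitude" stated for the microplate format.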

Rapid Protein Detection in a Microarray Format. We used protein microarrays for comparison of relative expression levels for the purpose of protein expression screening. We deposited IVT-expressed proteins onto microarrays for either direct fluorescence or immunological detection. Control experiments were first conducted to assess the sensitivity and linearity of GFP spotted directly (Figure 4A). The range of protein amounts that was detected based on direct GFP fluorescence was 2 × 10⁻¹³ g to 7 × 10⁻⁹ g. The threshold fluorescence signal was defined as being greater than 3 times higher than the signal-to-noise ratio of the nonfluorescent control protein BSA. In this range, the fluorescence signal for the standard curve was linear with an R² value of 0.90. Next, IVT-expressed GFP fusion proteins were arrayed in replicate, which showed that different fusion proteins exhibited reproducible fluorescence signals (Figure 4B, upper panel). Quantification of five replicates gave an accurate measure of soluble expression levels. The five different fusion proteins that were examined exhibited standard deviations (1σ) of 4-8% (Figure 4B, lower panel). For relative quantification, the fluorescence signals were normalized to a reference protein, LcrH-GFP. Using this approach, microarray-based expression screening rapidly and reproducibly identified differences in relative protein expression levels.

Finally, to demonstrate protein-specific detection, we performed array-based immunoassays to detect GFP fusion proteins. For these experiments, IVT-expressed GFP fusion proteins were spotted in duplicate on a GAPS-coated glass slide. Immunological detection used an anti-GFP 1° Ab and a Rhodamine-conjugated 2° Ab. Immunodetection of a slide similar to that shown in Figure 4A indicated that the sensitivity was comparable to direct fluorescence measurements, but that the signal was somewhat noisier, with an R² value of 0.67 for Rhodamine versus a value of 0.9 for GFP (data not shown). Next, microarrays were printed with 15 different GFP fusion proteins and controls containing either Cy-labeled DNA (red) or GFP (green) (Figure 4C, row A). Experimental spots containing the GFP fusion proteins exhibited GFP fluorescence (green) and Rhodamine 2° Ab fluorescence (red) to varying extents (Figure 4C, rows B, C). Antibody binding to GFP (or GFP fusion protein) led to a mixture of GFP and 2° Ab fluorescence, which produced a yellow color (e.g., A2, A3, C4, C7). For the 15 experimental clones, the GFP and Rhodamine fluorescence signals (quantified separately) exhibited a good correlation (cc = 0.77), which shows that array-based immunoassays can be performed in a semiquantitative fashion. The equation for the correlation coefficient (cc) is as follows:

$$\rho(x, y) = \frac{1}{n} \sum_{j=1}^{n} \frac{(x_j - \langle x \rangle)(y_j - \langle y \rangle)}{\sigma_x \sigma_y}$$

These experiments demonstrate both direct detection of GFP fusion proteins and array-based immunodetection for protein expression screening, in addition to its previously reported application for macromolecular interaction studies.5 As an alternative to detection of GFP fusion proteins, these experiments were also adapted to use His6-tagged proteins using an anti-penta-His 1° Ab (Qiagen) and a Rhodamine-conjugated 2° Ab (Molecular Probes) to allow rapid expression screening of minimally tagged proteins (data not shown).

Figure 4. Microarray-based protein expression screening. A, Detection limit of spotted GFP. Serial dilutions of purified GFP were spotted in duplicate on a GAPS glass slide. A 1 nL volume of protein solution (1 ng to 5 fg) was spotted with a spot size of approximately 300 µm. Column 1, 10⁻⁹ g; 2, 5 × 10⁻¹⁰ g; 3, 2 × 10⁻¹⁰ g; 4, 10⁻¹⁰ g; 5, 10⁻¹² g; 6, 5 × 10⁻¹³ g; 7, 2 × 10⁻¹³ g; 8, 10⁻¹⁴ g; 9, 5 × 10⁻¹⁵ g; 10, BSA. B, IVT-expressed GFP fusion proteins. Soluble IVT products (1 nL) were printed in five replicates (three are shown). Green, GFP fluorescence; red, Cy3-labeled DNA. Row 1, LcrH-GFP; 2, GFP-H6; 3, XRCC1-GFP; 4, LcrG-GFP; 5, Lysate control; 6, Cy3-DNA; 7, SFN5-GFP; 8, DNA. The fluorescence (mean pixel intensity) of replicate spots is reproducible with a standard deviation (1σ) of 4-8%. The quantified fluorescence for GFP fusion proteins correlates with expression levels detected with a plate reader or by spotting onto a PVDF membrane. C, Immunological detection of GFP-fusion proteins on a microarray. Proteins were spotted as above and were detected with an anti-GFP 1° Ab and a Rhodamine-labeled 2° Ab. The GFP (green) and Rhodamine (red) fluorescence was detected with 488 and 543 nm filters, respectively. A yellow spot indicates fluorescence from both GFP and the 2° Ab. Control spots included Cy3-labeled DNA (A1, A4, A5, A6, A8) and purified GFP (A2, A3). Experimental spots include: (row 2) 16-1, 6-1, 5-3, 4-4, 3-2, 2-1, 1-1; (row 3) 19-1, 11-1, 9-1, 8-1, 7-1, 20-1, 18-1, 17-1. For 15 GFP fusion proteins, the GFP and Rhodamine fluorescence showed a positive correlation (cc = 0.77).

Figure 5. Correlation of in vitro and in vivo protein expression levels. GFP fusion proteins (14) were expressed in vitro in E. coli extracts or IPTG-induced E. coli cultures as described in Experimental Methods. Clone identities are given in Table 1. The soluble products were transferred to a new 96-well plate and the fluorescence was quantified in a plate reader. The correlation coefficient was 0.89 for the eight most highly expressed proteins and 0.69 for all 13 proteins.

Table 2. Comparison of Different Protein Expression and Detection Methods

protein   E. coli/plate   IVT/plate   IVT/dot blot   IVT/array   average
LcrHa     1.0             1.0         1.0            1.0         1.0
LcrV      2.4             ndb         0.9            nd          1.8
LcrG      1.5             1.0         nd             0.8         1.1 ± 0.36
SFN5      0.7             0.8         0.1            0.8         0.6 ± 0.34
GFP       nd              0.3         0.6            0.6         0.5 ± 0.17
Ape1      0.5             0.3         0.2            0.1         0.3 ± 0.17
XRCC1     0.3             0.3         0.3            0.4         0.3 ± 0.05

a Expression levels from each screening method were normalized to those of the LcrH-GFP fusion protein. b Not determined.
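The correlation coefficient (cc) used for the array comparisons is defined above as a mean product of deviations scaled by the two standard deviations. A minimal Python sketch of that definition follows. It assumes population standard deviations (dividing by n), which the text does not state explicitly but which makes the expression equal to the usual Pearson coefficient; the function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def correlation_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    """cc = (1/n) * sum((x_j - <x>)(y_j - <y>)) / (sigma_x * sigma_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population standard deviations (assumption: 1/n normalization).
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant data")
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return covariance / (sd_x * sd_y)


if __name__ == "__main__":
    # Made-up paired signals (e.g. GFP vs Rhodamine), for illustration only.
    gfp = [1200.0, 3400.0, 560.0, 8900.0, 4100.0]
    rhodamine = [900.0, 2500.0, 700.0, 6100.0, 3900.0]
    print(f"cc = {correlation_coefficient(gfp, rhodamine):.2f}")
```

The same function could be applied to paired in vitro and in vivo plate-reader signals when comparing the two expression routes.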

Correlation between in Vitro and in Vivo Expression. To validate the use of cell-free protein expression screening as a first-pass filter to guide larger scale bacterial expression experiments, we examined the correlation of yields from IVT and bacterial protein expression. This experiment employed 13 different human and bacterial clones that were expressed as C-terminal GFP fusions14 (Figure 5). The expression clones were sorted based on their in vivo expression levels and were partitioned into two groups with relatively high and low expression, respectively. The threshold between these two groups, ∼2500 RFU, corresponded to the threshold previously identified for high-level (>1 mg/L) expression in vivo (Figure 2B). The eight proteins that were most highly expressed in vivo exhibited a good correlation (cc = 0.89), whereas the five most poorly expressed proteins showed a weaker correlation. The cc for all 13 proteins was 0.69. Only one of the more highly expressed proteins, Y. pestis YopD, did not fit this pattern and was significantly better expressed in vivo than in vitro. For the more poorly expressed clones, the in vitro expression levels were consistently and significantly higher than those in vivo (Figure 5). It seems likely that this set comprised proteins that were cytotoxic or proteolytically sensitive in vivo, which underscores the benefits of cell-free expression for certain classes of proteins.

Correlation of All Expression Screening Methods. Finally, we compared relative expression levels of seven proteins to validate the four different fluorescence-based screening methods that we have implemented (Table 2). The expression and detection methods included the following: (1) E. coli-expressed soluble protein quantified with a plate reader; (2) soluble IVT products quantified by plate reader; (3) soluble IVT products quantified by dot blot; and (4) soluble IVT products quantified by microarray. The expression data were normalized using a highly expressed protein that was tested using all four methods, Y. pestis LcrH. Although the relative expression yields varied somewhat using each method, the general pattern relative to the normalization standard was maintained. For example, the Y. pestis LcrG protein was expressed at high levels (0.8-1.5) in each of three screening methods examined. In addition, a protein that was expressed at lower levels, human XRCC1, showed very consistent expression among the four methods (0.3-0.4). However, some proteins such as LcrV and SFN5 showed greater variability (Table 2). Nonetheless, it is of significance that expression levels measured on the protein microarrays were comparable to direct fluorescence measurements and dot blots.
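The per-protein averages in Table 2 can be approximated from the individual normalized values. The sketch below is an illustration rather than the authors' procedure: it assumes the reported spread is a sample standard deviation over the methods for which a value was determined, and it works from the rounded table entries, so it need not reproduce every published average exactly (the LcrV entry, for instance, comes out lower).

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple

# Normalized expression values transcribed from Table 2, in the order
# E. coli/plate, IVT/plate, IVT/dot blot, IVT/array. None = not determined.
TABLE2 = {
    "LcrH": [1.0, 1.0, 1.0, 1.0],
    "LcrV": [2.4, None, 0.9, None],
    "LcrG": [1.5, 1.0, None, 0.8],
    "SFN5": [0.7, 0.8, 0.1, 0.8],
    "GFP": [None, 0.3, 0.6, 0.6],
    "Ape1": [0.5, 0.3, 0.2, 0.1],
    "XRCC1": [0.3, 0.3, 0.3, 0.4],
}


def summarize(values: List[Optional[float]]) -> Tuple[float, float]:
    """Mean and sample standard deviation over the determined values."""
    present = [v for v in values if v is not None]
    spread = stdev(present) if len(present) > 1 else 0.0
    return mean(present), spread


if __name__ == "__main__":
    for protein, values in TABLE2.items():
        avg, sd = summarize(values)
        print(f"{protein:<6} {avg:.1f} ± {sd:.2f}")
```

A small standard deviation relative to the mean, as for XRCC1, indicates that the four formats rank that protein consistently.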

Discussion

In vivo protein expression screening is inexpensive and efficient for many proteins;21-23 however, it is somewhat difficult to pursue in high throughput due to steps in bacterial transformation and growth. Furthermore, a subset of the proteins examined here were expressed at very low levels in vivo, which indicates that some proteins may not be amenable to this approach or that further optimization of expression conditions (e.g., induction time, temperature, inducer concentration, cell strain) is required.21 In contrast, in vitro protein expression screening is rapid, can be adapted to massively parallel processes and provides access to proteins that cannot be produced efficiently in cell-based systems.18 The implementation of both in vivo and in vitro expression strategies yields a larger number of expressed proteins than either method alone; thus, in vitro and in vivo protein expression screening have complementary benefits.

Successful expression of many proteins from the proteome requires an arsenal of approaches, including a variety of affinity tags or fusions and different expression systems. We have developed a high-throughput, fluorescence-based protein expression screening scheme using IVT to identify highly expressed proteins, which is useful, for example, for structural studies. The generalized approach consists of several steps, including the following: (1) PCR amplification of T7 promoter-based expression clones; (2) cell-free protein expression using RTS-100, with optional incorporation of a fluorescent label; (3) transfer of the soluble fraction to microplate, membrane or glass slide; and (4) detection by fluorescence or immunoblotting. Using fluorescence detection, the entire procedure can be carried out in approximately 8 h. The products are useful for downstream applications such as functional assays, protein interaction studies and protein arrays. For structural studies, cell-free expression screening allows effort to be focused on the more highly expressed clones, which can be readily scaled up in bacterial expression systems or in larger IVT reactions. These larger (1-10 mL) reactions employ continuous nutrient exchange to achieve high yields,17 as much as 6 mg protein per mL extract.18

Our high-throughput, in vitro expression and detection methods are rapid since they require no electrophoresis or immunoblotting. In addition, the detection methods are flexible, since they can employ either fluorescent fusion proteins or covalently incorporated labels.19,20 BODIPY-FL-Lys labeling is preferable if no affinity tag or a variety of tags is encoded on the test set of expression clones. Since the RTS IVT system is based on expression from the T7 promoter, only the recombinantly expressed protein of interest and the unincorporated BODIPY-FL-Lys conjugate yield a fluorescent signal. For non-T7-based clones, an additional PCR amplification step can be performed to incorporate the necessary regulatory sequences (Linear Template Generation Set, Roche Diagnostics). A flexible set of approaches for cell-free protein expression enables automated production of many proteins and their subsequent purification.

We observed a positive correlation between in vitro and in vivo expression levels of thirteen different proteins. It is perhaps to be expected that in vitro expression in E. coli extracts would be comparable to in vivo expression in E. coli, at least for some classes of proteins. We observed a close correlation (cc = 0.89) for the nine proteins that were most highly expressed in vivo and a weaker correlation (cc = 0.69) when all 13 proteins were compared. The five proteins that were produced at the lowest levels in vivo were made at significantly higher levels in vitro. This discordance suggests that the latter set of proteins includes those that are cytotoxic when highly expressed in E. coli or are susceptible to proteolysis in vivo. Notably, four of the five proteins that were produced at significantly higher quantities in vitro were human DNA repair proteins and the fifth one was a virally encoded DNA replication helicase. This suggests that IVT expression is particularly useful for production of such difficult classes of proteins. The threshold between the more highly and more poorly expressed proteins (∼2500 RFU) corresponds closely with the threshold identified for production of milligram quantities of purified protein (Figure 2B). This agreement indicates that in vivo expression screening may be useful to differentiate between proteins that can be expressed efficiently in vivo versus those that require cell-free expression.
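
The analysis discussed above, in which clones are partitioned at an in vivo threshold of about 2500 RFU and the in vivo and in vitro signals are then correlated within a group, can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that "cc" denotes a Pearson correlation coefficient (the text does not say which coefficient was used), and the clone names and RFU readings are invented placeholders rather than data from Figure 5.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def partition_by_threshold(readings, threshold_rfu=2500.0):
    """Split {clone: (in_vivo_rfu, in_vitro_rfu)} by the in vivo signal.

    Clones at or above the threshold go to the first dict, the rest to the second.
    """
    high = {c: pair for c, pair in readings.items() if pair[0] >= threshold_rfu}
    low = {c: pair for c, pair in readings.items() if pair[0] < threshold_rfu}
    return high, low


# Hypothetical (in vivo, in vitro) readings in RFU; not measured values.
example = {
    "clone_A": (9000.0, 8000.0),
    "clone_B": (6000.0, 5500.0),
    "clone_C": (3000.0, 3500.0),
    "clone_D": (800.0, 2600.0),
    "clone_E": (400.0, 3900.0),
}
high, low = partition_by_threshold(example)
cc_high = pearson([v[0] for v in high.values()], [v[1] for v in high.values()])
cc_all = pearson([v[0] for v in example.values()], [v[1] for v in example.values()])
```

In this toy data set, the low group has in vitro signals that exceed the in vivo signals, so including it lowers the overall coefficient relative to the high group alone, which is the qualitative pattern the text reports.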

In addition to the correlation between in vitro and in vivo expression levels, we observed a concordance among all four combinations of synthesis, labeling and detection methods that were tested. These include the following: (1) fluorescence detection of GFP fusion proteins (synthesized in vitro and in vivo); (2) fluorescence detection of BODIPY-FL-Lys incorporated into tagged or nontagged proteins; and (3) protein microarrays. The observed variability among methods for some proteins (Table 2) may result from one or more experimental parameters of the different methods, for example, better expression in vivo, differential binding of proteins to the PVDF membrane, differential stabilization of GFP or differential incorporation of BODIPY-FL-Lys due to variation in the number of Lys residues in different proteins. Nonetheless, the correlation of expression levels for a variety of proteins using various methods (Table 2) underscores the potential value of protein expression screening using a microarray format.

A useful application of high-throughput, cell-free protein expression is the identification of expressed proteins from hypothetical genes or cDNA expression libraries. Previous approaches have primarily involved in vivo expression methods,21-24 which have the primary benefit of low cost and well-established methodology. High-throughput, in vivo expression has been implemented in a variety of manifestations to suit particular experimental goals. For example, the expression of hundreds of human cDNAs has been analyzed by electrophoresis of partially purified proteins.22 Optimization of expression parameters, including inducer concentration, induction time and temperature, for both prokaryotic and eukaryotic proteins, has been carried out using a dot immunoblot procedure.21 A third screening approach employed an enzyme-linked assay using horseradish peroxidase coupled to Ni2+ ions in conjunction with a genetic reporter for protein folding to screen 186 Thermotoga maritima proteins.24 The latter approach has the unique advantage of being able to differentiate misfolded and well-folded proteins. In addition, several cell-free approaches for expression screening for soluble proteins have been described using both prokaryote25 and eukaryote-derived extracts.26 These screening methods generally employ gel-based visualization of partially purified proteins or dot-blot expression screening through immunological detection of an affinity tag.

Independent efforts to develop fluorescent protein reporters and labeling systems have been described. These include GFP fusion proteins,14,27 BODIPY-FL labeling19,20 and a highly sensitive assay based on hirudin-mediated inhibition of cleavage of a fluorogenic thrombin substrate.28 The combination of these cell-free expression and fluorescence labeling technologies presents the primary benefits of being more rapid and sensitive than other methods for protein expression screening (with the exception of radiolabeling). In our experience, our approach is better suited for miniaturization and automation than previously reported screening methods and is therefore applicable to highly parallel expression screening efforts.

Fluorescence detection methods have several advantages over immunological detection. Our results show that direct fluorescence measurements exhibit better linearity than detection using a fluorescently labeled secondary antibody. This may be the result of additional variables in the immunodetection procedure such as nonquantitative binding of either of the two antibodies or the limited accessibility of antibody binding sites on the protein microarray.
Although detection of a fusion protein or an affinity tag provides a convenient, universal basis for detection, it should be noted that in some cases His6 tags are known to have effects on protein solubility and enzyme activity.22,29 However, immunological detection methods, although much slower, provide some advantages over fluorescent labeling. Dot-blots using immunological detection in conjunction with C-terminal affinity tags have the advantage of being able to detect only full-length translation products. In addition, with immunoblotting the signal does not depend on the number of Lys residues in the protein, therefore the signal can be more easily standardized. Although Lys residues are usually located on the surface of proteins, BODIPY-FL-conjugated Lys residues may affect the protein conformation. The latter two limitations of BODIPY-FL-Lys labeling are addressed by the recent development of a fluorescent labeling scheme that involves amber suppression to incorporate an N-terminal BODIPY-FL-Met residue.20 The C-terminal GFP fusion construct was originally generated as a reporter for protein folding. The fluorescence of different C-terminal GFP fusion proteins was shown to correlate closely with the solubility of the native protein.14 The use of a C-terminal GFP fusion (i.e., protein-GFP) results in the protein of interest being translated first. Therefore, the folding kinetics of the protein of interest likely influences whether the entire fusion protein misfolds and aggregates, which apparently quenches the fluorescence signal. However, the proper folding of the fusion protein is not the sole determinant of fluorescence, as the solubility of properly folded proteins may also affect the fluorescence signal.2,14 The fluorescence signal is also proportional to the total protein quantity (i.e., expression yield), enabling the use of the GFP as an expression reporter.
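
The dependence of the BODIPY-FL-Lys signal on lysine content, noted above, can be made concrete with a small sketch. This rests on a strong simplifying assumption that is not established in the text: that label incorporation scales linearly with the number of Lys residues. The function names and the example sequence are hypothetical, and the per-lysine scaling is offered only as one possible way to compare proteins of differing Lys content, not as a method used in this work.

```python
def lysine_count(sequence):
    """Count lysine (K) residues in a one-letter amino acid sequence."""
    return sequence.upper().count("K")


def signal_per_lysine(raw_signal, sequence):
    """Scale a fluorescence signal by Lys count (assumes incorporation is proportional to Lys count)."""
    n_lys = lysine_count(sequence)
    if n_lys == 0:
        raise ValueError("sequence has no Lys residues, so no Lys-based label is expected")
    return raw_signal / n_lys


# Hypothetical peptide and signal, for illustration only.
toy_sequence = "MKKLAK"
toy_scaled = signal_per_lysine(900.0, toy_sequence)
```

A protein with few Lys residues would give a low raw signal even at a high yield, which is why signals from tag-based immunodetection, being independent of Lys content, are easier to standardize across proteins.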

Related fluorescence-based methods have been used to identify protein variants with improved solubility. As mentioned above, the GFP fluorescence signal is directly related to the solubility of the native protein (and presumably to the solubility of the fusion protein). As a result, some proteins have been engineered for increased solubility using the C-terminal GFP reporter system.2,14 Five proteins, including Gene V, bullfrog ferritin,14 and three insoluble Pyrobaculum aerophilum proteins2 were evolved to yield a higher fluorescence signal and exhibited a significant improvement in the ratio of soluble to insoluble target protein in cellular lysates. A similar approach was used to screen a library of random cDNA fragments for the expression of soluble domain constructs.30 A clever in vivo genetic selection using chloramphenicol acetyltransferase fusion proteins avoids the use of labor-intensive, in vitro mutagenesis methods;31 however, it is not clear whether this approach can yield the mutant complexity that is accessible by recombinational mutagenesis methods such as DNA shuffling.32

An expression screening assay that uses GFP as a reporter requires that the GFP moiety not exert significant effects on the stability or solubility of the proteins to which it is fused. N-terminal fusions with stable, soluble proteins such as the maltose binding protein (MBP) and glutathione-S-transferase (GST) have been used to enhance the stability and/or solubility of their partners.33 Although wild-type GFP is not particularly soluble when expressed in E. coli,34 the C-terminal GFP fusion vector14 incorporates mutations obtained through directed evolution32 and combinatorial mutagenesis35 to enhance solubility, fluorescence amplitudes, and wavelength maxima. The critical aspect of the folding reporter appears to be the orientation of the fusion (i.e., protein-GFP), since MBP,36 GST37 and the GFP cycle 3 mutant (P. T. Beernink, unpublished results) are all highly stable, with calorimetric transition temperatures >54 °C.

In many cases, GFP fusion proteins retain the normal functions of the corresponding native proteins.38-40 However, in other cases, for functional studies, it may be necessary to generate native or affinity tagged proteins that do not bear fluorophores. There are several possible approaches when a native or minimally tagged construct is desired. First, one can clone the same PCR product (or subclone from the first construct) into a restriction-compatible vector for parallel expression of a native or a minimally tagged version. Second, an amber suppression strategy can be used to make a C-terminal GFP fusion protein only when the amber codon is suppressed (41; G. S. Waldo, personal communication). Finally, IVT reactions can be carried out under similar conditions without BODIPY-FL-Lys incorporation to generate unlabeled, native proteins or hexahistidine tagged proteins. Once sufficient expression has been established using a fluorescence-based detection strategy, one of these approaches can be used to obtain native or minimally tagged versions of the desired proteins.

In summary, we have implemented fluorescence-based protein expression screening in several high-throughput formats, including a novel implementation of protein microarray expression screening. We have applied these screening methods to subsets of approximately 50 prokaryotic and eukaryotic proteins. Since the various methods correlate reasonably well, the choice of a particular screening format depends on the specific requirements of the experimenter and access to appropriate detection hardware. This correlation validates the use of microarray-based protein expression screening. Further miniaturization of cell-free expression reactions and continued development of microarray based screening techniques will enable rapid, efficient screening of hundreds or thousands of expression clones with small sample requirements.

Acknowledgment. We thank Simone Krupka and Brenda Marsh for technical assistance and Drs. Sharon A. Doyle and George A. Martin for valuable discussions and critical reading of the manuscript. This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48 and was partially supported by a University of California BioSTAR Grant (01-10152) to P.T.B.

References

(1) Lennon, G.; Auffray, C.; Polymeropoulos, M.; Soares, M. B. Genomics 1996, 33, 151-152.
(2) Pedelacq, J. D.; Piltch, E.; Liong, E. C.; Berendzen, J.; Kim, C. Y.; Rho, B. S.; Park, M. S.; Terwilliger, T. C.; Waldo, G. S. Nat. Biotechnol. 2002, 20, 927-932.
(3) Haab, B. B.; Dunham, M. J.; Brown, P. O. Genome Biol. 2001, 2, research0004.1-0004.13.
(4) Pawlak, M.; Schick, E.; Bopp, M. A.; Schneider, M. J.; Oroszlan, P.; Ehrat, M. Proteomics 2002, 2, 383-393.
(5) Coleman, M. A.; Miller, K. A.; Beernink, P. T.; Yoshikawa, D. M.; Albala, J. S. Proteomics 2003, 3, 2101-2107.
(6) Kawahashi, Y.; Doi, N.; Takashima, H.; Tsuda, C.; Oishi, Y.; Oyama, R.; Yonezawa, M.; Miyamoto-Sato, E.; Yanagawa, H. Proteomics 2003, 3, 1236-1243.
(7) Lee, Y.; Lee, E. K.; Cho, Y. W.; Matsui, T.; Kang, I. C.; Kim, T. S.; Han, M. H. Proteomics 2003, 3, 2289-2304.
(8) Kononen, J.; Bubendorf, L.; Kallioniemi, A.; et al. Nat. Med. 1998, 4, 844-847.
(9) Huang, R. P. J. Immunol. Methods 2001, 255, 1-13.
(10) Tsalkova, T.; Zardeneta, G.; Kudlicki, W.; Kramer, G.; Horowitz, P. M.; Hardesty, B. Biochemistry 1993, 32, 3377-3380.
(11) Frydman, J.; Hartl, F. U. Science 1996, 272, 1497-1502.
(12) Klammt, C.; Lohr, F.; Schafer, B.; Haase, W.; Dotsch, V.; Ruterjans, H.; Glaubitz, C.; Bernhard, F. Eur. J. Biochem. 2004, 271, 568-580.
(13) Kim, D. M.; Swartz, J. R. Biotechnol. Bioeng. 2004, 85, 122-129.
(14) Waldo, G. S.; Standish, B. M.; Berendzen, J.; Terwilliger, T. C. Nat. Biotechnol. 1999, 17, 691-695.
(15) Myers, R. M.; Fischer, S. G.; Maniatis, T.; Lerman, L. S. Nucleic Acids Res. 1985, 13, 3111-3129.


Journal of Proteome Research • Vol. 3, No. 5, 2004

(16) Erzberger, J. P.; Barsky, D.; Scharer, O. D.; Colvin, M. E.; Wilson, D. M. 3rd. Nucleic Acids Res. 1998, 26, 2771-2778.
(17) Spirin, A. S. Bioorg. Khim. 1992, 18, 1394-1402.
(18) Martin, G. A.; Kawaguchi, R.; Lam, Y.; DeGiovanni, A.; Fukushima, M.; Mutter, W. BioTechniques 2001, 31, 948-953.
(19) Gite, S.; Mamaev, S.; Olejnik, J.; Rothschild, K. Anal. Biochem. 2000, 279, 218-225.
(20) Mamaev, S.; Olejnik, J.; Olejnik, E. K.; Rothschild, K. J. Anal. Biochem. 2004, 326, 25-32.
(21) Doyle, S. A.; Murphy, M. B.; Massi, J. M.; Richardson, P. M. J. Proteome Res. 2002, 1, 531-536.
(22) Bussow, K.; Nordhoff, E.; Lubbert, C.; Lehrach, H.; Walter, G. Genomics 2000, 65, 1-8.
(23) Braun, P.; Hu, Y.; Shen, B.; Halleck, A.; Koundinya, M.; Harlow, E.; LaBaer, J. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 2654-2659.
(24) Lesley, S. A.; Graziano, J.; Cho, C. Y.; Knuth, M. W.; Klock, H. E. Protein Eng. 2002, 15, 153-160.
(25) Busso, D.; Kim, R.; Kim, S. H. J. Biochem. Biophys. Methods 2003, 55, 233-240.
(26) Sawasaki, T.; Ogasawara, T.; Morishita, R.; Endo, Y. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 14652-14657.
(27) Wang, S.; Hazelrigg, T. Nature 1994, 369, 400-403.
(28) Hempel, R.; Wirsching, F.; Schober, A.; Schwienhorst, A. Anal. Biochem. 2001, 297, 177-182.
(29) Hammarstrom, M.; Hellgren, N.; van Den Berg, S.; Berglund, H.; Hard, T. Protein Sci. 2002, 11, 313-321.
(30) Kawasaki, M.; Inagaki, F. Biochem. Biophys. Res. Commun. 2000, 280, 842-844.
(31) Maxwell, K. L.; Mittermaier, A. K.; Forman-Kay, J. D.; Davidson, A. R. Protein Sci. 1999, 8, 1908-1911.
(32) Crameri, A.; Whitehorn, E. A.; Tate, E.; Stemmer, W. P. C. Nat. Biotechnol. 1996, 14, 315-319.
(33) Kapust, R. B.; Waugh, D. S. Protein Sci. 1999, 8, 1668-1674.
(34) Tsumoto, K.; Umetsu, M.; Kumagai, I.; Ejima, D.; Arakawa, T. Biochem. Biophys. Res. Commun. 2003, 312, 1383-1386.
(35) Cormack, B. P.; Valdivia, R. H.; Falkow, S. Gene 1996, 173, 33-38.
(36) Yang, Y. R.; Schachman, H. K. Biophys. Chem. 1996, 59, 289-297.
(37) Brockwell, D.; Yu, L.; Cooper, S.; McCleland, S.; Cooper, A.; Attwood, D.; Gaskell, S. J.; Barber, J. Protein Sci. 2001, 10, 572-580.
(38) Barak, L. S.; Ferguson, S. S.; Zhang, J.; Martenson, C.; Meyer, T.; Caron, M. G. Mol. Pharmacol. 1997, 51, 177-184.
(39) Lincker, F.; Philipps, G.; Chaboute, M. E. Nucleic Acids Res. 2004, 32, 1430-1438.
(40) Petrova, V. Y.; Drescher, D.; Kujumdzieva, A. V.; Schmitt, M. J. Biochem. J. 2004, Mar 4 [Epub ahead of print].
(41) Kuriki, Y. FEMS Microbiol. Lett. 1993, 107, 71-76.

PR049912G