Article pubs.acs.org/bc
Efficient Identification of Murine M2 Macrophage Peptide Targeting Ligands by Phage Display and Next-Generation Sequencing
Gary W. Liu,†,# Brynn R. Livesay,†,# Nataly A. Kacherovsky,† Maryelise Cieslewicz,† Emi Lutz,† Adam Waalkes,‡ Michael C. Jensen,†,§ Stephen J. Salipante,*,‡ and Suzie H. Pun*,†
†Departments of Bioengineering and Molecular Engineering and Sciences Institute and ‡Laboratory Medicine, University of Washington, Seattle, Washington 98195, United States
§Ben Towne Center for Childhood Cancer Research, Seattle Children’s Research Institute, Seattle, Washington 98145, United States
Supporting Information
ABSTRACT: Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (40% and >32% of the selected phage library in replicates 1 and 2, respectively), appeared only 1 and 3 times, respectively, out of the 20 Sanger-sequenced clones in the first and second replicates. Based on the Sanger sequencing results in the first replicate, we would have been unable to identify a valid consensus sequence. In contrast, the second replicate revealed 5 distinct consensus sequences that did not reflect the proportions reported by next-generation sequencing. These observations underscore the high degree of variability inherent in a Sanger sequencing approach owing to sample size constraints, which are overcome by high-throughput next-generation sequencing. We employed Clustal Omega alignment analysis of the top 20 sequences identified by next-generation sequencing in each biological replicate to identify potential motifs (defined as a minimum of three consecutive amino acid residues) for binding studies.12 The most abundant sequence, “WPT”, and motifs that were present in multiple sequences were considered strong candidates for binding studies. Multiple sequence alignment analysis identified four hypothesized motifs (reported along with their parent peptide sequence, in parentheses): “DPW” (QSDPWLVSRWFA), “YPS” (TYPSTQWFFAKF), “SEQ” (FFPSEQVLIAAL), and “LPS” (GLPSSAELERLW) (colored, Figure S2). In addition, a fifth sequence YPSSEQLLAWWG containing both the YPS and SEQ motifs in tandem was also selected (“YPSSEQ”). To identify potential nonspecific,
Figure 3. Phage clone binding of (A) selected phage clones and (B) a preferential amplifier to M1 and M2 macrophages by flow cytometry. Phage binding is expressed as median fluorescence intensity (MFI). Data reported are the mean MFI ± standard deviation (n = 3). AU, arbitrary units. Statistical analysis performed with a two-tailed Student’s t test compared to phage-treated M1 macrophage. *p-value 2 in UA1. In a prospective approach following the strategy recommended by ‘t Hoen, these selective binders would have been discarded. An improved strategy to identify selective binding sequences may be to assess sequence abundance in a UA4 library, which better represents the amplification cycles of the selected libraries. With this approach, only the two sequences DWSSWVYRDPQT (“DWSS”) and WPLWSFDWPQNA (“WPLW”) would be identified as false positive sequences.
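The UA4 comparison described above can be illustrated with a minimal sketch. It assumes that each library has already been reduced to a mapping from peptide sequence to read count, and that any peptide seen at all in the unselected amplified library is flagged; the function name and the read counts are hypothetical and chosen only for illustration, not taken from the sequencing data.

```python
from collections import Counter


def flag_preferential_amplifiers(selected_counts, unselected_counts):
    """Split selected-library peptides by presence in the unselected library.

    Returns (retained, flagged): peptides absent from the unselected
    amplified library, and peptides that also appear in it.
    """
    retained = {p: n for p, n in selected_counts.items()
                if p not in unselected_counts}
    flagged = {p: n for p, n in selected_counts.items()
               if p in unselected_counts}
    return retained, flagged


# Read counts below are invented for illustration only.
selected = Counter({"QSDPWLVSRWFA": 500,
                    "DWSSWVYRDPQT": 120,
                    "TYPSTQWFFAKF": 80})
ua4 = Counter({"DWSSWVYRDPQT": 30, "WPLWSFDWPQNA": 12})

retained, flagged = flag_preferential_amplifiers(selected, ua4)
```

A presence/absence rule is the simplest choice; a ratio or abundance threshold against the unselected library would be an alternative design, at the cost of an extra parameter to tune.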
DOI: 10.1021/acs.bioconjchem.5b00344 Bioconjugate Chem. XXXX, XXX, XXX−XXX
Article
Bioconjugate Chemistry
Figure 4. Fraction of total sequences in Illumina next-generation data for confirmed binding sequences (blue) and nonspecific preferential amplifying sequences (red) throughout four rounds of phage display panning in both the first (left) and second (right) biological replicates.
Both “DWSS” and “WPLW” contain the plastic binding motif WXXW.13 We have also observed “DWSS” in the top 20 of next-generation sequencing results from phage display experiments against other targets (data not shown). To confirm nonspecificity of parasitic sequences, we tested binding of VHWDFRQWWQPS (“VHW”), which was present in both the UA1 (abundance: 199, proportion of library: ∼0.0069%) and UA4 (abundance: 41, proportion of library: ∼0.0043%) libraries and has been identified as a consensus sequence in display strategies against other targets pursued by our group. As tested by flow cytometry, this sequence did not exhibit significant macrophage binding compared to controls (Figure 3B).

Binding Sequences Enrich Throughout Subtractive Phage Display Rounds. We next utilized multiple analytical tools to observe how binding sequences emerge in next-generation sequencing data and to find a method that would successfully identify binding sequences while eliminating false positives. In this context, the six validated binding sequences served as positive controls and the known parasitic sequences as negative controls. To characterize enrichment dynamics, the fractional abundances of the six binding sequences and three parasitic sequences were tracked throughout the four rounds of phage display for each biological replicate (Figure 4). Binding sequences exhibited lower abundance in the first and second rounds of phage display, but consistently increased in proportion after each round. Parasitic sequences generally exhibited higher fractional abundance in early rounds of the panning experiments, but began to plateau or decrease in fractional abundance in the last two rounds as binding sequences began to dominate the sequence space. Similar trends were observed when comparing the rank in abundance of binding and parasitic sequences throughout each round of phage display (Figure S4).
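As a rough illustration of how such round-by-round tracking might be computed, the sketch below derives fractional abundance and abundance rank for one peptide across successive rounds. It assumes each round is available as a mapping from peptide to read count; the labels "BINDER" and "PARASITE" and all counts are hypothetical placeholders, not values from this study.

```python
def fractional_abundance(counts):
    """Return each peptide's share of the total reads in one library."""
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}


def abundance_rank(counts):
    """Rank peptides by read count (1 = most abundant); ties break by name."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {p: i + 1 for i, (p, _) in enumerate(ordered)}


def track(rounds, peptide):
    """Return [(fraction, rank)] per round; rank is None if peptide is absent."""
    history = []
    for counts in rounds:
        fraction = fractional_abundance(counts).get(peptide, 0.0)
        rank = abundance_rank(counts).get(peptide)
        history.append((fraction, rank))
    return history


# Hypothetical counts for three successive rounds.
rounds = [
    {"BINDER": 1, "PARASITE": 9},
    {"BINDER": 6, "PARASITE": 4},
    {"BINDER": 8, "PARASITE": 2},
]
binder_history = track(rounds, "BINDER")
parasite_history = track(rounds, "PARASITE")
```

In this toy example the hypothetical binder rises in both fraction and rank while the parasite declines, which is the qualitative pattern described for Figure 4 and Figure S4.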
Therefore, performing multiple rounds of phage display enhances discrimination between binding and parasitic sequences through changes in rank and proportion throughout the display rounds. While binding peptides may theoretically be identified without multiple subtractive rounds,8 in this report we find that additional rounds of selection enhanced the representation of desired target-binding phage versus undesired nonspecific phage. These observations emphasize the importance of experimental design for successful enrichment of target-binding sequences. We have observed a much higher prevalence of parasitic sequences in phage display experiments following less stringent selection protocols, such as using immobilized proteins for positive selection and the immobilizing substrate for negative selection (data not shown). The selection strategy and amplification in bacteria between rounds both contribute to the “evolutionary” pressure placed on bacteriophage, selecting for phage that both amplify (leading to parasitic sequences) and bind to nontarget materials (plastic binders, nonspecific sequences) used in the display throughout an experiment. Here, the UA4 library enabled discrimination between target-binding phage and parasitic phage.

Algorithmic Identification of Binding Motifs. As we observed the greatest phage titers and the least library diversity in the fourth round of each replicate (Figure S3), these libraries provided data to explore the use of automated motif analysis algorithms to provide a clear and principled representation of target-binding peptide motifs with minimal representation of parasitic peptide motifs. The MEME (Multiple Expectation Maximization for Motif Elicitation) algorithm developed by Bailey utilizes expectation maximization to fit a two-component, finite mixture model to identify multiple motifs of varying lengths within a DNA or protein sequence set.14,15 By using a “greedy” heuristic computation, MEME is able to sample a large number of sequences (up to 1000) in reasonable computing time while still identifying statistically significant motifs. Peptide sequences that appeared at least 10 times in the fourth round of both biological replicates were used to optimize analysis settings; we omitted sequences with abundance less than 10 because this method weights all sequences equally, so low-abundance sequences could introduce noise. Scanning for motifs between 6 and 12 amino acids in length resulted in the most robust representation of confirmed binding sequences and hypothesized motifs.
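The abundance filter applied before motif analysis can be sketched as follows. This is a minimal illustration that reads "at least 10 times in the fourth round of both biological replicates" as a threshold applied to each replicate separately, which is an assumption; the function names, record identifiers, and toy counts are hypothetical. The output is written in FASTA format, a sequence format that MEME accepts as input.

```python
def shared_abundant(rep1_counts, rep2_counts, min_count=10):
    """Peptides with at least min_count reads in each of the two replicates."""
    return sorted(
        p for p, n in rep1_counts.items()
        if n >= min_count and rep2_counts.get(p, 0) >= min_count
    )


def to_fasta(peptides):
    """Format peptides as FASTA records with arbitrary sequential names."""
    return "".join(f">pep{i}\n{p}\n" for i, p in enumerate(peptides, 1))


# Toy round-4 counts, invented for illustration only.
rep1 = {"QSDPWLVSRWFA": 10, "TYPSTQWFFAKF": 9, "GLPSSAELERLW": 50}
rep2 = {"QSDPWLVSRWFA": 12, "TYPSTQWFFAKF": 100, "GLPSSAELERLW": 3}

kept = shared_abundant(rep1, rep2)
fasta_text = to_fasta(kept)
```

Each retained peptide is written once regardless of its read count, which mirrors the point that the motif search weights every input sequence equally.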
MEME analysis without any comparison of the phage display libraries with the UA4 library identified binding sequence motifs as well as a DWSS parasitic sequence motif in round 1.4 (Figure S5). Removal of sequences that were present in the UA4 library before MEME analysis resulted in a robust representation of motifs and their binding sequences and good agreement between the two biological replicates (Figure 5). This approach effectively captured all of the hypothesized motifs present in the validated M2-binding sequences “DPW”, “YPSSEQ”, “YPS”, “SEQ”, and “LPS”. In addition, we observed that MEME was most effective after a substantial collapse of the library to diversity below 2 × 10⁵ unique sequences. MEME analysis was able to identify many of the hypothesized binding motifs using sequences from the third round of biological replicate 2 and the fourth round of both replicates, but was only able to identify one significant motif
for targeting murine M2 macrophage were discovered by the combination of phage display selection, next-generation sequencing, and data analysis. Based on our experience, we suggest the following workflow and methodological considerations to increase the productivity of phage display experiments and to improve the success rate of identifying target-binding sequences:

1. Experimental Design. A 100-fold representation of the naïve library should be amplified the same number of rounds as the experimental library (UAX, where X denotes the number of amplifications), with no selection step between amplifications, to aid in identifying parasitic sequences. Performing experiments in biological replicates enables comparison analyses that can be helpful in confirming conserved binding motifs.

2. Next-Generation Sequencing. Sample-specific barcodes can be used to multiplex multiple experimental samples onto a single sequencing run in order to minimize cost and time. In these studies, allocating 1 × 10⁶ sequencing reads per sample gave an adequate representation of the sequence space. Sequencing the last round of phage display performed and the UAX library is sufficient for identification of false positives and can be used to identify conserved motifs. Fidelity of sequencing data is also a consideration, and efforts should be taken to utilize high-fidelity next-generation sequencing platforms17−19 to limit the number of artifactual sequences recovered.

3. Data Formatting and Analysis. Translate quality-filtered sequencing data into counts of unique peptide sequences. Eliminate sequences from the selected library that are present in the UAX library and retain filtered sequences with abundance >10 sequence reads. We have developed an analysis pipeline, “PepRS”, to facilitate these steps.

4. MEME Analysis. Perform MEME motif analysis over a range of 6−12 amino acids in motif length (this may have to be adjusted depending on the size of the displayed peptide) and search for up to 10 motifs. Only consider motifs with E-values