Deconvolution of Combinatorial Libraries for Drug Discovery


J. Med. Chem. 1996, 39, 2710-2719

Deconvolution of Combinatorial Libraries for Drug Discovery: Theoretical Comparison of Pooling Strategies

Danielle A. M. Konings,‡,§ Jacqueline R. Wyatt,† David J. Ecker,† and Susan M. Freier*,†

ISIS Pharmaceuticals, 2292 Faraday Avenue, Carlsbad, California 92008, and Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309

Received March 4, 1996X

Synthesis and testing of mixtures of compounds in a combinatorial library allow much greater throughput than synthesis and testing of individual compounds. When mixtures of compounds are screened, however, the possibility exists that the most active compound will not be identified. The specific strategies employed for pooling and deconvolution will affect the likelihood of success. We have used a nucleic acid hybridization example to develop a theoretical model of library deconvolution for a library of more than 250 000 compounds. This model was used to compare various strategies for pooling and deconvolution. Simulations were performed in the absence and presence of experimental error. We found iterative deconvolution to be most reliable when active molecules were assigned to the same subset in early rounds. Reliability was reduced only slightly when active molecules were assigned randomly to all subsets. Iterative deconvolution with as many as 65 536 compounds per subset did not drastically reduce the reliability compared to one-at-a-time testing. The pooling strategies examined with this theoretical model are compared experimentally in an accompanying paper.

Introduction

Advances in chemical synthesis of combinatorial libraries have enabled preparation of an unprecedented number of novel compounds for drug screening.1-7 Synthesis and testing of mixtures of compounds in a combinatorial library offer the potential of much greater throughput than the “one compound, one well” approach. When mixtures of compounds are screened, however, a “deconvolution” method must be used to determine which molecule(s) in the library is responsible for the activity.

Iterative deconvolution begins by dividing the library into nonoverlapping subsets. The subsets are tested separately, and the one with greatest activity is identified. The compounds in the most active subset are divided into a new set of subsets and retested for activity.
The process of dividing the most active subset into smaller subsets for retesting is continued until a unique molecule is identified. There are many ways of organizing subsets or “pooling strategies” for iterative deconvolution. Most common is pooling by fixed position where, at each round, molecules with a common functionality at a single position are grouped together. For example, in round 1, the subsets could consist of all molecules, NXN, where X is a single functionality unique to each subset and N is an equimolar mixture of all the functionalities at that position. In subsequent rounds, additional positions would be fixed. Fixed position pooling has been used for libraries synthesized using a variety of chemistries (Figure 1), and iterative deconvolution strategies have been used to identify a single active compound from a mixture.8-20 Iterative strategies based on pooling by fixed position can differ from one another in which positions are fixed at each round (order of deconvolution) or the number of positions fixed at each round. Iterative deconvolution does not require fixed position pooling; any method for dividing the compounds into nonoverlapping subsets is allowed, although it may not always be feasible.

Figure 1. Examples of pooling by fixed position for libraries of (a) N-methyl peptides,48 (b) phosphoryl-linked compounds,41 and (c) (mercaptoacyl)proline derivatives.49 Each subset consists of a single fixed functionality at one or more positions (X1, X2) and an equimolar mix of several functionalities at the other positions (R1, R2, R3).

* Corresponding author: 619-603-2345 (phone), 619-431-2768 (fax), [email protected] (internet).
† ISIS Pharmaceuticals.
‡ University of Colorado.
§ Present address: Department of Microbiology, Southern Illinois University, Carbondale, IL 62901.
X Abstract published in Advance ACS Abstracts, June 15, 1996.
© 1996 American Chemical Society

We have previously used computer simulations to ask if the deconvolution procedure can identify the most active molecule(s) in a combinatorial library and to determine the effects of deconvolution order and experimental error on the outcome.21 We found that even when the library contained many molecules with suboptimal activity, iterative deconvolution almost always selected the most active molecule in the library. When reasonable experimental error was included in the

Theoretical Evaluation of Pooling Strategies

Journal of Medicinal Chemistry, 1996, Vol. 39, No. 14 2711

Figure 2. Pooling strategies for a library of 27 compounds. Each molecule in this hypothetical library contains three substituents. Each substituent is chosen from one of three functionalities. Activities were assigned randomly to the 27 molecules; a larger, bolder font indicates a more active molecule. The 27 molecules were assigned to three subsets using five different strategies. (a) Pooling by fixed position: The molecules were assigned to subsets according to the functionality in the second position. (b) Random pooling: Molecules were randomly assigned to subsets. (c) Hard pooling: The most active compound (cfg) was assigned to a subset with the eight least active molecules. The nine molecules with second through tenth best activity were grouped together in a single subset. (d) Easy pooling: The nine most active molecules were assigned to a single subset, and the remaining activity was divided equally among the remaining subsets. (e) Dealing cards: The molecules were ranked by activity and dealt alternately into the three subsets.
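The five strategies in Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 27 molecules are built from three hypothetical functionality sets ("abc", "def", "ghi", inferred from the example molecule cfg), activities are random numbers, the library size is assumed to be divisible by the number of subsets, and "dealing" is implemented as a back-and-forth deal. None of the function names come from the paper.

```python
import itertools
import random

def mean_activity(pool, activity):
    return sum(activity[m] for m in pool) / len(pool)

def rank(molecules, activity):
    """Most active first."""
    return sorted(molecules, key=lambda m: activity[m], reverse=True)

def fixed_position_pools(molecules, position):
    """Group molecules by the functionality at one position."""
    pools = {}
    for mol in molecules:
        pools.setdefault(mol[position], []).append(mol)
    return list(pools.values())

def random_pools(molecules, n_subsets, rng):
    shuffled = list(molecules)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]

def dealing_pools(molecules, activity, n_subsets):
    """Deal ranked molecules back and forth: 0,1,2,2,1,0,..."""
    pools = [[] for _ in range(n_subsets)]
    for i, mol in enumerate(rank(molecules, activity)):
        lap, slot = divmod(i, n_subsets)
        pools[slot if lap % 2 == 0 else n_subsets - 1 - slot].append(mol)
    return pools

def hard_pools(molecules, activity, n_subsets):
    """Best molecule with the least active ones; next best grouped together."""
    ranked = rank(molecules, activity)
    size = len(ranked) // n_subsets
    cut = len(ranked) - (size - 1)
    first = [ranked[0]] + ranked[cut:]
    rest = ranked[1:cut]
    return [first] + [rest[i * size:(i + 1) * size] for i in range(n_subsets - 1)]

def easy_pools(molecules, activity, n_subsets):
    """Most active molecules together; the rest dealt into the other subsets."""
    ranked = rank(molecules, activity)
    size = len(ranked) // n_subsets
    return [ranked[:size]] + dealing_pools(ranked[size:], activity, n_subsets - 1)

rng = random.Random(0)
molecules = ["".join(p) for p in itertools.product("abc", "def", "ghi")]
activity = {m: rng.random() for m in molecules}  # hypothetical activities
strategies = {
    "fixed position": fixed_position_pools(molecules, 1),
    "random": random_pools(molecules, 3, rng),
    "hard": hard_pools(molecules, activity, 3),
    "easy": easy_pools(molecules, activity, 3),
    "dealing": dealing_pools(molecules, activity, 3),
}
for name, pools in strategies.items():
    print(name, [round(mean_activity(p, activity), 2) for p in pools])
```

Printing the mean activity of each subset makes the intent of each strategy visible: hard and easy pooling concentrate or isolate the best molecule, whereas dealing gives subsets of similar activity.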

simulations, a molecule with activity nearly as great as that of the most active compound was usually selected. In this report, we have extended these computer simulations to include a large variety of pooling strategies (Figure 2). Using two different library-target pairs with very different affinity profiles, we found that iterative strategies using nonoverlapping subsets were quite successful at identifying one of the most active molecules in the library. Simulations which included experimental error suggested that pooling strategies that concentrated active molecules in a single subset were somewhat more reliable than strategies that separated the most active molecules into different subsets in the early rounds. In an accompanying paper we report experimental evaluation of several of the pooling strategies using a library of 810 chemically synthesized compounds and an in vitro assay for inhibition of phospholipase A2 (PLA2).

Results

Characterization of Two Molecular Landscapes. RNA hybridization was chosen as the molecular interaction for our model because calculations based upon experimentally determined parameters can accurately predict the association constants of very large numbers of molecules.22 We were able to calculate the binding affinity for several hundred thousand different molecules for a specific target to create a “molecular landscape”.23 We created two different landscapes by selecting two RNA targets, a 9-mer and a 6-mer, and a library of all possible 262 144 RNA 9-mers. The energy profiles for these two library-target pairs have been previously described in detail.21 We call the two landscapes A (9-mer target) and B (6-mer target). Activities are reported relative to that of the most active compound in the library. Thus the most active compound has an activity of 1.0. Any activity greater than 0.2 is called “good” activity.

Figure 3. Cumulative frequency distribution of activities for landscapes A and B. Activity is plotted relative to the most active molecule in the library. The appearance of “steps” in the profile for landscape A is due to the low number of compounds with high activity. For example, the curve is flat between 1 and 0.5 because there were no compounds in the library with activity between that of the best and 0.5 times that of the best.
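The landscape statistics quoted in the text (number of best molecules, number with "good" activity, fraction below 10⁻⁵, and how much more active the library is than a single active molecule) can be computed from a list of activities as sketched below. The function name and the interpretation of the library-fold figure as the sum of relative activities are assumptions made for illustration; the paper's own calculation is not shown in this section.

```python
def summarize_landscape(activities, good_cutoff=0.2, low_cutoff=1e-5):
    """Summarize a landscape from raw activities (any positive scale)."""
    best = max(activities)
    rel = [a / best for a in activities]  # activity relative to the best
    return {
        "n_best": sum(1 for r in rel if r == 1.0),
        "n_good": sum(1 for r in rel if r > good_cutoff),
        "frac_low": sum(1 for r in rel if r < low_cutoff) / len(rel),
        # Assumed reading: total library activity divided by that of one best molecule.
        "library_fold": sum(rel),
    }

# Tiny hypothetical landscape: two best molecules, one good, one negligible.
summary = summarize_landscape([4.0, 4.0, 1.0, 0.00001])
print(summary)
```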

Figure 3 plots the cumulative frequency distribution for activities in these two molecular landscapes. Landscape A contained relatively few compounds with significant activity. There were two molecules with the highest affinity (“best” molecules) and only 12 molecules (0.005% of the library) with good activity. Almost all of the molecules (98.7%) had activities less than 10⁻⁵ relative to the best. In contrast, landscape B contained many compounds with significant activity. There were 16 molecules with the best activity and 2414 molecules (0.9%) with good activity. Only 37% of the molecules had activities less than 10⁻⁵. Relatively few molecules contributed to activity of the library in landscape A; the library was only 8 times more active than it would be if only one molecule were active. In contrast, many molecules contributed to activity of the library in


Konings et al.

Table 1. Comparison of Pooling Strategies in the Absence of Experimental Error^a

                      activity of selected compound (relative to best)
pooling strategy      landscape A       landscape B
fixed position^b      1.00 ± 0.00       0.80 ± 0.19
random pooling^c      0.93 ± 0.19       0.71 ± 0.16
hard                  0.04              0.52
easy                  1.00              1.00
dealing               1.00              1.00

^a The specified strategy was used to divide the library into four subsets in each of nine rounds.
^b Reported value is average activity (and standard deviation) from 500 different orders of deconvolution.
^c Reported value is average activity (and standard deviation) from 500 different random poolings.

landscape B; the library was more than 2000-fold more active than it would be if only one molecule were active. It is important to realize that, once generated, the binding energy profile for each landscape simply represents a series of affinities of molecules for a target which could be a model for any macromolecular interaction. The two landscapes described here had distinctly different profiles which simulate a fairly broad range of binding interactions.21 These two distinct molecular landscapes provided us with two models in which to test various deconvolution strategies. We asked what was the likelihood that each strategy would identify the most active molecule in the library, and, if the most active molecule was not selected, what was the likelihood that the selected molecule would have good activity.

Effect of Deconvolution Order on Reliability of Iterative Deconvolution. The library of all RNA 9-mers contained 262 144 molecules. For this library, typical iterative deconvolution consisted of nine rounds of synthesis and testing with four subsets in each round. The first strategies evaluated were “pooling by fixed position”. In this strategy, at each round, the functionalities at one or more positions were fixed and the other positions were randomized. Subsets characterized by fixed position pooling are typically synthesized using either competitive coupling of monomer mixtures10,24-27 or split bead28-30 techniques. These synthetic approaches allow any position in the molecule to be fixed in each round. We wanted to ask whether the likelihood of success depended on the order in which positions were fixed (order of deconvolution). To test whether order of deconvolution affected the activity of the selected sequence, we simulated deconvolutions using different deconvolution orders. Details of the simulation procedure are described in the Experimental Section. Briefly, compounds were assigned to subsets by fixed position.
The calculated activity of each subset was the mean activity of all the compounds in the subset. In the absence of experimental error, all orders of deconvolution resulted in selection of the most active molecule for landscape A. For landscape B, the selected molecule had either the best activity or good activity (Table 1).21 To assess the effect of deconvolution order in the presence of assay error, we performed 500 simulations of deconvolution for several orders of deconvolution incorporating a 2-fold Monte-Carlo error in the activity of each subset. In practice such an error would occur if there were error in the experimental measure of enzyme inhibition or subset concentration. Details of the Monte-Carlo procedure are described in the Experimental Section. Due to the 2-fold error, different simulations resulted in selection of different molecules. Figure 4 plots the percent of simulations resulting in selection of a molecule with each activity. This activity profile of selected molecules represents the success rate for the simulations and is a measure of the reliability of each pooling strategy.

Figure 4. Distribution of activities selected during simulations of iterative deconvolution using different orders of deconvolution for landscapes A and B. Order of deconvolution lists the position fixed in each successive round: 573468291, 123456789, 957846321, 231468759, 264819735, and 564738291. The simulations with random order used a different, randomly selected order of deconvolution for each simulation. The deconvolution orders shown were selected to represent different strategies for deconvolution. We tested 123456789 because it is the order most easily accomplished by split bead synthesis and 564738291 because it fixes positions in the middle of the molecule first. The order 957846321 was tested for landscape A because positions 9, 5, and 7 were least “important”. Substitutions at these positions were tolerated better than substitutions at other positions. Other deconvolution orders were selected at random. Two-fold Monte-Carlo error in subset activity was included in the simulations.

For landscape B, deconvolution order had very little effect on reliability. For all deconvolution orders tested, only about 1% of the simulations resulted in selection of the most active molecule, but 97% of the simulations resulted in selection of a molecule with good activity. For landscape A, roughly one-half of the simulations resulted in selection of the most active molecule, and approximately 90% of the simulations resulted in selection of a good binder. Reliability was affected slightly by deconvolution order. The order 957846321 was least successful, and 123456789 was most successful among those tested. Figure 5 compares the deconvolution profiles for these two orders of deconvolution. In the early rounds, for the less successful deconvolution (panel A), more than one subset showed substantial activity. In the later rounds, virtually all the activity was concentrated in a single subset. With 2-fold assay error, selection of a suboptimal subset was more likely in the early rounds and unlikely in the later rounds. In panel


Figure 5. Partitioning of library activity into subsets during deconvolution using deconvolution order 957846321 (a) or 123456789 (b) for library landscape A. At each round, the activity of each subset is expressed as a percentage of the combined activity for all subsets in that round. No Monte-Carlo error was included in calculation of subset activities.
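A minimal sketch of iterative deconvolution with fixed position pooling is given below: one position is fixed per round in a chosen order, each subset is scored by the mean activity of its members, and only the most active subset is carried forward. Several things here are assumptions rather than the authors' procedure: the library is a toy set of RNA 4-mers, `toy_activity` is a hypothetical landscape (not the hybridization model), positions are 0-indexed (the orders in the text are 1-indexed), and the 2-fold error is modeled as log-normal noise with a standard deviation of one factor of `fold_error`.

```python
import itertools
import random

ALPHABET = "ACGU"

def toy_activity(seq, best="GCCA", penalty=0.05):
    """Hypothetical landscape: each mismatch to `best` costs a fixed factor."""
    mismatches = sum(a != b for a, b in zip(seq, best))
    return penalty ** mismatches

def deconvolute(library, activity, order, fold_error=1.0, rng=None):
    """Fix one position per round, in `order`, keeping the most active subset."""
    rng = rng or random.Random()
    survivors = list(library)
    for pos in order:
        subsets = {}
        for seq in survivors:
            subsets.setdefault(seq[pos], []).append(seq)

        def measured(members):
            mean = sum(activity(s) for s in members) / len(members)
            # Assumed error model: multiply by fold_error ** N(0, 1).
            return mean * fold_error ** rng.gauss(0.0, 1.0)

        survivors = max(subsets.values(), key=measured)
    return survivors[0]

library = ["".join(p) for p in itertools.product(ALPHABET, repeat=4)]

# Without error, every order should recover the best molecule of this toy landscape.
no_error = [deconvolute(library, toy_activity, order)
            for order in [(0, 1, 2, 3), (3, 1, 0, 2)]]

# With 2-fold error, count how often the best molecule is selected.
rng = random.Random(1)
hits = sum(
    deconvolute(library, toy_activity, (0, 1, 2, 3), fold_error=2.0, rng=rng) == "GCCA"
    for _ in range(200)
)
print(no_error, hits / 200)
```

Repeating the noisy run for different orders gives a toy analogue of the success rates compared in Figure 4, although the numbers are not expected to match the paper's landscapes.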

B, the opposite effect was observed. In the early rounds a single subset contained most of the activity, and selection of a suboptimal subset was unlikely. In the later rounds more than one subset contained a significant fraction of the activity, and selection of a suboptimal subset was more likely. In the presence of 2-fold assay error, the deconvolution order in panel A was less successful than that in panel B. Our simulations suggest that keeping active molecules together in the early rounds and separating them during the later rounds is more successful than separating them during the early rounds. It should be noted that the deconvolution orders plotted in Figure 5 were selected because they demonstrated the two extremes of separating active molecules early or keeping active molecules together early. Most deconvolution orders fell between these two extremes. In general, deconvolution order had little overall effect on reliability. Effect of Fixing More than One Position per Round. A modification of the strategy described above is fixing more than one position per round. Fixing two positions in the first round has been reported for iterative deconvolution of peptide libraries31 and an oligonucleotide library.9 In our system, if three positions were fixed in each round, there would be three rounds of synthesis and screening with 64 subsets per round instead of nine rounds with four subsets per round. The ultimate extreme is fixing nine positions in a single round. Each of the 262 144 compounds is tested individually; no pooling is involved. Figure 6 compares the reliability when one, three, or nine positions were fixed during each round. Fixing nine positions per round (no pooling) was included because it represents the best reliability that can be obtained. For both landscapes, reliability improved very slightly when three positions were fixed each round. As expected, the most successful strategy was testing unique compounds. 
The less than perfect reliability observed when unique compounds were tested resulted from the 2-fold error incorporated into the simulations. Even when unique compounds were tested, a compound with suboptimal activity was sometimes selected because the 2-fold assay error resulted in it appearing more active than the most active compound. Comparison of Fixed Position Pooling to Random Pooling. In all the pooling strategies described

Figure 6. Distribution of activities selected during simulations of iterative deconvolution for landscapes A and B. Fixing one position per round resulted in nine rounds with four subsets per round. When three positions were fixed in each round, there were three rounds with 64 subsets per round. Fixing nine positions in a single round is equivalent to testing 262 144 compounds as individual molecules; no pooling is involved. A different, randomly selected order of deconvolution was used for each simulation. Two-fold Monte-Carlo error in subset activity was included in the simulations.
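The round and subset counts quoted for one, three, or nine fixed positions per round follow from simple arithmetic, sketched here. The function name is hypothetical, and the total assumes that only the single most active subset is followed into each new round.

```python
def screening_cost(n_positions=9, n_monomers=4, fixed_per_round=1):
    """Return (rounds, subsets per round, total subsets tested)."""
    if n_positions % fixed_per_round:
        raise ValueError("fixed_per_round must divide n_positions")
    rounds = n_positions // fixed_per_round
    subsets_per_round = n_monomers ** fixed_per_round
    return rounds, subsets_per_round, rounds * subsets_per_round

for k in (1, 3, 9):
    print(k, screening_cost(fixed_per_round=k))
```

For the RNA 9-mer library this gives 9 rounds of 4 subsets (36 tests), 3 rounds of 64 subsets (192 tests), and a single round of 262 144 individual compounds.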

above, subsets had a single functionality at one or more positions and an equimolar mixture of all functionalities at the other positions. Such subsets, described by their fixed positions, have been used frequently in iterative deconvolutions because they can be synthesized as mixtures. We compared these fixed position poolings to other pooling strategies to ask if reliability could be improved using another pooling strategy. The first strategy evaluated was random pooling. In each simulation, the 262 144 molecules were randomly assigned to subsets in round 1; the compounds from the most active round 1 subset were randomly divided into subsets in round 2, and the process was continued until


Figure 7. Distribution of activities selected during simulations of iterative deconvolution for landscapes A and B. Each of the five pooling strategies described in Figure 2 was used to divide the library into four subsets in each of nine rounds: pooling by fixed position using 500 different orders of deconvolution, random pooling, hard pooling, easy pooling, and dealing cards. Two-fold Monte-Carlo error in subset activity was included in the simulations. Results of simulations of testing compounds one-at-a-time in the presence of 2-fold error are also reported (solid line).

unique molecules were compared in the final round. In the absence of experimental error, for landscape A, random pooling usually identified the most active molecule (Table 1). In the 12% of the simulations when random deconvolution did not identify the best compound, most of the compounds with good activity were together in one subset, and the two best compounds were each grouped with molecules with poor activity. Effectively, the less successful random poolings approached hard pooling (see below). With landscape A, in the presence of error in the activity assay, random pooling was slightly less successful than pooling by fixed position (Figure 7). This is due to the fact that random pooling is more likely than fixed position pooling to separate activity into different subsets during early rounds. Other Pooling Strategies. Pooling strategies examined above suggest that pooling strategy has only a small effect on reliability. In some cases slightly improved reliability was associated with keeping active molecules together in early rounds as much as possible. To examine the most extreme effects that pooling strategy could have on the reliability of iterative deconvolution, we evaluated the pooling strategies diagrammed in Figure 2. Fixed position pooling and random pooling have been described above. “Hard” pooling was designed to make identification of the best binder as difficult as possible. In this strategy the best binder(s) was put in the first subset. All the other molecules in that subset were the least active molecules in the library. The next best binders were grouped together in the second subset. The next best binders were grouped together in the third subset, and the


process was continued until all the molecules were assigned to subsets. After identification of the most active subset in each round, the same “hard” strategy was used to divide the molecules from that winning subset into subsets for the next round. The only molecule in the first subset with significant activity was the best binder(s), while there were many molecules in the second subset with good activity. We expected, with this strategy, in every round, the likelihood of subset 2 being more active than subset 1 would be high and thus the likelihood of identifying the best binder would be reduced. “Easy” pooling was designed to make identification of the best binder as easy as possible. In this strategy all the most active compounds were assigned to subset 1, and the activity of the other compounds was divided by dealing into the remaining subsets. After each round, the same “easy” strategy was used to divide the molecules from the winning subset into subsets for the next round. This strategy was designed to maximize the difference in activity between subset 1 and the other subsets in all rounds. Subset 1 contained the best binder, so we expected this strategy would maximize the likelihood of identifying the best binder. Of course, in a real pooling experiment, activities of all individual molecules are not known, and thus easy and hard poolings are not real possibilities. In addition, in a library of more than a few compounds, it is essentially impossible that either easy or hard pooling would occur at random. These strategies were examined, however, because they represent the worst and best possible results for iterative deconvolution. The final strategy that was tested was designed to spread the active compounds among all subsets as equally as possible. The molecules were ranked in order of activity and dealt, one-at-a-time, back-and-forth, successively into the subsets. This “dealing cards” procedure was continued through all rounds. 
We expected this strategy to result in similar activities for all subsets. Consequently, when coupled with error in the activity assay, this strategy could increase the likelihood of identifying a molecule with suboptimal activity. Table 1 compares the results for each of these five strategies in the absence of assay error. For both landscapes, both “easy” and “dealing cards” strategies resulted in identification of the most active molecule and “hard” pooling identified a suboptimal binder. When hard pooling was used, the second subset was selected in several rounds. Each time the second subset was selected, the most active molecules were discarded with the less active first subset and, therefore, a suboptimal binder was selected. Figure 7 plots the results of simulations for each pooling strategy in the presence of 2-fold error in the activity assay. For both landscapes, easy pooling was a bit more successful than the other strategies and hard pooling was significantly less successful than the other strategies. Dealing cards was about as successful as pooling by fixed position or random pooling. Comparison of Iterative Deconvolution to Position Scanning. Position scanning14,25,32 is a noniterative technique which has been used with peptide libraries. A set of mixtures is synthesized for each position of the compound, and a single position is fixed in each subset. The most active compound is deduced


Figure 8. Distribution of activities selected during simulations of iterative deconvolution using pooling by fixed position, hard pooling, or position scanning for landscapes A and B. Two-fold Monte-Carlo error in subset activity was included in the simulations.

by selecting the functionality from the most active subset at each position. Results of position scanning simulations for this library have been reported previously.21 In the absence of error, position scanning identified the most active molecule for landscape A. For landscape B, position scanning identified a molecule with poor activity (0.04 relative to the best). Failure of position scanning to select the best binder with landscape B was a result of multiple binding alignments with similar energies. For example, GAUGCCCAA had the best activity, and CGCCCAAUC was 0.85 times as active. The selected compound GGCCCCCAA was a consensus between these two alignments of the preferred pharmacophore GCCCAA and had poor activity. Figure 8 plots the results of position scanning when 2-fold error in the activity assay was included in the


simulations. Plots for fixed position pooling and hard pooling are included for comparison. For landscape B, the iterative strategies identified a compound with good activity more than 98% of the time. In contrast, position scanning identified a compound with good activity less than 20% of the time. For landscape A, position scanning identified a compound with good activity more often than hard pooling. The failure of position scanning was that 23% of the time a compound with activity less than 0.001 was identified. Even hard pooling, which represents the worst possible case for iterative deconvolution, always identified a compound with activity ≥0.001. Thus the least active compounds identified by position scanning were much less active than the least active compounds identified by even the least successful iterative strategy. The relatively poor reliability of position scanning with landscape A in the presence of assay error was due to multiple pharmacophore alignments contributing to library activity. The best binder in the library was GCCCACACA, and many of the most active molecules differed from this by one or two substitutions. There was a second family of active molecules with a different alignment, the most active of which was CGCCCACAC with activity of 0.08. Overall, 80% of the library activity was due to molecules with the same alignment as GCCCACACA, and 16% was due to molecules in the CGCCCACAC family. Figure 9A plots the activities calculated for each position scanning subset in the absence of error. At many fixed positions, a single subset was clearly the most active and would likely be identified, even in the presence of assay error. At positions 5, 7, and 9, however, several subsets had comparable activity, so, in the presence of 2-fold error, suboptimal subsets could be selected. For example, at position 1, G was selected more than 99% of the time, but at position 5, A was selected only 72% of the time, and C was selected 10% of the time. As a result, the molecules GCCCCCACA and GCCCCCACG were identified in 6% of the simulations. These molecules contain monomers from both active alignments; both had activities less than 3 × 10⁻⁵. In preliminary studies, we modified the activities of the molecules in landscape A so only molecules with the

Figure 9. Activity profile for position scanning for landscapes A and B in the absence of experimental error. Subset activities are reported for each monomer at each fixed position with all other positions randomized. Activities were normalized so that the sum of activities for the four subsets at each fixed position was 100%.
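The position scanning calculation behind Figure 9 can be sketched as follows: for each position and each monomer, average the activity of all molecules carrying that monomer at that position, normalize the four subset activities to 100%, and assemble the consensus from the most active monomer at each position. The toy landscape below is entirely hypothetical; it was constructed with two families of active 4-mers in different alignments so that the consensus mixes monomers from both, which is the failure mode described in the text.

```python
import itertools

ALPHABET = "ACGU"

def position_scan(library, activity):
    """Return per-position subset activities (in %) and the consensus sequence."""
    profile = []
    for pos in range(len(library[0])):
        means = {}
        for monomer in ALPHABET:
            members = [s for s in library if s[pos] == monomer]
            means[monomer] = sum(activity.get(s, 0.0) for s in members) / len(members)
        total = sum(means.values())
        profile.append({m: 100.0 * v / total for m, v in means.items()})
    consensus = "".join(max(p, key=p.get) for p in profile)
    return profile, consensus

library = ["".join(p) for p in itertools.product(ALPHABET, repeat=4)]

# Hypothetical activities: one family aligned as GCCN, another as NGCC (N != G).
toy_landscape = {
    "GCCA": 1.0, "GCCC": 0.3, "GCCG": 0.3, "GCCU": 0.3,
    "AGCC": 0.85, "CGCC": 0.85, "UGCC": 0.85,
}

profile, consensus = position_scan(library, toy_landscape)
print(consensus, toy_landscape.get(consensus, 0.0))
```

In this toy case the consensus is GGCC, which belongs to neither family and has no activity, even though no assay error was included.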


same alignment as CGCCCACAC contributed to library activity. With this modified landscape, iterative deconvolution and position scanning had similar success rates, providing further evidence that the poor reliability of position scanning with our library was due to multiple pharmacophore alignments contributing to library activity. For landscape B several alignments contributed almost equally to the library activity. Thus the alignment problem was even greater. At several fixed positions, many position scanning subsets had similar activities (Figure 9B). In the presence of 2-fold assay error, the selected sequence often contained monomers from multiple alignments and had very poor activity, leading to a poor reliability for position scanning with landscape B.

Effect of Following More than One Subset. For the simulations described above, only the most active subset in each round was followed into the next round. In practice, if another subset has activity near that of the best, it may be sensible to deconvolute both subsets. Although this approach can require substantially more synthesis and testing than following only one subset each round, multiple subsets have been followed by many investigators using both iterative and position scanning strategies.25,27,33-36 To simulate this approach for iterative deconvolution, calculations were repeated as above except, in each round, all subsets with activity within 4-fold of the best were taken into the next round. A cutoff of 4-fold was selected because, with a 2-fold standard deviation in the Monte-Carlo simulation, it represents a 95% confidence interval for the most active subset. For each simulation, the activity of the selected sequence and the number of subsets required to identify that sequence were calculated. Figure 10 plots the results of such simulations.
For landscape A, when all subsets within 4-fold of the best were followed into the next round, the reliability of iterative deconvolution with fixed position pooling or random pooling was comparable to that of one-at-a-time testing. Hard pooling was less successful. Typically 70-125 subsets were tested compared to testing of 36 subsets when only the best subset was followed into the next round.

This approach was also applied to position scanning on landscape A. At each position, monomers were selected if their activity was within 4-fold that of the best, and all unique compounds composed of these monomers were tested. With this modification, the reliability of position scanning was greater than that of one-at-a-time testing (Figure 10). Typically 50-150 unique compounds were tested.

This approach was also evaluated for position scanning on landscape B. With this modification, the reliability of position scanning was much greater than that observed when only one compound was tested (Figure 8B) and comparable to that of one-at-a-time testing (Figure 10). This approach typically required synthesis and testing of 2000-14 000 unique compounds. Our attempts to quantitatively evaluate this approach for iterative deconvolution of landscape B failed. Even in the absence of Monte-Carlo error, so many subsets were tested that the memory capacity of the computer was exceeded. Thus we are only able to say that when this approach was applied to iterative deconvolution of landscape B, synthesis and testing of more than 250 subsets were required.

Figure 10. Distribution of activities selected during simulations of deconvolution using fixed position pooling with 200 different orders of deconvolution, random pooling, hard pooling, or position scanning for landscapes A and B. For the iterative strategies, four subsets were tested in round 1. In rounds 2-9, each subset from the previous round with activity within 4-fold that of the best was divided into four more subsets for testing. The position scanning simulations employed a similar approach of following all subsets with activity within 4-fold that of the best as described in the text. Twofold Monte-Carlo error in subset activity was included in the simulations. Results of simulations of testing compounds one-at-a-time in the presence of 2-fold error are also reported (s).

Discussion

Testing of molecules as mixtures greatly enhances the rate at which compounds in combinatorial libraries can be screened. The risk exists, however, that the most active compound will not be identified. The pooling and deconvolution strategy used should be designed as much as possible to minimize this risk. We have used a model system to evaluate a variety of pooling strategies. The reliability of each pooling strategy was measured by the activity profile of the selected molecules.

The results suggested that order of deconvolution has little effect on the likelihood of success. A small increase in reliability was observed when the positions that were fixed early were “important”. An “important” position is defined as one where substitution of a nonoptimal functionality has a very detrimental effect on activity. Thus fixing important positions during the early rounds separates active molecules from inactive molecules and keeps active molecules together until the later rounds. Unimportant or “replaceable” residues have been evaluated for a pentapeptide binding to antibodies,12 and iterative deconvolution was more successful when an important position was fixed in the first round than when an unimportant position was fixed in the first round.37

It might be argued that these simulations showed only a small effect of deconvolution order because all positions were equally important. This was not the case. For example, for landscape B, a single substitution at position 4 reduced activity of the best compound to 1.6 × 10⁻⁵. In contrast, a single substitution at position 8 resulted in an activity of 0.63. Thus the importance of positions in this library differed by more than 10⁴-fold.

Although our results suggested important positions should be fixed first, it is often not known which positions are important. Random pooling and dealing cards were tested to examine the reliability of iterative deconvolution when no information was available about the activity profile and active molecules were not kept together in early rounds. The results with random pooling were somewhat less reliable than those with fixed position pooling; pooling by fixed position keeps active molecules together a bit more than random pooling. Pooling by dealing cards was designed to systematically distribute the active molecules into all subsets. The reliability for dealing cards was similar to that for random pooling because random pooling, on average, also distributes the activity equally into all the subsets.

An improvement in success was observed when there were more subsets per round and fewer rounds. The modest difference between curves in Figure 6 demonstrates that pooling even as many as 65 536 compounds per subset did not drastically reduce the reliability compared to one-at-a-time testing. The slight increase in reliability with fewer subsets may not be worth the increased effort involved. For example, with our 9-mer library, fixing three positions per round would require synthesis and testing of 192 (3 rounds × 64 subsets per round) samples compared to only 36 (9 × 4) samples if one position is fixed each round. If subsets are prepared using manual mixing and splitting of beads at each synthesis position, then the small increase in success rate observed in Figure 6 when three positions were fixed per round would not be worth the increased synthetic effort required.
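The sample counts quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes a library of 9 positions with 4 monomers per position and that only the best subset is followed each round; the function name and dictionary keys are illustrative only.

```python
def deconvolution_cost(length=9, monomers=4, fixed_per_round=1):
    """Count the samples needed for iterative deconvolution when
    fixed_per_round positions are fixed in every round and only the
    single best subset is followed."""
    if length % fixed_per_round != 0:
        raise ValueError("fixed_per_round must divide the library length")
    rounds = length // fixed_per_round
    subsets_per_round = monomers ** fixed_per_round
    return {
        "rounds": rounds,
        "subsets_per_round": subsets_per_round,
        "samples": rounds * subsets_per_round,
        # Compounds mixed together in each first-round subset.
        "compounds_per_subset_round1": monomers ** (length - fixed_per_round),
    }

for k in (1, 3):
    print(k, deconvolution_cost(fixed_per_round=k))
# k = 1: 9 rounds x 4 subsets = 36 samples, 65536 compounds per first-round subset
# k = 3: 3 rounds x 64 subsets = 192 samples, 4096 compounds per first-round subset
```

Fixing three positions per round cuts the first-round subset size 16-fold (65 536 to 4096 compounds) at the price of roughly five times as many samples.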
The effort required for strategies using fewer rounds and more subsets per round can be reduced if robotic synthesis38-41 and high throughput screening are employed. Libraries can be divided into thousands of subsets, which should increase the likelihood of identifying the most active compound in the library. In addition to an increased likelihood of success, fewer rounds with more subsets per round result in fewer compounds per subset in the first round, so the concentration of each individual compound in the subset will be greater. Thus a practical consequence of more subsets in the first round is increased subset activity for the subset containing the most active compound. If identification of the most active subset in round 1 is limited by assay sensitivity, then more subsets and fewer compounds per subset should improve the likelihood of finding an active compound in the library.

To estimate how much pooling strategy could help or hurt reliability, we examined two extreme cases: easy and hard pooling. We found that the reliability of easy pooling was a bit greater than that of pooling by fixed position or random pooling. Hard pooling was substantially less reliable, especially for landscape A. Fortunately, this result is highly unlikely. The poor reliability of hard pooling occurred only when the most active molecule(s) was intentionally and repeatedly put in a subset with inactive molecules and the remaining active molecules were grouped together in another subset. Practical implementation of easy or hard pooling would require knowledge of the activities of all the molecules in the library, and if this information were available, deconvolution would be unnecessary. Easy and hard pooling do, however, represent the extremes in reliability that can result from different pooling strategies.

Position scanning was less successful than iterative deconvolution. When position scanning was unsuccessful, a molecule which was a combination of active molecules from two different alignments was typically selected. Activity of this blended molecule was much worse than the activity of the active molecule from either alignment. This phenomenon did not occur with iterative deconvolution, resulting in a higher reliability for iterative deconvolution.

Selection of an inhibitor of antibody binding from a library of hexapeptides25 provides an experimental example of the effect of multiple alignments on position scanning. Two hexapeptides with two different alignments, Ac-YPYPNL-NH2 and Ac-PYPNLS-NH2, are potent inhibitors of binding of a monoclonal antibody.42 When position scanning was applied to this library, multiple subsets were active for some fixed positions.25 Twelve sequences were selected for synthesis and screening as unique compounds. One-half of these, including Ac-PYPPLL-NH2, represented a combination of the two active alignments and had poor activity. In this peptide example, judicious selection of more than one monomer at some positions allowed identification of a correctly aligned compound with good activity.25 A similar approach also improved the reliability in our system. For landscape A, two or three monomers typically showed activity within 4-fold that of the best at three to six of the nine positions.
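Selecting more than one monomer per position and enumerating the resulting compounds can be sketched as follows. This is a hedged illustration, not the authors' code: the three-position toy library and its subset activities are hypothetical, and the 4-fold cutoff is the one described in the text.

```python
from itertools import product

def candidate_compounds(position_activities, cutoff_fold=4.0):
    """position_activities holds one dict per position, mapping each
    monomer to the activity of the subset with that monomer fixed.
    Keep every monomer within cutoff_fold of the best at its position
    and return all unique compounds built from the kept monomers."""
    kept = []
    for activities in position_activities:
        best = max(activities.values())
        kept.append(sorted(m for m, a in activities.items()
                           if a * cutoff_fold >= best))
    return ["".join(seq) for seq in product(*kept)]

# Hypothetical position scanning data for a 3-position toy library.
toy = [
    {"A": 1.0, "C": 0.5, "G": 0.1, "U": 0.01},
    {"A": 0.2, "C": 1.0, "G": 0.05, "U": 0.3},
    {"A": 0.01, "C": 0.02, "G": 1.0, "U": 0.1},
]
print(candidate_compounds(toy))
```

The number of compounds to synthesize is the product of the number of monomers kept at each position. Two monomers at three positions would give 2³ = 8 compounds and three monomers at six positions would give 3⁶ = 729, so the count grows quickly when many positions are ambiguous.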
Subsequent testing of the 50-150 unique compounds defined by these monomers resulted in a large improvement in reliability compared to testing of the single compound defined by the best monomer at each position (compare triangles in Figure 10A to circles in Figure 8A). Reliability of iterative strategies also improved when more than one subset was followed into subsequent rounds (Figure 10A). For landscape A, this iterative approach required synthesis and testing of 70-125 subsets. Synthesis of compound mixtures is almost always more difficult than synthesis of single compounds. Thus, position scanning following all monomers with activity near that of the best may be the most efficient strategy for a landscape similar to landscape A. For landscape B, the reliability of position scanning also improved when multiple monomers were pursued. This improvement required synthesis and testing of 2000-14 000 compounds and is therefore impractical. Following more subsets would likely also improve the reliability of iterative deconvolution for landscape B. Unfortunately, the large number of subsets required made even simulation of this strategy impossible. Thus, for landscapes similar to landscape B, following more than one subset is impractical and iterative strategies following only the best subset appear to be most efficient. When searching for lead compounds from chemical libraries, the most important question is whether or not the selected compound will have activity sufficient for optimization to a drug candidate. These simulations suggest that most iterative deconvolution procedures are likely to identify the most active compound in a chemical

library or at least a compound with activity near the best. They do not, however, address the activity of this most active compound. We have previously reported guidelines for predicting activity of the final selected compound using the suboptimal binding factor.21 Values of the suboptimal binding factor reported previously21,23 and those from more recently published deconvolutions27,36,43,44 can be used to predict activity of the most active compound. In summary, we used nucleic acid hybridization to model deconvolution of combinatorial libraries and to test pooling strategies. The model system provided a rapid and inexpensive method for exploring a large number of deconvolution strategies. The simulations suggested iterative deconvolution is most successful when the most active molecules are pooled together during early rounds. Reliability was reduced only slightly when active molecules were assigned randomly to all subsets. Clearly the merits of different deconvolution approaches depend on the relative activities of the compounds in the libraries. Thus validity of these predictions for libraries, such as those in Figure 1, depends on the similarity between activity distributions in our model systems and those in real libraries. We used two library-target pairs which differ greatly in the distribution of activities in the library. The activity profiles of these two systems span those of many combinatorial libraries.3,21 In addition, the model used does not assume additivity between monomer units45,46 and thus does not oversimplify molecular interactions. We believe, therefore, that this model is representative of many types of libraries. In an accompanying paper we provide an experimental comparison of several of the pooling strategies using a library of 810 chemically synthesized compounds and an in vitro assay for inhibition of PLA2. 
The experimental results support the predictions of the theoretical model that nearly all strategies of iterative deconvolutions are successful. In the experimental studies, only hard pooling failed to identify the most active compound.41

Experimental Section

Simulations of Pooling and Deconvolution. The targets for landscapes A and B were respectively 5′-GUGUGGGCA-3′ and 5′-UGGGCA-3′. Methods for calculation of free energies for library sequences binding to target RNA have been described previously.21 We define the activity of a molecule as the reciprocal of the concentration needed to bind 50% of the target molecules:

$$\text{activity} = \frac{1}{\mathrm{IC}_{50}} = K_\mathrm{A} = \exp\!\left(-\frac{\Delta G^\circ_{37}}{RT}\right)$$

where $K_\mathrm{A}$ is the association constant for the molecule, $-\Delta G^\circ_{37}$ is the binding free energy, $R$ is the gas constant (0.001 987 kcal/mol/K), and $T$ is temperature (310.15 K).

Pooling strategies were simulated by dividing the library into subsets according to each pooling scheme. Activities of each subset were calculated as the average activity of the compounds in the subset.21,27 This calculation assumes no synergism or antagonism between compounds within a subset. A result of this averaging procedure is that the reciprocal of the activity of a mixture is the total concentration of compounds in the mixture needed for 50% binding.

Two-fold experimental error in subset activity was included in the simulations by assuming the observed activities had a log normal distribution about the true activity. We assumed log(activity) had a normal distribution with a mean equal to log(true activity) and a standard deviation equal to log(2). Observed activities for each subset were generated using standard Monte-Carlo techniques.47 Typically 500 simulations were performed for each set of conditions. When more than one subset was followed, only 100-200 simulations were performed. Reproducibility of the results of Monte-Carlo simulations was assessed by performing two sets of 500 simulations using identical conditions. When the two sets of simulations were compared, percent selected at each activity (see, for example, Figure 4) differed by 2% or less at all activities. Thus, we estimate the inaccuracy in percent selected during our simulations to be 2% or less.
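The activity definition, subset averaging, and error model of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original simulation code: the function names are hypothetical, `delta_g` is taken to be ΔG°37 in kcal/mol with negative values for favorable binding, and Python's `random.gauss` stands in for whatever Monte-Carlo routine was actually used.

```python
import math
import random

R = 0.001987  # gas constant, kcal/mol/K
T = 310.15    # temperature, K (37 degrees C)

def activity(delta_g):
    """Activity = K_A = exp(-dG/(RT)); delta_g in kcal/mol (assumed sign
    convention: negative for favorable binding)."""
    return math.exp(-delta_g / (R * T))

def subset_activity(delta_gs):
    """Activity of a mixture, taken as the mean activity of its members
    (no synergism or antagonism between compounds)."""
    return sum(activity(g) for g in delta_gs) / len(delta_gs)

def observed_activity(true_activity, rng, fold_error=2.0):
    """Draw one noisy measurement: log(activity) is normal with mean
    log(true activity) and standard deviation log(fold_error)."""
    return math.exp(rng.gauss(math.log(true_activity), math.log(fold_error)))

rng = random.Random(0)                # fixed seed so a run can be repeated
subset = [-10.0, -6.0, -2.0, 0.0]     # hypothetical free energies, kcal/mol
true_value = subset_activity(subset)
noisy = [observed_activity(true_value, rng) for _ in range(5)]
print(true_value, noisy)
```

Because activities are averaged rather than free energies, one strongly binding compound dominates the activity of its subset, which is why the most active subset usually contains the most active compound.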

Acknowledgment. The authors thank Dr. C. Pinilla for valuable discussions. Computing resources were generously provided by the University of Colorado.

References

(1) Gallop, M. A.; Barrett, R. W.; Dower, W. J.; Fodor, S. P. A.; Gordon, E. M. Applications of combinatorial technologies to drug discovery. 1. Background and peptide combinatorial libraries. J. Med. Chem. 1994, 37, 1233-1251.
(2) Gordon, E. M.; Barrett, R. W.; Dower, W. J.; Fodor, S. P. A.; Gallop, M. A. Applications of combinatorial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening strategies, and future directions. J. Med. Chem. 1994, 37, 1385-1401.
(3) Terrett, N. K.; Gardner, M.; Gordon, D. W.; Kobylecki, R. J.; Steele, J. Combinatorial synthesis--The design of compound libraries and their application to drug discovery. Tetrahedron 1995, 51, 8135-8173.
(4) Eichler, J.; Houghten, R. A. Generation and utilization of synthetic combinatorial libraries. Mol. Med. Today 1995, 174-180.
(5) Janda, K. D. Tagged versus untagged libraries: Methods for the generation and screening of combinatorial chemical libraries. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 10779-10785.
(6) Houghten, R. A. Soluble Combinatorial Libraries: Extending the Range and Repertoire of Chemical Diversity. Methods: Companion Methods Enzym. 1994, 6, 354-360.
(7) Pinilla, C.; Appel, J.; Blondelle, S.; Dooley, C.; Dorner, B.; Eichler, J.; Ostresh, J.; Houghten, R. A. A Review of the Utility of Soluble Peptide Combinatorial Libraries. Biopolymers (Pept. Sci.) 1995, 37, 221-240.
(8) Ecker, D. J.; Vickers, T. A.; Hanecak, R.; Driver, V.; Anderson, K. Rational screening of oligonucleotide combinatorial libraries for drug discovery. Nucleic Acids Res. 1993, 21, 1853-1856.
(9) Wyatt, J. R.; Vickers, T. A.; Roberson, J. L.; Buckheit, R. W., Jr.; Klimkait, T.; DeBaets, E.; Davis, P. W.; Rayner, B.; Imbach, J. L.; Ecker, D. J. Combinatorially selected guanosine-quartet structure is a potent inhibitor of human immunodeficiency virus envelope-mediated cell fusion. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 1356-1360.
(10) Geysen, H. M.; Rodda, S. J.; Mason, T. J. A priori delineation of a peptide which mimics a discontinuous antigenic determinant. Mol. Immunol. 1986, 23, 709-715.
(11) Blake, J.; Litzi-Davis, L. Evaluation of peptide libraries: An iterative strategy to analyze the reactivity of peptide mixtures with antibodies. Bioconjugate Chem. 1992, 3, 510-513.
(12) Geysen, H. M.; Rodda, S. J.; Mason, T. J.; Tribbick, G.; Schoofs, P. G. Strategies for epitope analysis using peptide synthesis. J. Immunol. Methods 1987, 102, 259-274.
(13) Houghten, R. A.; Pinilla, C.; Blondelle, S. E.; Appel, J. R.; Dooley, C. T.; Cuervo, J. H. Generation and use of synthetic peptide combinatorial libraries for basic research and drug discovery. Nature 1991, 354, 84-86.
(14) Houghten, R. A.; Appel, J. R.; Blondelle, S. E.; Cuervo, J. H.; Dooley, C. T.; Pinilla, C. The use of synthetic peptide combinatorial libraries for the identification of bioactive peptides. BioTechniques 1992, 13, 412-421.
(15) Edmundson, A. B.; Harris, D. L.; Fan, Z.-C.; Guddat, L. W.; Schley, B. T.; Hanson, B. L.; Tribbick, G.; Geysen, H. M. Principles and pitfalls in designing site-directed peptide ligands. Proteins 1993, 16, 246-267.
(16) Owens, R. A.; Gesellchen, P. D.; Houchins, B. J.; DiMarchi, R. D. The rapid identification of HIV protease inhibitors through the synthesis and screening of defined peptide mixtures. Biochem. Biophys. Res. Commun. 1991, 181, 402-408.
(17) Eichler, J.; Houghten, R. A. Identification of substrate-analog trypsin inhibitors through the screening of synthetic peptide combinatorial libraries. Biochemistry 1993, 32, 11035-11041.
(18) Buckheit, R. W., Jr.; Roberson, J. L.; Lackman-Smith, C.; Wyatt, J. R.; Vickers, T. A.; Ecker, D. J. Potent and specific inhibition of HIV envelope-mediated cell fusion and virus binding by G quartet-forming oligonucleotide (ISIS 5320). AIDS Res. Hum. Retroviruses 1994, 10, 1497-1506.

(19) Ecker, D. J.; Wyatt, J. R.; Vickers, T. Novel guanosine quartet structure binds to the HIV envelope and inhibits envelope mediated cell fusion. Nucleosides Nucleotides 1995, 14, 1117-1127.
(20) Davis, P. W.; Vickers, T. A.; Wilson-Lingardo, L.; Wyatt, J. R.; Guinosso, C. J.; Sanghvi, Y. S.; DeBaets, E. A.; Acevedo, O. L.; Cook, P. D.; Ecker, D. J. Drug leads from combinatorial phosphorodiester libraries. J. Med. Chem. 1995, 38, 4363-4366.
(21) Freier, S. M.; Konings, D. A. M.; Wyatt, J. R.; Ecker, D. J. Deconvolution of Combinatorial Libraries for Drug Discovery: A Model System. J. Med. Chem. 1995, 38, 344-352.
(22) Turner, D. H.; Sugimoto, N.; Jaeger, J. A.; Longfellow, C. E.; Freier, S. M.; Kierzek, R. Improved parameters for prediction of RNA structure. Cold Spring Harb. Symp. Quant. Biol. 1987, 52, 123-133.
(23) Kauffman, S. A. The Origins of Order, Self-Organization and Selection in Evolution; Oxford University Press: Oxford, U.K., 1993.
(24) Carell, T.; Wintner, E. A.; Bashir-Heshemi, A.; Rebek, J., Jr. A novel procedure for the synthesis of libraries containing small organic molecules. Angew. Chem., Int. Ed. Engl. 1994, 33, 2059-2061.
(25) Pinilla, C.; Appel, J. R.; Blanc, P.; Houghten, R. A. Rapid identification of high affinity peptide ligands using positional scanning synthetic peptide combinatorial libraries. BioTechniques 1992, 13, 901-905.
(26) Smith, P. W.; Lai, J. Y. Q.; Whittington, A. R.; Cox, B.; Houston, J. G.; Stylli, C. H.; Banks, M. N.; Tiller, P. R. Synthesis and Biological Evaluation of a Library Containing Potentially 1600 Amides/Esters. A Strategy for Rapid Compound Generation and Screening. Bioorg. Med. Chem. Lett. 1994, 4, 2821-2824.
(27) Pirrung, M. C.; Chen, J. Preparation and screening against acetylcholinesterase of a non-peptide “indexed” combinatorial library. J. Am. Chem. Soc. 1995, 117, 1240-1245.
(28) Furka, A.; Sebestyén, F.; Asgedom, M.; Dibo, G. General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Pept. Protein Res. 1991, 37, 487-493.
(29) Lebl, M.; Krchnák, V.; Sepetov, N. F.; Seligmann, B.; Strop, P.; Felder, S.; Lam, K. S. One-bead-one-structure combinatorial libraries. Biopolymers 1995, 37, 177-198.
(30) Stanková, M.; Issakova, O.; Sepetov, N. F.; Krchnák, V.; Lam, K. S.; Lebl, M. Application of one-bead one-structure approach to identification of nonpeptidic ligands. Drug Dev. Res. 1994, 33, 146-156.
(31) Dooley, C. T.; Chung, N. N.; Schiller, P. W.; Houghten, R. A. Acetalins: Opioid receptor antagonists determined through the use of synthetic peptide combinatorial libraries. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 10811-10815.
(32) Dooley, C. T.; Houghten, R. A. The use of positional scanning synthetic peptide combinatorial libraries for the rapid determination of opioid receptor ligands. Life Sci. 1993, 52, 1509-1517.
(33) Houghten, R. A.; Dooley, C. T. Bioorg. Med. Chem. Lett. 1993, 3, 405-412 (abstract).
(34) Blondelle, S. E.; Takahashi, E.; Weber, P. A.; Houghten, R. A. Identification of antimicrobial peptides by using combinatorial libraries made up of unnatural amino acids. Antimicrob. Agents Chemother. 1994, 38, 2280-2286.
(35) Pinilla, C.; Appel, J. R.; Houghten, R. A. Synthetic peptide combinatorial libraries (SPCLs): identification of the antigenic determinant of β-endorphin recognized by monoclonal antibody 3E7. Gene 1993, 128, 71-76.

(36) Terrett, N. K.; Bojanic, D.; Brown, D.; Bungay, P. J.; Gardner, M.; Gordon, D. W.; Mayers, C. J.; Steele, J. The Combinatorial Synthesis of a 30,752-compound Library: Discovery of SAR Around the Endothelin Antagonist, FR-139,317. Bioorg. Med. Chem. Lett. 1995, 5, 917-922.
(37) Geysen, H. M. Combinatorial peptide libraries: A critical evaluation. Abstract for IBC Conference on Combinatorial Libraries, San Francisco, CA, Aug. 11-12, 1994.
(38) Zuckermann, R. N.; Kerr, J. M.; Siani, M. A.; Banville, S. C.; Santi, D. V. Identification of highest-affinity ligands by affinity selection from equimolar peptide mixtures generated by robotic synthesis. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 4505-4509.
(39) Zuckermann, R. N.; Kerr, J. M.; Siani, M. A.; Banville, S. C. Design, construction and application of a fully automated equimolar peptide mixture synthesizer. Int. J. Pept. Protein Res. 1992, 40, 497-506.
(40) Lashkari, D. A.; Hunicke-Smith, S. P.; Norgren, R. M.; Davis, R. W.; Brennan, T. An automated multiplex oligonucleotide synthesizer: Development of high-throughput, low-cost DNA synthesis. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 7912-7915.
(41) Wilson-Lingardo, L. A.; Davis, P. W.; Ecker, D. E.; Hebert, N.; Sprankle, K.; Brennan, T.; Freier, S. M.; Wyatt, J. R. Deconvolution of Combinatorial Libraries for Drug Discovery: Experimental Comparison of Pooling Strategies. J. Med. Chem. 1996, 39, 2720-2726.
(42) Appel, J. R.; Pinilla, C.; Houghten, R. A. Identification of related peptides recognized by a monoclonal antibody using a synthetic peptide combinatorial library. Immunomethods 1992, 1, 17-23.
(43) Campbell, D. A.; Bermak, J. C.; Burkoth, T. S.; Patel, D. V. A transition state analogue inhibitor combinatorial library. J. Am. Chem. Soc. 1995, 117, 5381-5382.
(44) Carell, T.; Wintner, E. A.; Rebek, J., Jr. A solution-phase screening procedure for the isolation of active compounds from a library of molecules. Angew. Chem., Int. Ed. Engl. 1994, 33, 2061-2064.
(45) Borer, P. N.; Dengler, B.; Tinoco, I., Jr.; Uhlenbeck, O. C. Stability of ribonucleic acid double-stranded helices. J. Mol. Biol. 1974, 86, 843-853.
(46) Tinoco, I., Jr.; Borer, P. N.; Dengler, B.; Levine, M.; Uhlenbeck, O. C.; Crothers, D. M.; Gralla, J. Improved Estimation of Secondary Structure in Ribonucleic Acids. Nature New Biol. 1973, 246, 40-41.
(47) Bevington, P. R.; Robinson, D. K. Monte Carlo techniques. Data Reduction and Error Analysis for the Physical Sciences; McGraw-Hill: San Francisco, 1992; pp 75-95.
(48) Ostresh, J. M.; Husar, G. M.; Blondelle, S. E.; Dörner, B.; Weber, P. A.; Houghten, R. A. “Libraries from libraries”: Chemical transformation of combinatorial libraries to extend the range and repertoire of chemical diversity. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 11138-11142.
(49) Murphy, M. M.; Schullek, J. R.; Gordon, E. M.; Gallop, M. A. Combinatorial organic synthesis of highly functionalized pyrrolidines: Identification of a potent angiotensin converting enzyme inhibitor from a mercaptoacyl proline library. J. Am. Chem. Soc. 1995, 117, 7029-7030.

JM960168O