Simulated Screens of DNA Encoded Libraries: The Potential Influence

Article

Simulated Screens of DNA Encoded Libraries; the Potential Influence of Chemical Synthesis Fidelity on Interpretation of Structure-Activity-Relationships Alexander Lee Satz ACS Comb. Sci., Just Accepted Manuscript • DOI: 10.1021/acscombsci.6b00001 • Publication Date (Web): 26 Apr 2016 Downloaded from http://pubs.acs.org on April 27, 2016

ACS Combinatorial Science is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

ACS Combinatorial Science

Simulated Screens of DNA Encoded Libraries; the Potential Influence of Chemical Synthesis Fidelity on Interpretation of Structure-Activity-Relationships

Keywords: DNA encoded, combinatorial chemistry, encoded library, library screening, DEL, split-and-pool, combinatorial libraries, hit ID, DNA conjugates.

Alexander L. Satz
Roche Innovation Center Basel, Grenzacherstrasse 124, 4070 Basel
[email protected], +41616874118

ACS Paragon Plus Environment

Abstract

Simulated screening of DNA encoded libraries indicates that the presence of truncated byproducts complicates the relationship between library member enrichment and equilibrium association constant (these truncates result from incomplete chemical reactions during library synthesis). Further, simulations indicate that some patterns observed in reported experimental data may result from the presence of truncated byproducts in the library mixture, and not structure-activity-relationships. Potential experimental methods of minimizing the presence of truncates are assessed via simulation; the relationship between enrichment and equilibrium association constant for libraries of differing purities is investigated. Data aggregation techniques are demonstrated that allow for more accurate analysis of screening results, in particular when the screened library contains significant quantities of truncates.

Introduction

DNA encoded library (DEL) technology allows for facile generation and screening of extremely large numbers of drug-like molecules (1-5). Numerous literature reports illustrate the value of the technology, including the discovery of LFA-1 (6), ADAMTS-5 (7), RIP3 Kinase (8), BCATm (9), InhA (10), SIRT1-3 (11), Wip1 phosphatase (12), and sEH (13) inhibitors. DNA encoded libraries (DELs) are commonly synthesized via split-and-pool chemistry (13-16), although a variety of alternative methodologies have been reported (17-21). Regardless of the method of synthesis, the technology results in the generation of complex mixtures of DNA conjugates. Each library member consists of a small-molecule moiety covalently tethered to a DNA sequence. The DNA sequence, or barcode, encodes the protocol for the multi-step chemical synthesis of the attached small molecule. Generally, the multi-step synthesis consists of aqueous and DNA-compatible reactions which allow access to drug-like small molecules (22-24). It is generally not possible to remove unreacted starting materials or byproducts during library synthesis, or to quantify the yields of the products and byproducts (13-21). Thus, the final population of library members with a particular barcode will include many related but different small molecules (25). The ratios of the different products are dictated by the synthetic yields at each step of the multi-step synthesis. The existence of these byproducts, all of which possess the same DNA barcode, complicates the interpretation of DEL screening results.

The product of a DEL screen is interrogated by high throughput sequencing. The result of sequencing is a list of barcode sequences, and the number of times each barcode was observed (herein referred to as counts) (2,14,26). It is often implied that counts correlate with the equilibrium association constant of the library member encoded by that barcode (2,5,6,7,9,10,13,14).
To date, however, only a single instance of a reasonable correlation between counts and equilibrium association constant has been explicitly shown (27). This lack of correlation is generally not discussed (2,5,10,11,14), is treated as an artefact (17), or is ignored by removing perceived outliers or binning data in an effort to clean the data set (6,7,9). Thus, despite its numerous successes (5-13), interpretation of DEL screening data remains problematic.

The results of DEL screens may be visualized as scatter plots displaying the relationship between counts and the building blocks encoded at each cycle of chemistry. For instance, Figure 1 displays the sequencing output of the screen of DEL-A against p38 MAP kinase (14). As reported by Clark et al. (14), a single building block at cycle-2, 3-amino-4-methyl-N-methoxybenzamide (AMMB), is required for both enrichment and biochemical activity in follow-up assays. The scatter plot provided in Figure 1 displays only those data points that contain the AMMB building block at cycle-2, and >9 counts. The cut-off value of >9 counts was arbitrarily chosen by raising the cut-off until
noticeable patterns appeared in the data (14). Horizontal and vertical lines come into focus when applying this cut-off, and data points with the highest number of counts generally occur at the intersection of lines (Figure 1). Although the underlying cause of these patterns is not explicitly discussed in the literature (2,5,6,7,9,10,11,14,17), a relationship between counts and ligand potency is generally assumed, and observed patterns are implied to provide structure-activity-relationships (SAR). For instance, increased counts at the intersection of two lines are often observed in screening data (2,6,10,11,14,17), and presented as an indication of convergent SAR (2,10,11).

We previously reported computational simulations which suggest that a DEL screen, conducted at a single protein concentration, is unlikely to provide a meaningful correlation between counts and equilibrium association constant (28). This conclusion is reasonable since i) a binding assay requires a minimum of several data points, over a range of protein or ligand concentrations, to reliably measure an equilibrium association constant, and ii) the concentration and the equilibrium association constant of the DEL molecules are both unknown, and it is impossible to solve for two unknowns with only a single data point. Thus, these reported simulations predict counts to be strongly influenced by synthetic yield. Still, it is not obvious how synthetic yields alone might bring about the complex patterns visualized in Figure 1, including why library members at the intersections of lines would have higher synthetic yields. However, these previously reported simulated screens did not take into account the formation of truncated byproducts during multi-step library synthesis (28).
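The two-unknowns argument can be illustrated with a minimal sketch. It assumes a simple 1:1 binding isotherm with the target protein in large excess over each library member; the isotherm, the protein concentration, and the name `bound_amount` are illustrative assumptions and are not taken from the model of reference 28.

```python
def bound_amount(ka, ligand_amount, protein_conc):
    """Amount of ligand bound at equilibrium.

    Assumes 1:1 binding with protein in large excess (an assumption of
    this sketch, not necessarily the model used in the paper).
    """
    fraction_bound = ka * protein_conc / (1.0 + ka * protein_conc)
    return ligand_amount * fraction_bound


PROTEIN_CONC = 1e-6  # assumed protein concentration (M)

# A weak binder (Ka = 1e5 per M) made in high yield (relative amount 1.0).
weak_abundant = bound_amount(ka=1e5, ligand_amount=1.0, protein_conc=PROTEIN_CONC)

# A strong binder (Ka = 1e8 per M) made in low yield: solve for the amount
# that gives exactly the same bound quantity as the weak binder.
strong_ka = 1e8
strong_fraction = strong_ka * PROTEIN_CONC / (1.0 + strong_ka * PROTEIN_CONC)
strong_scarce_amount = weak_abundant / strong_fraction
strong_scarce = bound_amount(strong_ka, strong_scarce_amount, PROTEIN_CONC)

print(weak_abundant, strong_scarce, strong_scarce_amount)
```

Under these assumptions the two library members are indistinguishable from a single measurement, even though their association constants differ by three orders of magnitude.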

[Figure 1 image: panels A) and B). Labels in the synthesis scheme: Cycle 1: 192 amino acids; Cycle 2: 192 amines; Cycle 3: 192 amines.]

Figure 1. A) DEL screening data following selection of DEL-A against p38 MAP kinase as reported by Clark et al. (14). Each data point represents a particular DNA barcode. The x- and y-axes represent the building blocks used during library synthesis at chemistry cycle-3 and cycle-1, respectively. The building blocks have been arbitrarily assigned ID numbers. Larger and darker data points possess a higher number of counts. Only data points with counts >9, and the AMMB building block at cycle-2, are shown. B) Schema for the synthesis of DEL-A as reported by Clark et al. (14).

In theory, the underlying cause of the patterns observed in Figure 1 may be determined through experiment. This would require the synthesis, purification, and characterization of thousands of DNA conjugates, followed by determination of equilibrium association constants for each individual library member. Unfortunately, such an experiment is prohibitively expensive. As an alternative, we seek to address the above questions via computational simulation. In silico DELs will be constructed that mimic actual DELs, as produced through split-and-pool multi-step chemical synthesis. Quantities of all products and byproducts will be tracked, and equilibrium association constants assigned to each individual in silico library member. Simulation of DEL screens will be accomplished using the previously described mathematical model (28). This model treats a DEL screen as a solution-phase thermodynamic system, similar to a simple binding assay (29). If the in silico DELs, and assigned equilibrium association constants, have been generated appropriately, then it is expected that simulated DEL screens will provide patterns similar to those observed for
experimental screens. It is postulated that the underlying causes of the patterns observed in the in silico screens may also be valid for similar patterns observed in reported experimental data. It is hypothesized that the lack of correlation between counts and equilibrium association constant may be explained by taking into account both synthetic yield, and byproducts formed, during multi-step chemical synthesis of DELs.

Results and Discussion

Generation of an in silico DEL, and comparison of simulated and experimental screening results

An in silico DEL, herein referred to as DEL-S1, is generated with the goal of generally mimicking the yields of products and byproducts of DEL-A as reported by Clark et al. (14) (see Figure 1). The DEL probability tree shown in Figure 2 describes a simplified stochastic model of the multi-step chemical synthesis of DEL-A, and is used as a template for the generation of DEL-S1. At each ith cycle of chemistry, desired product is formed with a yield of Yi, while starting material is left over (or reacts with solvent, or impurities in reagent stock solutions (14)) with a yield of 1-Yi. Thus, each barcode within the library encodes a combination of eight different product types (9-16, as shown in Figure 2), and the yields of 9-16 sum to 1. The aforementioned probability tree (Figure 2) assumes the following: i) the synthetic yield of each building block is an independent random variable sampled from a normal distribution (mean = 0.6, sd = 0.2), and ii) left-over starting materials and other byproducts are binned together, and treated as a single entity with a combined yield of 1-Yi.

The synthetic yield of each building block upon reaction with a DNA-conjugate is commonly assumed to be independent of the exact composition of the DNA-conjugated starting material (2,14). This assumption provides the rationale for validation of building blocks prior to use in library synthesis (building blocks are tested in model reactions, and those that provide high yields in the model reaction are later used in library synthesis). A computational model could be devised which does not make this assumption, and instead assigns individual yields to every combination of building block and possible DNA-conjugated starting material. However, doing so has no obvious benefit, and would increase computational overhead.

Numerous byproducts might be formed throughout library synthesis, although the most common occur due to reaction with solvents or contaminants in solvents (14). The probability tree (Figure 2) provides a simplified model, where all left-over starting material and byproducts of a particular reaction are binned into a single product type with yield (1-Yi). Specific byproducts could be readily incorporated into the DEL-A probability tree (Figure 2); however, doing so would complicate the model and offers no obvious benefit.

Figure 2. Probability Tree for DEL-A (14) and in silico library DEL-S1. DNA conjugate 1 is a barcode with a specific DNA sequence, and a starting material for library synthesis. Via a technique such as split-and-pool chemistry, DNA barcode (1) yields 8 different product types (9-16). The substituent X may be unchanged from starting material (-Cl), or the result of reaction with solvent (such as ethanol), impurities in the solvent, or impurities within the building block stock solutions (such as dimethylamine) (14). The relative amounts of each product are dictated by the synthetic yields at each cycle of chemistry for that particular building block (see Figure 1 for synthetic schema). Synthetic yields are assumed to be independent of the starting material composition.

Table 1 illustrates one method of combining building blocks to create an exemplar in silico DEL. For example, in silico building blocks are assigned IDs of 1-3 at each cycle, and three cycles iterated over (three nested loops) to create a 27-member library. Each in silico library member is assigned a barcode which consists of the building block IDs from each cycle of chemistry concatenated to form a short identifier. To conduct a simulated DEL screen, each library member must also be assigned an equilibrium association constant. For the exemplar 27-member DEL, 27 absolute values were sampled from a normal distribution (mean = 0, sd = 9), sorted in descending order, and then assigned to each library member as shown in Table 1. Now each library member has a barcode and an assigned equilibrium association constant. Library members with the lowest index numbers have the greatest equilibrium association constants.
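The construction just described can be sketched in a few lines. This is a minimal illustration rather than the supporting-information code: the function name `build_exemplar_del`, the dictionary keys, and the fixed random seed are hypothetical, and the sampled values will not match those printed in Table 1.

```python
import itertools
import random


def build_exemplar_del(n_bb=3, sd=9.0, seed=0):
    """Build an in silico library from three nested loops over building block IDs.

    Values are absolute values of normal samples (mean 0, given sd), sorted in
    descending order and assigned in loop order, as described in the text.
    """
    rng = random.Random(seed)
    n_members = n_bb ** 3
    values = sorted((abs(rng.gauss(0.0, sd)) for _ in range(n_members)), reverse=True)

    library = []
    ids = range(1, n_bb + 1)
    # itertools.product varies the last position fastest, so the third
    # element plays the role of the innermost loop.
    for index, (outer, middle, inner) in enumerate(itertools.product(ids, repeat=3), start=1):
        library.append({
            "index": index,
            "barcode": f"{outer}-{middle}-{inner}",
            "log10_ka": values[index - 1],
        })
    return library


exemplar = build_exemplar_del()
for member in exemplar[:4]:
    print(member["index"], member["barcode"], round(member["log10_ka"], 1))
```

Because the sorted values are handed out in loop order, the outermost building block ID is the strongest predictor of the assigned value.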

As mentioned above, the library molecules are constructed from 3 nested loops. Building block IDs derived from the outermost loop show the strongest correlation with assigned equilibrium association constant. In contrast, IDs derived from the innermost loop show the least correlation with equilibrium association constant. In the case of DEL-A, SAR shows that a particular building block, AMMB, is required at cycle-2 for the molecule to bind the target (14). Conversely, cycles-1 and -3 tolerate a wide range of structural diversity (14). For this reason, DEL-S1 cycle-2 is assigned to the outermost loop, and cycles-3 and -1 are assigned to the middle and inner loops respectively. Because each cycle of DEL-S1 contains 100 building block IDs, one million equilibrium association constants were generated (values taken from a half-normal distribution (mean = 0, sd = 1.9) prior to sorting). A histogram of DEL-S1 equilibrium association constants (Figure S1), and a discussion regarding the choice of distribution to generate the equilibrium association constants, are provided in the supporting information.

Table 1. Construction of an exemplar 27-member in silico DEL.(a)

Index   BB ID          BB ID           BB ID          DNA       Log10(Ka)
        Outer Loop     Middle Loop     Inner Loop     Barcode   value
1       1              1               1              1-1-1     10.3
2       1              1               2              1-1-2     9.0
3       1              1               3              1-1-3     6.0
4       1              2               1              1-2-1     5.6
5       1              2               2              1-2-2     5.5
6       1              2               3              1-2-3     4.6
7       1              3               1              1-3-1     4.2
8       1              3               2              1-3-2     2.9
9       1              3               3              1-3-3     2.6
10      2              1               1              2-1-1     2.5
11      2              1               2              2-1-2     2.2
12      2              1               3              2-1-3     2.1
13      2              2               1              2-2-1     2.1
...     ...            ...             ...            ...       ...
27      3              3               3              3-3-3     0.2

(a) See text for details.

As mentioned earlier, the DEL-A probability tree (Figure 2) states that reactions provide product with yield Yi, and left-over starting material and byproducts with yield 1-Yi. Seven of the eight final products encoded by each barcode are truncations, where the product fails to contain at least one of the three possible building blocks. To ensure that equilibrium association constants are generated for each possible truncated byproduct, it is necessary to assign one building block ID at each cycle as a null. Left-over starting material (or other byproduct) is considered equivalent to a reaction where the null building block was added. Assigning a low ID value to the null represents a situation where truncated molecules at that cycle are expected to retain potency comparable to the fully enumerated molecule. Assigning a high ID value to the null increases the likelihood that a truncation at that cycle will be detrimental to potency. Reported SAR for DEL-A suggests that truncation at either cycle-1 or cycle-3 is tolerated (14). In contrast, truncation at cycle-2 is not tolerated (14). For this reason, DEL-S1 was assigned nulls at low ID values for cycles-1 and -3, and a high ID value for cycle-2 (see supporting information Section S4).

In the case of the exemplar in silico 27-member library described in Table 1, building block ID-2 has been assigned as the null at each cycle. Table 2 describes the 8 products encoded by the DNA barcode 1-1-1 (Table 1), including their yields and assigned equilibrium association constants (as retrieved from Table 1). Also shown in Table 2 is the formula used to calculate each product yield as dictated by the DEL-A probability tree provided in Figure 2. Note that the yield of every final product contains terms from all three cycles of chemistry. DEL-S1 was generated in a manner similar to that of the exemplar 27-member library in Tables 1 and 2, resulting in a total of 1 million DNA barcodes, and 8 million distinct products (code provided in supporting information DelGenerationAndScreening.txt).
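The yield formulas listed in Table 2 can be written out as a short sketch and checked against the footnoted yields (Y1A = 0.23, Y1B = 0.89, Y2 = 0.65, Y3 = 0.73). The function name `product_yields` is hypothetical; the compound numbers 9-16 and the formulas are those given in Table 2.

```python
def product_yields(y1a, y1b, y2, y3):
    """Fractional yields of compounds 9-16 for one barcode (Table 2 formulas)."""
    c1 = y1a * y1b  # both cycle-1 steps (Y1A, then Y1B) succeed
    return {
        9: (1 - y1a) * (1 - y2) * (1 - y3),
        10: (1 - y1a) * (1 - y2) * y3,
        11: (1 - y1a) * y2 * (1 - y3),
        12: (1 - y1a) * y2 * y3,
        # Compound 13 also collects material where step 1A succeeded
        # but step 1B failed, per the second term in Table 2.
        13: c1 * (1 - y2) * (1 - y3) + y1a * (1 - y1b),
        14: c1 * (1 - y2) * y3,
        15: c1 * y2 * (1 - y3),
        16: c1 * y2 * y3,
    }


yields = product_yields(y1a=0.23, y1b=0.89, y2=0.65, y3=0.73)
total = sum(yields.values())

for compound, fraction in yields.items():
    print(compound, f"{100 * fraction:.1f}%")
print("total:", round(total, 6))
```

The eight fractions sum to 1 for any choice of yields, which is the property the text relies on when it states that the yields of 9-16 sum to 1.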

Table 2. Assigning of yields to products with DNA barcode 1-1-1.(a)

Cmpd(b)   Actual        Actual           % Yield(d)   Formula to calculate yield(e)
          Identity(c)   Log(Ka) value
9         2-2-2         2.2              7            (1-Y1A)*(1-Y2)*(1-Y3)
10        2-2-1         2.5              20           (1-Y1A)*(1-Y2)*Y3
11        2-1-2         2.9              14           (1-Y1A)*(Y2)*(1-Y3)
12        2-1-1         4.2              37           (1-Y1A)*Y2*Y3
13        1-2-2         5.5              4            Y1A*Y1B*(1-Y2)*(1-Y3) + Y1A*(1-Y1B)
14        1-2-1         5.6              5            Y1A*Y1B*(1-Y2)*Y3
15        1-1-2         9.0              3            Y1A*Y1B*Y2*(1-Y3)
16        1-1-1         10.3             10           Y1A*Y1B*Y2*Y3

(a) See Table 1 for a description of the 27-member in silico DEL.
(b) Compound structures provided in Figure 2.
(c) As discussed in text, unreacted starting material is assigned to building block ID-2.
(d) Yields were randomly sampled from a normal distribution: Y1A = 0.23, Y1B = 0.89, Y2 = 0.65, Y3 = 0.73.
(e) Final product yields are calculated as dictated by the DEL-A probability tree (Figure 2).

DEL-S1 was generated and assigned equilibrium association constants in a best effort to mimic reported SAR for the DEL-A screen against p38 MAP kinase (14). Additionally, yields of all products and byproducts corresponding to each DNA barcode have been determined. Thus the barcode, yield, composition, and equilibrium association constant for every distinct product of DEL-S1 are known and tracked. It is now possible to simulate a screen using DEL-S1. The simulation is conducted as previously described (see supporting information DelGenerationAndScreening.txt) (28). The result of the simulated DEL-S1 screen is visualized in Figure 3 in the same manner as the experimental data shown in Figure 1 (see supporting information Section S2 and Rcode_ForVisualization.txt). Comparison of the experimental and simulated data reveals similar patterns, including the tendency of data points at the intersection of lines to possess the highest values of counts.
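Because every product of a barcode shares the same DNA tag, the signal recorded for that barcode is a sum over its products. The sketch below illustrates this for the exemplar barcode 1-1-1 using the Table 2 yields and Log(Ka) values. It is not the screening model of reference 28: the 1:1 binding isotherm, the protein concentration, and the names `fraction_bound` and `barcode_signal` are assumptions made only for illustration.

```python
def fraction_bound(log10_ka, protein_conc):
    """Fraction of a ligand bound, assuming 1:1 binding with protein in excess."""
    ka = 10.0 ** log10_ka
    return ka * protein_conc / (1.0 + ka * protein_conc)


PROTEIN_CONC = 1e-6  # assumed protein concentration (M)

# (compound, identity, Log(Ka), fractional yield), taken from Table 2
TABLE_2_PRODUCTS = [
    (9, "2-2-2", 2.2, 0.07),
    (10, "2-2-1", 2.5, 0.20),
    (11, "2-1-2", 2.9, 0.14),
    (12, "2-1-1", 4.2, 0.37),
    (13, "1-2-2", 5.5, 0.04),
    (14, "1-2-1", 5.6, 0.05),
    (15, "1-1-2", 9.0, 0.03),
    (16, "1-1-1", 10.3, 0.10),
]


def barcode_signal(products, protein_conc):
    """Per-product and total bound amount for one barcode (relative units)."""
    contributions = {
        cmpd: frac_yield * fraction_bound(log10_ka, protein_conc)
        for cmpd, _identity, log10_ka, frac_yield in products
    }
    return contributions, sum(contributions.values())


contributions, total_signal = barcode_signal(TABLE_2_PRODUCTS, PROTEIN_CONC)
for cmpd, value in contributions.items():
    print(cmpd, round(value, 4))
print("total:", round(total_signal, 4))
```

Under these assumptions the full-length product (16) accounts for only part of the barcode's signal; the truncated products contribute the rest.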

Figure 3. Visualization of the DEL-S1 simulated screen. Simulated screening data was treated in the same manner as the experimental data displayed in Figure 1; the threshold for counts was increased until patterns emerged (counts > 46). Larger and darker data points possess a higher number of counts. Yields for select cycle-1 building blocks (marked with red stars) are listed in Table 3. Circled data points are detailed in Table 4. Vertical lines marked with blue or orange stars are discussed in the text.

Figure 4 shows the relationship between assigned equilibrium association constant and counts. As expected, the correlation is weak. Figure 4 also suggests that experimental DEL screens may suffer from false negatives. This conclusion is logical; as synthetic yields trend to zero, so too must observed counts for the corresponding library member. The influence of synthetic yield on counts may also be visualized. As previously discussed, each data point in Figure 3 represents a combination of 8 different product types, all encoded by the same in silico barcode (see Figure 2). The yields of all 8 product types, for each barcode, sum to 100%. It is also possible to calculate the yield of only those products which bind the target. Figure 5 shows the relationship between counts and the total yield of products with equilibrium association constant greater than 10^6 for each individual barcode. When viewing only data points with counts > 45, the synthetic yield appears to correlate more strongly with counts than does the equilibrium association constant.
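The "total yield of products which bind the target" is a simple filtered sum. A minimal sketch follows, using the products listed for Entry 2 of Table 4 (observed barcode 70-31-39) as input; the function name `total_binder_yield` is hypothetical, and only the products listed in Table 4 are included.

```python
def total_binder_yield(products, log10_ka_threshold=6.0):
    """Sum the percent yields of products whose Log10(Ka) exceeds the threshold."""
    return sum(pct for _bbs, log10_ka, pct in products if log10_ka > log10_ka_threshold)


# (actual building blocks, Log10(Ka), % yield) as listed in Table 4, Entry 2
ENTRY_2_PRODUCTS = [
    ("null-31-39", 6.83, 20),
    ("70-31-39", 6.16, 12),
    ("70-31-null", 7.17, 7),
    ("null-31-null", 5.38, 13),
]

binder_yield = total_binder_yield(ENTRY_2_PRODUCTS)
print(binder_yield)
```

For this barcode, three of the four listed products pass the Log10(Ka) > 6 threshold, and two of those three are truncates.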

Figure 4. DNA barcode counts versus Log10(Ka) values for DEL-S1 simulated screening data. The 753 data points shown in Figure 3 (counts > 45) are displayed. Ordinary least squares, with counts and Log10(Ka) as the response and predictor variables respectively, provides an R-squared value of 0.08.

Figure 5. DNA barcode counts versus total yield of products with Log10(Ka) > 6 for DEL-S1 simulated screening data. The 753 data points shown in Figure 3 (counts > 45) are displayed. Ordinary least squares, with counts as the response variable and total yield of products with Log10(Ka) > 6 as the predictor variable, provides an R-squared value of 0.43.
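The R-squared values quoted in the captions of Figures 4 and 5 come from ordinary least squares fits with a single predictor. A minimal sketch of that calculation is given below; the function name `r_squared` and the toy data are hypothetical and are not taken from the simulated screen.

```python
def r_squared(x, y):
    """R-squared of an ordinary least squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: predictor (e.g. total yield, %) and response (counts)
toy_total_yield = [10, 20, 30, 40, 50]
toy_counts = [50, 48, 70, 65, 90]

r2_toy = r_squared(toy_total_yield, toy_counts)
print(round(r2_toy, 2))
```

The same function can be applied with Log10(Ka) as the predictor to compare how much of the variance in counts each variable explains.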

The patterns observed in Figure 3 may be further examined to determine their underlying cause. Numerous horizontal lines are observed, including where cycle-1 building block IDs are 9, 30, and 35 (marked with red stars in Figure 3). The underlying cause of these highly enriched horizontal lines is the high synthetic yield of those particular cycle-1 building blocks. As shown in Table 3, the yields for cycle-1 building blocks 9, 30, and 35 are significantly greater than the average yield of 60%. In contrast to cycle-1, only select cycle-3 building blocks result in molecules which bind the target. Many of the vertical lines observed in Figure 3 derive from these select cycle-3 building blocks (some examples are marked with a blue star). However, many of the vertical
lines observed in Figure 3 encode cycle-3 building blocks which result in molecules that do not bind the target (two examples of these vertical lines are marked with an orange star). In these cases, the cycle-3 building blocks have very low synthetic yields (14-17%), and the underlying cause of these vertical lines is the high yield of products truncated at cycle-3. Table 4 Entry 1 details a single data point on the line observed at cycle-3 building block ID-23 (see green circle in Figure 3). As shown in Table 4, the high enrichment of this barcode is entirely due to cycle-3 truncated products.

Table 3. Percent yields for select cycle-1 building blocks.(a)

Cycle-1 ID   Y1A   Y1B
9            83    82
30           92    91
35           81    78

(a) These building blocks are marked by red stars in Figure 3. See the DEL-A probability tree (Figure 2) for a description of Y1A and Y1B.

Table 4. Detailed breakdown of select data points from Figure 3.(a)

Entry   Observed Barcode        Actual BBs      Log10(Ka)   % Yield   counts   Sum of
        Cyc-1   Cyc-2   Cyc-3                                                  counts
1       9       31      23      9-31-null       6.87        33        74       83
                                null-31-null    5.38        8         9
2       70      31      39      null-31-39      6.83        20        55       97
                                70-31-39        6.16        12        21
                                70-31-null      7.17        7         15
                                null-31-null    5.38        13        6
3       70      31      26      70-31-null      7.16        12        36       43
                                null-31-null    5.38        20        7
4       77      31      39      77-31-39        6.71        12        24       49
                                null-31-39      6.83        14        22
                                null-31-null    5.38        9         2
                                77-31-null      4.94        7         1

(a) Data points are circled in Figure 3: Entry 1 (Green); Entry 2 (Red); Entry 3 (Purple); Entry 4 (Orange).

The tendency of data points (barcodes) at the intersections of lines to possess the largest values of counts is noticeable in both the experimental and simulated data sets (Figure 1 and Figure 3 respectively). For the simulated data set, it is possible to break down the underlying cause of each barcode’s counts. Table 4 Entry 2 details a highly enriched barcode (see Figure 3, red circle), whose counts are primarily due to a select truncated cycle-2, -3 combination. This truncated byproduct brings about the vertical line that this barcode resides on. The barcode detailed in Table 4 Entry 4 (see Figure 3, orange circle) also resides on this vertical line, but has fewer counts. The barcode in Entry 2 (red circle) has more counts because it lies at the intersection with the horizontal line described by cycle-1 building block ID-70 (Figure 3). This horizontal line results from a cycle-3 truncate possessing building block ID-70 at cycle-1, and ID-31 at cycle-2; the truncate is assigned a Log10(Ka) of 7.17. Thus, Entry 2 benefits from having numerous products which bind the target, all encoded by the same barcode, and therefore possesses a larger number of counts. Table 4 Entry 3 (see purple circle, Figure 3) resides on the same horizontal line as Entry 2. However, Entry 3 does not intersect with a heavily populated vertical line. No vertical line exists because the cycle-3 building block assigned to Entry 3 has a high synthetic yield, and results in products that fail to bind the in silico target.

DELs that possess 2 cycles of chemistry are expected to exhibit the same artifacts as discussed above for 3-cycle libraries. In particular, barcodes at the intersections of heavily populated lines are expected to possess high values of counts due to synthetic yield. This conclusion is consistent with numerous reported experimental data sets derived from 2-cycle libraries (2,17).

Simulated screens to assess methods of minimizing byproduct formation

Formation of byproducts during split-and-pool library synthesis is acknowledged as a potential confounding factor for DEL screens (30). Practical methods of byproduct minimization vary depending upon the chemistry used to produce the library and include i) HPLC purification of the library following acylation with Fmoc-protected amino acids (14), ii) capping of unreacted amines with reagents such as acetic anhydride, thereby preventing truncated byproducts from undergoing further reactions, and iii) biotinylation of library members, such that truncates may be entirely removed from the library mixture by later purification with immobilized streptavidin (17, 23, 30-33). However, there is no reported evidence that any particular technique significantly improves screening results.

DEL-S2 was generated using the probability tree provided in Figure 6. DEL-S2 mimics a library produced via sequential addition of amino acids, resulting in a library of encoded tripeptides. DEL-S2 contains 3 cycles of chemistry, each possessing 100 building blocks, resulting in a total of 1 million in silico barcodes. As illustrated in Figure 6, each barcode encodes one of eight different in silico product types. As for DEL-S1, the in silico synthetic yield for each building block is treated as a random variable that takes a value from a half-normal distribution (mean = 60, sd = 20). The result of a simulated screen of DEL-S2 is visualized in Figure 7A, where lines in all three planes of the cube are observed (see supporting information Section 3).

Analogs of DEL-S2 (DELs-S3 and -S4) were also generated (see supporting information Section 4 for details). DEL-S3 mimics a library capped with acetic anhydride, where truncated products are prevented from undergoing further reactions. The probability tree for DEL-S3 is provided in Figure S2. DEL-S4 mimics a library where left-over starting materials are biotinylated following each cycle of chemistry and entirely removed from the library mixture. In this case, all truncated byproducts are removed from the library mixture. Simulated screens for DELs-S3 and -S4 are visualized in Figure 7B and -C, respectively.
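The three library models described above can be sketched as a small calculation of product fractions for a single barcode. The Python below is a minimal sketch and is not the author's supporting-information script. The function name `product_fractions`, the mode labels, and the treatment of capping (a species that fails to couple at one cycle takes no part in later cycles) are assumptions made for illustration; the actual DEL-S3 probability tree is the one given in Figure S2. Yields are expressed as fractions between 0 and 1.

```python
from itertools import product


def product_fractions(yields, mode="uncapped"):
    """Fractions of each product type encoded by one barcode.

    yields: per-cycle coupling yields as fractions in [0, 1].
    mode (hypothetical labels, not from the paper):
      "uncapped" - DEL-S2-like: every cycle couples or fails independently.
      "capped"   - DEL-S3-like (assumed): after the first failed coupling
                   the species is capped and no longer reacts.
      "removed"  - DEL-S4-like: all truncates are purified away.
    Returns a dict mapping a tuple of booleans (True = the building block
    of that cycle is present) to the fraction of starting material.
    """
    n = len(yields)
    fractions = {}
    if mode == "uncapped":
        # Enumerate every present/absent combination (2**n product types).
        for flags in product((True, False), repeat=n):
            p = 1.0
            for present, y in zip(flags, yields):
                p *= y if present else 1.0 - y
            fractions[flags] = p
    elif mode in ("capped", "removed"):
        p = 1.0  # fraction that has coupled at every cycle so far
        for k, y in enumerate(yields):
            if mode == "capped":
                # Fails at cycle k+1, is capped, keeps blocks 1..k only.
                flags = (True,) * k + (False,) * (n - k)
                fractions[flags] = p * (1.0 - y)
            p *= y
        fractions[(True,) * n] = p  # fully enumerated product
    else:
        raise ValueError(f"unknown mode: {mode}")
    return fractions


if __name__ == "__main__":
    for mode in ("uncapped", "capped", "removed"):
        result = product_fractions((0.8, 0.6, 0.5), mode)
        print(mode, len(result), "product types")
```

Under these assumptions a three-cycle barcode encodes eight product types without capping, four with capping, and only the fully enumerated product once all truncates are removed.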


Figure 6. Probability tree for a single in silico DEL-S2 DNA barcode. The probability tree represents a DEL synthesized by stepwise amino acid coupling steps, generating a library of encoded tripeptides. Every ith cycle of chemistry results in product with yield Yi, or left-over starting material with yield (1-Yi). All reactions are assumed to result in only desired product, truncated byproduct, or left-over starting material.


Figure 7. Effect of minimizing amount of truncates on DEL screen output. Each data point represents a DNA barcode; data point size and color are a function of counts. Screens are visualized in the same manner as for experimental data sets; the threshold for counts is increased until patterns are revealed (counts > 17). A) Simulated screen for in silico DEL-S2 (truncates not removed). B) Simulated screen for in silico DEL-S3 (mimics a library where unreacted DNA-conjugated starting material, following each cycle of chemistry, is capped with a reagent such as acetic anhydride). Green stars denote lines which are observed in panel A, but not in panel B. Orange stars denote lines which are readily observed in panel A, but less obvious in panel B. C) Simulated screen for in silico DEL-S4 (all truncates are removed). The red ×’s denote lines observable in panel A and -B screening results. Note that only 2 of the 3 lines marked in panel A and -B are readily observed in panel C. The red box denotes three DNA barcodes which are observed in all 3 simulated screens, and provides a reference to better compare the three data sets.

The most heavily populated lines in Figure 7A and -B result from truncates. This is because a truncate is a byproduct of every reaction at a specific cycle of chemistry, and is therefore encoded by every DNA barcode at that cycle of chemistry. This simulated result is consistent with reported experimental data; heavily populated lines often result in the discovery of biochemically active truncated byproducts (11, 14, 17). The result of acetic anhydride capping, as mimicked by DEL-S3, is the complete disappearance of lines resulting from truncations at cycles-1 and -2 (see lines denoted by green stars in Figure 7). Other lines, denoted by orange stars in Figure 7, become less visible, as these lines in Figure 7A arise from a combination of truncated and fully enumerated products. Removal of all truncates, represented by DEL-S4 and visualized in Figure 7C, results in the complete disappearance, or significant reduction in density, of the heavily populated lines observed in the visualization of DEL-S2 and -S3 screening data. Figure 7 predicts that decreasing truncate formation during library synthesis will result in the disappearance of the most striking patterns otherwise observed in the visualized data.

It is theoretically possible to estimate equilibrium association constants of library members by conducting DEL screens over a range of protein concentrations (28). As the protein concentration used in the DEL screen is decreased, counts will rise for library members whose equilibrium dissociation constant is less than the protein concentration. Library member counts will reach a maximum when the equilibrium dissociation constant is equal to the protein concentration (note that the equilibrium dissociation constant is simply the inverse of the equilibrium association constant). To investigate the influence of byproducts on the accuracy of estimated equilibrium association constants, simulated screens were run using DELs-S2 and -S4 at three protein concentrations (5, 1, and 0.2 µM). Counts for each library member may be plotted against protein concentration as shown in Figure 8C. As protein concentration trends from high to low, counts may steadily increase, decrease, or peak at 1 µM protein. A plot is generated for each library member, and each library member is then binned according to the appearance of its plot as increasing, decreasing, or convex (Figure 8C). The result for DEL-S2 shows the relationship between plot-appearance and assigned equilibrium association constant to be weak (Figure 8A). In contrast, the result for DEL-S4 shows a strong relationship, where plot-appearance is predictive of equilibrium association constant (Figure 8B). Comparison of Figure 8A and Figure 8B illustrates the detrimental influence of truncates on the ability to estimate equilibrium association constants from DEL screening data. However, it may still be possible to extract useful information from screens derived from unpurified libraries such as DEL-S2 (see section ‘Data analysis by aggregation’).
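The binning step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's script: the function names `classify_profile` and `bin_barcodes`, the example counts, and the extra "unclassified" bin (for ties or a dip at the middle concentration, cases the text does not discuss) are all assumptions. Counts are ordered from the highest protein concentration to the lowest, so "increasing" means that counts rise as protein concentration falls.

```python
def classify_profile(counts_high_to_low):
    """Bin a counts-versus-protein-concentration profile by its shape.

    counts_high_to_low: three counts for one barcode, ordered from the
    highest protein concentration to the lowest (e.g. 5, 1, 0.2 uM).
    """
    first, middle, last = counts_high_to_low
    if first < middle < last:
        return "increasing"
    if first > middle > last:
        return "decreasing"
    if middle > first and middle > last:
        return "convex"  # counts peak at the middle concentration
    return "unclassified"  # assumed bin for ties or a dip in the middle


def bin_barcodes(screen):
    """Group barcodes by profile shape.

    screen: dict mapping a barcode (tuple of building block IDs) to its
    counts at the three protein concentrations, high to low.
    """
    bins = {}
    for barcode, counts in screen.items():
        bins.setdefault(classify_profile(counts), []).append(barcode)
    return bins


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = {
        (12, 40, 7): (30, 55, 90),
        (3, 88, 51): (90, 55, 30),
        (64, 2, 19): (30, 90, 55),
    }
    for shape, barcodes in bin_barcodes(example).items():
        print(shape, barcodes)
```

Each bin can then be compared against the assigned equilibrium association constants, as is done in Figure 8.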


Figure 8. Influence of truncated byproducts on the ability to estimate equilibrium association constants of library members from screening data. DEL screens were simulated at three protein concentrations: 5, 1, and 0.2 µM (these protein concentrations are roughly equivalent to those reported by Leimbacher et al., assuming solution-phase conditions (31)). Counts versus protein concentration were plotted for each in silico barcode (exemplar plots shown in panel C). Each barcode was binned according to the appearance of its plot, as convex, decreasing slope, or increasing slope. The relationship between library member (barcode) plot-appearance and assigned equilibrium association constant (Log10(Ka)) is then visualized, with jittering along the x-axis. Note that the y-axis scales for panels A and B are not identical. The size of each data point is a function of counts at the highest protein concentration (5 µM). A) Relationship between plot-appearance and equilibrium association constant following simulated screening of DEL-S2. B) Relationship between plot-appearance and equilibrium association constant following simulated screening of DEL-S4. C) Exemplar plots of counts versus protein concentration. These plots are derived from panel B library members (circled in red).


Data analysis by aggregation

The visualization of the DEL-S2 simulated screen (Figure 7A) is misleading, as it implies that each heavily populated line results from a large number of different chemical structures, and not a single truncate encoded by many different barcodes. A more accurate visualization is obtained by aggregating on two of the three chemistry cycles and ignoring the building block ID at the truncated cycle. The most heavily populated lines observed in Figure 7A result from truncated byproducts. This result is consistent with literature reports, where truncates associated with heavily populated lines are found to possess biochemical activity (11, 14, 17).

DEL-S2 is designed to mimic a library synthesized via 3 cycles of chemistry (Figure 6); aggregation of the screening data involves ignoring one of these cycles and focusing on the other two. For instance, Figure 9 shows the same data set as Figure 7A, but aggregated on cycle-1,2 building block IDs (see supporting information Rcode_aggregation.txt for code); in this case, the building block at cycle-3 is ignored. The DEL-S2 screening output should also be aggregated on cycle-2,3 building block IDs (data not shown), as this will result in the observation of a different subset of enriched truncated byproducts. Screening data may also be aggregated on each single cycle of chemistry (in this instance, two cycles of chemistry are ignored); for example, Figure 10 shows the DEL-S2 screening data aggregated on cycle-1 (where cycle-2 and -3 are ignored).
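The aggregation described above amounts to grouping barcodes by the building block IDs at the retained cycles. The author's implementation is the R code in the supporting information (Rcode_aggregation.txt); the Python below is only a hedged sketch of the same idea. The function name `aggregate_counts`, the data layout, the example counts, and the choice to combine counts by summation are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_counts(barcode_counts, keep_cycles):
    """Sum counts over barcodes that share IDs at the retained cycles.

    barcode_counts: dict mapping (bb1, bb2, bb3) building block ID tuples
    to counts.
    keep_cycles: 0-based indices of the cycles to aggregate on, e.g.
    (0, 1) for cycle-1,2; the building blocks at all other cycles are
    ignored.
    """
    totals = defaultdict(int)
    for barcode, counts in barcode_counts.items():
        key = tuple(barcode[i] for i in keep_cycles)
        totals[key] += counts  # summation is an assumption of this sketch
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical screen: one line of 100 barcodes that share cycle-1
    # and cycle-2 building blocks, plus one unrelated barcode.
    screen = {(5, 9, bb3): 20 for bb3 in range(100)}
    screen[(1, 2, 3)] = 7

    on_cycles_1_2 = aggregate_counts(screen, (0, 1))
    print(on_cycles_1_2[(5, 9)])  # the whole line is now one data point

    on_cycle_1 = aggregate_counts(screen, (0,))
    print(on_cycle_1)
```

In this sketch a line of 100 barcodes collapses into a single data point keyed by the two retained building block IDs, which mirrors how the lines of Figure 7A appear as single points after aggregation.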


Figure 9. Aggregation of DEL-S2 screening output on cycles-1 and -2. Only data points with counts > 200 are shown. Data points with greater counts are shown as larger and darker, the largest data point having 10100 counts. Heavily populated lines observed in Figure 7A now appear as single data points (denoted by red ×’s). Vertical lines denoted by the red and purple stars, and the red ×, are observed as single data points upon aggregation on cycle-1 (Figure 10).

Figure 10. Aggregation of DEL-S2 screening output on cycle-1. This visualization allows for rapid assessment of important and/or problematic building blocks. Data points denoted by the red and purple stars, and the red ×, appear as vertical lines in Figure 9 and planes in Figure 7A. For example, the heavily populated lines denoted by three red ×’s in Figure 7A reside in a single plane, which now results in a single data point, also marked by a red ×.


Figures 7A, 8A, 9, and 10 all provide visualizations of the simulated screening output for DEL-S2. The heavily populated lines observed in Figure 7A primarily result from truncated byproducts. Thus, for each observed line, up to 100 data points are enriched due to a single chemical structure. This many-to-one link between barcode and ligand structure brings about the weak relationship between plot-appearance and equilibrium association constant observed in Figure 8A. A more accurate visualization of the lines observed in Figure 7A can be achieved by aggregating the data on two of the three chemistry cycles, as illustrated in Figure 9. Each enriched truncated byproduct is now represented by a single data point, providing a one-to-one connection between barcode and ligand structure. The technique demonstrated in Figure 8A may now be applied to the aggregated data provided in Figure 9; binned plot-appearance now correctly predicts equilibrium association constant (plot not shown).

The above aggregation technique has limitations; it requires correct assessment of data points as belonging to a fully enumerated product, or truncate, prior to assessment of equilibrium association constants. In some cases this assignment may be straightforward, as heavily populated lines or planes are obvious results of truncations (see references 14 and 17 for examples). However, it may not always be clear from the visualization if a line is the result of a truncation, or instead due to a series of structurally related and fully enumerated products (see references 6 and 7 for examples). Often, trial-and-error synthesis and testing of select compounds will be required to determine the underlying cause of patterns observed in the experimental screening output.

Conclusions

As predicted in Figure 3 and Table 4, the presence of synthetic artifacts may complicate interpretation of DEL screening output. Visualized patterns in the data, such as lines, can potentially arise from high, or low, yields of particular building blocks. Data points at the intersections of lines should be treated with caution, as they are predicted to be particularly influenced by synthetic yield. Figure 8 demonstrates methods of rank-ordering equilibrium association constants of library members. Following aggregation, this technique takes into account both variance in product synthetic yield and the aforementioned presence of truncates in the library mixture.

Rank ordering of equilibrium association constants from the screening data (as shown in Figure 8B) may also be used to validate methods of DEL production and screening. If rank ordering is not generally predictive of corresponding biochemical activity, then screening protocols, or library encoding methods, may require modification. It is desirable that DEL screens be consistent with the solution-phase thermodynamic model used herein (28), as only then will screening output predict a value of physical significance (in particular, the equilibrium association constant). Additionally, the model herein helps practitioners in the field avoid erroneous conclusions. For example, Figure 4 predicts that screens run at a single protein concentration will provide a weak correlation between counts and equilibrium association constant. Lacking this insight, practitioners in the field may misinterpret a weak correlation as resulting from confounding factors, for instance compound management errors during library synthesis.

Supporting Information

Python script, R scripts, supplemental figures, and tables.


References

1. Ottl, J. Reported applications of DNA-encoded library chemistry. In Handbook for DNA-Encoded Chemistry; Goodnow, Robert A., Jr, Ed.; Wiley: New York, 2014; pp 319–347.
2. Franzini, R.M.; Neri, D.; Scheuermann, J. DNA-Encoded chemical libraries: advancing beyond conventional small-molecule libraries. Acc. Chem. Res. 2014, 47, 1247–1255.
3. Daguer, J.; Zambaldo, C.; Ciobanu, M.; Morieux, P.; Barluenga, S.; Winssinger, N. DNA display of fragment pairs as a tool for the discovery of novel biologically active small molecules. Chem. Sci. 2015, 6, 739–744.
4. Li, G.; Zheng, W.; Liu, Y.; Li, X. Novel encoding methods for DNA-templated chemical libraries. Curr. Opin. Chem. Biol. 2015, 26, 25–33.
5. Connors, W.; Hale, S.; Terrett, N. DNA-encoded chemical libraries of macrocycles. Curr. Opin. Chem. Biol. 2015, 26, 42–47.
6. Kollmann, C. S.; Bai, X.; Tsai, C.-H.; Yang, H.; Lind, K. E.; Skinner, S. R.; Zhu, Z.; Israel, D. I.; Cuozzo, J. W.; Morgan, B. A.; Yuki, K.; Xie, C.; Springer, T. A.; Shimaoka, M.; Evindar, G. Application of encoded library technology (ELT) to a protein–protein interaction target: Discovery of a potent class of integrin lymphocyte function-associated antigen 1 (LFA-1) antagonists. Bioorg. Med. Chem. 2014, 22, 2353–2365.
7. Deng, H.; O'Keefe, H.; Davie, C.P.; Lind, K.E.; Acharya, R.A.; Franklin, G. J.; Larkin, J.; Matico, R.; Neeb, M.; Thompson, M.M. Discovery of highly potent and selective small molecule ADAMTS-5 inhibitors that inhibit human cartilage degradation via encoded library technology (ELT). J. Med. Chem. 2012, 55, 7061–7079.
8. Mandal, P.; Berger, S.B.; Pillay, S.; Moriwaki, K.; Huang, C.; Guo, H.; Lich, J.D.; Finger, J.; Kasparcova, V.; Votta, B.; Ouellette, M.; King, B.W.; Wisnoski, D.; Lakdawala, A.S.; DeMartino, M.P.; Casillas, L.N.; Haile, P.A.; Sehon, C.A.; Marquis, R.W.; Upton, J.; Daley-Bauer, L.P.; Roback, L.; Ramia, N.; Dovey, C.M.; Carette, J.E.; Chan, F.K-M.; Bertin, J.; Gough, P.J.; Mocarski, E.S.; Kaiser, W.J. RIP3 induces apoptosis independent of pronecrotic kinase activity. Mol. Cell 2014, 56, 481–495.
9. Deng, H.; Zhou, J.; Sundersingh, F. S.; Summerfield, J.; Somers, D.; Messer, J. A.; Satz, A. L.; Ancellin, N.; Arico-Muendel, C. C.; Bedard, K. L. (S.; Beljean, A.; Belyanskaya, S. L.; Bingham, R.; Smith, S. E.; Boursier, E.; Carter, P.; Centrella, P. A.; Clark, M. A.; Chung, C.-W.; Davie, C. P.; Delorey, J. L.; Ding, Y.; Franklin, G. J.; Grady, L. C.; Herry, K.; Hobbs, C.; Kollmann, C. S.; Morgan, B. A.; Kaushansky, L. J.; Zhou, Q. Discovery, SAR, And X-Ray Binding Mode Study of BCATm Inhibitors from a Novel DNA-Encoded Library. ACS Med. Chem. Lett. 2015, 6 (8), 919–924.
10. Encinas, L.; O’Keefe, H.; Neu, M.; Remuiñán, M. J.; Patel, A. M.; Guardia, A.; Davie, C. P.; Pérez-Macías, N.; Yang, H.; Convery, M. A.; Messer, J. A.; Pérez-Herrán, E.; Centrella, P. A.; Álvarez-Gómez, D.; Clark, M. A.; Huss, S.; O’Donovan, G. K.; Ortega-Muro, F.; Mcdowell, W.; Castañeda, P.; Arico-Muendel, C. C.; Pajk, S.; Rullás, J.; Angulo-Barturen, I.; Álvarez-Ruíz, E.; Mendoza-Losana, A.; Pages, L. B.; Castro-Pichel, J.; Evindar, G. Encoded Library Technology As a Source of Hits for the Discovery and Lead Optimization of a Potent and Selective Class of Bactericidal Direct Inhibitors of Mycobacterium Tuberculosis InhA. J. Med. Chem. 2014, 57 (4), 1276–1288.
11. Disch, J. S.; Evindar, G.; Chiu, C. H.; Blum, C. A.; Dai, H.; Jin, L.; Schuman, E.; Lind, K. E.; Belyanskaya, S. L.; Deng, J.; Coppo, F.; Aquilani, L.; Graybill, T. L.; Cuozzo, J. W.; Lavu, S.; Mao, C.; Vlasuk, G. P.; Perni, R. B. Discovery Of Thieno[3,2-d]Pyrimidine-6-Carboxamides as Potent Inhibitors of SIRT1, SIRT2, and SIRT3. J. Med. Chem. 2013, 56 (9), 3666–3679.
12. Gilmartin, A. G.; Faitg, T. H.; Richter, M.; Groy, A.; Seefeld, M. A.; Darcy, M. G.; Peng, X.; Federowicz, K.; Yang, J.; Zhang, S.-Y.; Minthorn, E.; Jaworski, J.-P.; Schaber, M.; Martens, S.; Mcnulty, D. E.; Sinnamon, R. H.; Zhang, H.; Kirkpatrick, R. B.; Nevins, N.; Cui, G.; Pietrak, B.; Diaz, E.; Jones, A.; Brandt, M.; Schwartz, B.; Heerding, D. A.; Kumar, R. Allosteric Wip1 Phosphatase Inhibition through Flap-Subdomain Interaction. Nat. Chem. Biol. 2014, 10 (3), 181–187.
13. Litovchick, A.; Dumelin, C. E.; Habeshian, S.; Gikunju, D.; Guié, M.-A.; Centrella, P.; Zhang, Y.; Sigel, E. A.; Cuozzo, J. W.; Keefe, A. D.; Clark, M. A. Encoded Library Synthesis Using Chemical Ligation And the Discovery of SEH Inhibitors from a 334-Million Member Library. Sci. Rep. 2015, 5, 10916.
14. Clark, M. A.; Acharya, R. A.; Arico-Muendel, C. C.; Belyanskaya, S. L.; Benjamin, D. R.; Carlson, N. R.; Centrella, P. A.; Chiu, C. H.; Creaser, S. P.; Cuozzo, J. W.; Davie, C. P.; Ding, Y.; Franklin, G. J.; Franzen, K. D.; Gefter, M. L.; Hale, S. P.; Hansen, N. J. V.; Israel, D. I.; Jiang, J.; Kavarana, M. J.; Kelley, M. S.; Kollmann, C. S.; Li, F.; Lind, K.; Mataruse, S.; Medeiros, P. F.; Messer, J. A.; Myers, P.; O'keefe, H.; Oliff, M. C.; Rise, C. E.; Satz, A. L.; Skinner, S. R.; Svendsen, J. L.; Tang, L.; Vloten, K. V.; Wagner, R. W.; Yao, G.; Zhao, B.; Morgan, B. A. Design, Synthesis and Selection of DNA-Encoded Small-Molecule Libraries. Nat. Chem. Biol. 2009, 5 (9), 647–654.
15. Creaser, S.P.; Acharya, R.A. Exercises in the synthesis of DNA-encoded libraries. In Handbook for DNA-Encoded Chemistry; Goodnow, Robert A., Jr, Ed.; Wiley: New York, 2014; pp 123–152.
16. Macconnell, A. B.; Mcenaney, P. J.; Cavett, V. J.; Paegel, B. M. DNA-Encoded Solid-Phase Synthesis: Encoding Language Design And Complex Oligomer Library Synthesis. ACS Comb. Sci. 2015, 17 (9), 518–534.
17. Wichert, M.; Krall, N.; Decurtins, W.; Franzini, R. M.; Pretto, F.; Schneider, P.; Neri, D.; Scheuermann, J. Dual-Display of Small Molecules Enables the Discovery of Ligand Pairs and Facilitates Affinity Maturation. Nature Chem. 2015, 7 (3), 241–249.
18. Li, G.; Zheng, W.; Liu, Y.; Li, X. Novel encoding methods for DNA-templated chemical libraries. Curr. Opin. Chem. Biol. 2015, 26, 25–33.
19. Hansen, M. H.; Blakskjær, P.; Petersen, L. K.; Hansen, T. H.; Højfeldt, J. W.; Gothelf, K. V.; Hansen, N. J. V. A Yoctoliter-Scale DNA Reactor For Small-Molecule Evolution. J. Am. Chem. Soc. 2009, 131 (3), 1322–1327.
20. Cao, C.; Zhao, P.; Li, Z.; Chen, Z.; Huang, Y.; Bai, Y.; Li, X. A DNA-Templated Synthesis of Encoded Small Molecules by DNA Self-Assembly. Chem. Commun. 2014, 50 (75), 10997–10999.
21. Li, X.; Liu, D. R. DNA-Templated Organic Synthesis: Nature's Strategy For Controlling Chemical Reactivity Applied to Synthetic Molecules. Angew. Chem. Int. Ed. 2004, 43 (37), 4848–4870.
22. Luk, K.-C.; Satz, A.L. DNA-compatible chemistry. In Handbook for DNA-Encoded Chemistry; Goodnow, Robert A., Jr, Ed.; Wiley: New York, 2014; pp 67–98.
23. Gouliaev, A. H.; Franch, T., P.-O.; Godskesen, M.A.; Jensen, K.B. (Nuevolution). Bi-functional complexes and methods for making and using such complexes. Patent Application WO 2011/127933 A1, 2012.
24. Satz, A. L.; Cai, J.; Chen, Y.; Goodnow, R.; Gruber, F.; Kowalczyk, A.; Petersen, A.; Naderi-Oboodi, G.; Orzechowski, L.; Strebel, Q. DNA Compatible Multistep Synthesis And Applications to DNA Encoded Libraries. Bioconjugate Chem. 2015, 26 (8), 1623–1632.
25. Satz, A.L. Foundations of a DNA-encoded library (DEL). In Handbook for DNA-Encoded Chemistry; Goodnow, Robert A., Jr, Ed.; Wiley: New York, 2014; pp 99–122.
26. Hale, S.P. Screening large compound collections. In Handbook for DNA-Encoded Chemistry; Goodnow, Robert A., Jr, Ed.; Wiley: New York, 2014; pp 281–318.
27. Franzini, R. M.; Ekblad, T.; Zhong, N.; Wichert, M.; Decurtins, W.; Nauer, A.; Zimmermann, M.; Samain, F.; Scheuermann, J.; Brown, P. J.; Hall, J.; Gräslund, S.; Schüler, H.; Neri, D. Identification Of Structure-Activity Relationships from Screening a Structurally Compact DNA-Encoded Chemical Library. Angew. Chem. 2015, 127 (13), 3999–4003.
28. Satz, A.L. DNA encoded library selections and insights provided by computational simulations. ACS Chem. Biol. 2015, 10, 2237–2245.
29. Hintersteiner, M.; Buehler, C.; Auer, M. On-bead screens sample narrower affinity ranges of protein–ligand interactions compared to equivalent solution assays. Chem. Phys. Chem. 2012, 13, 3472–3480.
30. Franzini, R. M.; Biendl, S.; Mikutis, G.; Samain, F.; Scheuermann, J.; Neri, D. “Cap-And-Catch” Purification for Enhancing the Quality of Libraries of DNA Conjugates. ACS Comb. Sci. 2015, 17 (7), 393–398.
31. Leimbacher, M.; Zhang, Y.; Mannocci, L.; Stravs, M.; Geppert, T.; Scheuermann, J.; Schneider, G.; Neri, D. Discovery Of Small-Molecule Interleukin-2 Inhibitors from a DNA-Encoded Chemical Library. Chem. Eur. J. 2012, 18 (25), 7729–7737.
32. Quesnel, A.; Delmas, A.; Trudelle, Y. Purification of synthetic peptide libraries by affinity chromatography using the avidin-biotin system. Anal. Biochem. 1995, 231, 182–187.
33. Millward, S.W.; Takahashi, T.T.; Roberts, R.W. A general route for post-translational cyclization of mRNA display libraries. J. Am. Chem. Soc. 2005, 127, 14142–14143.
