Coping with Complexity in Ligand-Based De Novo Design - ACS

Chapter 8

Coping with Complexity in Ligand-Based De Novo Design

Downloaded by CORNELL UNIV on October 27, 2016 | http://pubs.acs.org
Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch008

Gisbert Schneider1,* and Petra Schneider1,2

1Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
2inSili.com LLC, Segantinisteig 3, CH-8049 Zurich, Switzerland
*E-mail: [email protected]

Computational de novo design provides fresh ideas for medicinal chemistry. Over the past two decades, this discovery concept has matured so that current implementations can consider multi-dimensional design goals for hit identification and lead optimization. A focus of software development is on predicting the affinities and the synthesizability of the computationally obtained designs. Here, we present our concept of reaction-based molecular design and highlight selected case studies. We show that fragment-based approaches enable the efficient and robust navigation of virtual chemical spaces and deliver synthetically accessible low-complexity ligands as tool compounds and starting points for hit-to-lead optimization. Machine-learning models for activity prediction proved to be sufficiently accurate for compound ranking and scoring so that both target-selective and promiscuous ligands could be readily obtained by automated de novo design.

© 2016 American Chemical Society
Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.

Introduction

Designing objects de novo means constructing something complex from something simple (1). Structurally complex molecules with desirable properties and pharmacological activities can be obtained by combining molecular building blocks like atoms and fragments. Loosely speaking, this concept represents the essence of synthetic medicinal chemistry. Importantly, “complex” does not necessarily mean “complicated”. Just as a structurally intricate natural product is not necessarily complicated to understand or rationalize, the terms “chaotic” and “random” are not equivalent (2). Randomness defines the lack of pattern or predictability in events, whereas a chaotic system or process is at least partially computationally accessible and predictable (3, 4). There are numerous examples of seemingly small structural modifications that have very different effects (activity cliffs (5, 6), super-additivity (7)), and of so-called “privileged scaffolds” (8) that show target-promiscuous binding behavior (9). With these and other observed exceptions to the rules, drug discovery essentially complies with many attributes of complex adaptive systems (10, 11):

• The number and types of parts in the system and the number of relationships between the parts is non-trivial and/or nonlinear;
• The system has memory and the ability to forget;
• The system includes feedback;
• The system can adapt itself according to its memory and/or feedback; and
• The system is sensitive to initial conditions.

Essentially, complex systems and processes show nonlinear behavior that is unpredictable and cannot be simply inferred from the behavior of the components (12). Similarly, fragment-based molecular de novo design represents a task that cannot be solved in a straightforward manner but requires a robust optimization strategy to find useful solutions with minimal consumption of time and resources (13, 14). Numerous computational methods have been devised for automated molecular design, all of which implement algorithms for

(i) molecule assembly (the structure generation problem),
(ii) compound evaluation (the scoring problem), and
(iii) the structural adaptation towards user-defined objectives (the optimization problem) (1, 15, 16).

Complexity issues play a role in each of these processes because one usually wishes to obtain synthetically easily accessible new compounds that can be further optimized (issue of structural complexity), that bind to a defined panel of on-targets while having as few effects as possible on off-targets, and that possess suitable physicochemical properties, e.g., water solubility (issue of the complexity of scoring). Finally, the initial designs must be fine-tuned, which typically takes place in a chemical space of high cardinality that easily exceeds 10^10 potential solutions (issue of search space complexity) (1, 17). Here, we summarize our de novo design concept and the methodology for rapidly finding new bioactive molecules that may serve as tool compounds and starting points for hit-to-lead progression.


Compound Generation

Computational de novo design must generate compounds that can be synthesized in few steps. The structural complexity of designed compounds as an estimate of their “undesirability” can be analyzed in terms of structural motifs (18). Many quality indices have been proposed (19). As a rough guideline, low-complexity fragments not exceeding approximately 300 g/mol with high ligand efficiency often provide suitable starting points for drug discovery (20, 21). In fact, fragment growing and linking were among the first and most successful de novo methods developed (22, 23). In 2000, we proposed using virtual fragments that were obtained from pseudo-retrosynthetic fragmentation of known drugs and other pharmacologically active compounds (24), based on the hypothesis that re-ligation of such potentially “privileged” fragments might result in molecular structures with a high probability of binding to pharmacologically relevant macromolecular targets. Our TOPAS (TOPology-Assigning System) software used the retrosynthetic combinatorial analysis procedure (RECAP) (25) for both the dissection and the forward pseudo-synthesis. Fully automated de novo design with TOPAS resulted in readily synthesizable, potent new chemical entities, such as Kv1.5 potassium channel blockers (26) and cannabinoid receptor 1 antagonists (27, 28), thereby confirming the initial hypothesis.

Our next-generation design technique DOGS (Design Of Genuine Structures) builds on this concept and uses physically available building blocks instead of virtual fragments, together with organic synthesis schemes for compound construction (Figure 1) (29). Currently, 25,144 educts and 58 reaction schemes (plus regioselective variants thereof) are stored in a MySQL database, where each building block is indexed by a reaction bitstring. The reaction schemes are coded in terms of the Reaction-MQL language (30). DOGS generates new compounds by taking a template ligand as a reference so that the designs are either scaffold-hops away from the template or are analogs, depending on the scoring function settings used (vide infra). In a large-scale design study, we observed that approximately 80% of the de novo designed molecular structures contained novel scaffolds compared to the known ligands of human drug targets (31). The majority of all DOGS designs that have been synthesized in our laboratory were obtained by following the synthetic pathway suggested by the software. This outcome corroborates the reaction-based ligation of fragments as a practical approach for generating readily synthesizable compounds de novo.

As in modern parallel chemistry platforms used in the pharmaceutical industry (32–34), we rely on a select set of one-step reaction schemes for rapid ligand prototyping, such as reductive amination, amide formation, cross-coupling reactions, and click reactions. In this case, the design algorithm selects only suitable building block combinations; it does not vary the reactions. Unless an exhaustive library analysis is feasible, we use stochastic algorithms like ant colony systems (MAntA, Molecular Ant Algorithm) or particle swarm optimization for this purpose (35, 36). For several of the reactions, we have coupled the computer-based design process to the actual synthesis of the suggested compounds using a small microfluidics platform (Figure 2).
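The idea of indexing each building block by a reaction bitstring and then selecting only suitable building block combinations for a fixed reaction can be illustrated with a small sketch. Everything below is an assumption made for illustration: the two reaction schemes, the role names, the four toy building blocks, and the functions `reaction_bitstring` and `enumerate_pairs` are hypothetical and do not reproduce the DOGS database layout or Reaction-MQL.

```python
from itertools import product

# Hypothetical one-step reaction schemes: name -> required reactant roles
REACTIONS = {
    "reductive_amination": ("aldehyde", "amine"),
    "amide_formation": ("carboxylic_acid", "amine"),
}

# One bit per (reaction, reactant role) pair
ROLE_BITS = {}
for rxn, roles in REACTIONS.items():
    for role in roles:
        ROLE_BITS[(rxn, role)] = 1 << len(ROLE_BITS)


def reaction_bitstring(functional_groups):
    """Set the bit of every (reaction, role) this building block can fill."""
    bits = 0
    for (_rxn, role), bit in ROLE_BITS.items():
        if role in functional_groups:
            bits |= bit
    return bits


# Toy building blocks (illustrative only): name -> functional groups present
BUILDING_BLOCKS = {
    "benzaldehyde": {"aldehyde"},
    "piperidine": {"amine"},
    "benzoic_acid": {"carboxylic_acid"},
    "glycine": {"amine", "carboxylic_acid"},
}
INDEX = {name: reaction_bitstring(groups) for name, groups in BUILDING_BLOCKS.items()}


def enumerate_pairs(reaction):
    """List building block pairs whose bitstrings match both reactant roles."""
    role_a, role_b = REACTIONS[reaction]
    bit_a = ROLE_BITS[(reaction, role_a)]
    bit_b = ROLE_BITS[(reaction, role_b)]
    first = [name for name, bits in INDEX.items() if bits & bit_a]
    second = [name for name, bits in INDEX.items() if bits & bit_b]
    # Simplification: a building block is not paired with itself
    return [(a, b) for a, b in product(first, second) if a != b]
```

A bitwise AND against the index is enough to decide whether a building block can take part in a reaction, which is why such a lookup stays cheap even for tens of thousands of educts. Scoring and ranking of the enumerated pairs are deliberately left out of this sketch.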


Figure 1. Reaction-driven generation of synthetically accessible chemical entities. For each virtual synthetic step, the DOGS algorithm first selects a reaction scheme by generating and scoring minimalistic “dummy products”. It then enumerates the full library in a breadth-first fashion.

Figure 2. Schematic of a microfluidics platform for the synthesis of computer-designed compounds by reductive amination. The algorithm selects the building blocks R1-R4 but does not vary the underlying reaction. Inline analytics of the product spectrum helps optimize the reaction conditions and tailor the building block database used for compound design.

There is a customized reactor setup for each synthetic reaction. This setup allows us to monitor the actual synthetic accessibility of the designs and rapidly obtain material for testing. The results of the analytics help us customize the building block database used for virtual compound generation. For further applications of integrated microfluidic-based lab-on-a-chip systems to drug discovery, we refer the reader to recent review articles (37, 38). We are convinced that automated robotic synthesis coupled with molecular design software will become a method of choice for rapidly generating prototype and tool compounds for medicinal chemistry and chemical biology (39).


Compound Scoring

In DOGS, we rely on a composite ligand-based similarity index (ISOAK, Iterative Similarity Optimal Assignment Kernel) (40) for the actual molecular design process. The objective is to minimize this value. The ISOAK works on the molecular (structure) graph, based on iterative graph similarity and optimal assignment kernels (41). Let G = (V, E) and G′ = (V′, E′) be two undirected labeled molecular graphs (e.g., the design template in DOGS and a virtually generated molecule). Based upon a measure kG,G′ of similarity between the vertices of G and G′, each vertex of the smaller graph is assigned to a vertex of the larger graph such that the total similarity between the assigned vertices is maximized:

where we assume that |V| < |V′|. The maximum is taken over all possible assignments π of the vertices in |V| to vertices in |V′|, i.e., all prefixes of length |V| of permutations of size |V′|. To prevent the value of the kernel from depending on the size |V| of the smaller molecular graph, it is normalized using . The matrix of pairwise vertex similarities X is computed using an iterative procedure, with initial values

and an update step

for |ν| < |ν′|, with the roles of vertex ν and ν′ exchanged otherwise. The maximum is taken over all possible assignments of the neighbors of ν to the neighbors of ν′. In other words, the algorithm optimally assigns the neighbors of the vertex with smaller degree to the neighbors of the vertex with larger degree, based on the similarity values of the previous iteration. The parameter α weights the influence of the constant and the recursive parts of the equation. Using α, this method allows us to emphasize the direct vertex (α → 0) or the vertex neighborhood similarity (α → 1) between the design template and the computationally generated new molecules. Consequently, one will preferably obtain either structural analogs of a given template compound or scaffold-hops, depending on the project requirements (42). The parameter α allows for the continuous blending of direct (vertex-label based) with indirect (neighborhood based) molecular graph similarity. Because of the flexible vertex labeling scheme, the approach can be strictly atomistic or pharmacophore-type based, for example.

An example of a conservative compound de novo design with DOGS is shown in Figure 3a. Taking compound 1, a potent but receptor subtype non-selective vascular endothelial growth factor receptor (VEGFR) 2 kinase inhibitor, as the template, we obtained the subtype-selective VEGFR-2 inhibitor 2 by in silico fragment morphing (43).
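The ISOAK construction described above (optimal assignment of vertices, iterative update of the vertex similarity matrix X, blending by α, and normalization) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the published implementation: the vertex kernel is assumed to be 1 for identical atom labels and 0 otherwise, the edge kernel is omitted, the starting values of X are assumed to equal the vertex kernel, the neighbor term is divided by the larger vertex degree, and the kernel is normalized by the geometric mean of the two self-similarities. The brute-force search over assignments is only practical for very small graphs. A graph is given as a pair (labels, adjacency lists), and all function names are hypothetical.

```python
from itertools import permutations


def best_assignment(small, large, weight):
    """Maximum total weight over all injective assignments small -> large."""
    if not small:
        return 0.0
    return max(
        sum(weight(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )


def assign_smaller_to_larger(a, b, x):
    """Optimally assign the smaller index set to the larger one under x[i][j]."""
    if len(a) <= len(b):
        return best_assignment(a, b, lambda i, j: x[i][j])
    return best_assignment(b, a, lambda j, i: x[i][j])


def vertex_similarity(g, h, alpha, iterations):
    """Iterate X = (1 - alpha) * direct + alpha * neighborhood similarity."""
    labels_g, adj_g = g
    labels_h, adj_h = h
    # Assumed vertex kernel: 1 for identical labels, 0 otherwise
    kv = [[1.0 if a == b else 0.0 for b in labels_h] for a in labels_g]
    x = [row[:] for row in kv]  # assumed starting values
    for _ in range(iterations):
        new = []
        for v, nv in enumerate(adj_g):
            row = []
            for w, nw in enumerate(adj_h):
                larger = max(len(nv), len(nw))
                neigh = assign_smaller_to_larger(nv, nw, x) / larger if larger else 0.0
                row.append((1 - alpha) * kv[v][w] + alpha * neigh)
            new.append(row)
        x = new
    return x


def isoak(g, h, alpha=0.5, iterations=10):
    """Normalized optimal assignment kernel between two labeled graphs."""
    def raw(a, b):
        x = vertex_similarity(a, b, alpha, iterations)
        return assign_smaller_to_larger(list(range(len(a[0]))), list(range(len(b[0]))), x)
    return raw(g, h) / (raw(g, g) * raw(h, h)) ** 0.5


# Heavy-atom graphs: (atom labels, neighbor lists)
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
propane = (["C", "C", "C"], [[1], [0, 2], [1]])
methanol = (["C", "O"], [[1], [0]])
```

With α = 0 only the atom labels count, so ethanol and propane share two of three matched vertices. Larger α values let the neighborhoods of the assigned vertices contribute, which corresponds to the blending of direct and indirect similarity described in the text.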

Figure 3. Examples of compounds that were obtained by fragment-based de novo design. In a) and b), the de novo generated (sub)structures are shown in blue.

We usually run DOGS with several choices of α. Additional compound ranking, such as by predicted target binding, is performed post hoc to avoid an overly complex scoring function. For target affinity prediction, we have successfully used Gaussian Process (GP) regression models (44), among other approaches. We pioneered the GP approach for drug discovery several years ago for the virtual screening of compound collections and the de novo design of target-panel selective compounds (45). Recently, we have computed quantitative structure-activity relationship (QSAR) GP models for all human drug targets in ChEMBL19 (46) for which at least 200 compound activities were annotated. Of these GP models, 352 yielded q² > 0.6 on external validation data.

In the latest application of DOGS for fragment-based design, we correctly identified the de novo generated compound 4 and its derivative 5 (azosemide) as functional mimetics of the template fasudil, 3 (Figure 3b) (47). The automated design software enabled the identification of a fragment-like new chemical entity that inhibits human death-associated protein kinase 3 (DAPK3), which we computationally predicted using a GP regression model and experimentally confirmed using the first X-ray crystal structure of the DAPK3 homodimer in complex with the de novo designed ligand (PDB-ID: 5a6n; Figure 4). Using the GP QSAR models, we also identified carbonic anhydrase IX as a hitherto unknown target of the anti-hypertensive drug azosemide. The chemical structure of azosemide, 5, represents a grown version of the de novo designed DAPK3 fragment-like inhibitor, 4. In a sense, one drug, fasudil, was morphed into another drug, azosemide, via the computer-generated intermediate. The results of this study demonstrate that automated molecular design in combination with target prediction and structure-based validation allows for rapid ligand prototyping and bears exceptional potential for future drug discovery and chemical biology. In this set-up, we completed the full de novo design cycle, encompassing computational ligand design and target prediction, chemical synthesis, biochemical testing, and biophysical determination of the ligand-target complex.
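To make the role of GP regression in affinity prediction more concrete, the following minimal sketch computes the posterior mean and variance of a GP at a query point. It is an illustration under assumptions, not the authors' QSAR models: the squared-exponential kernel, the fixed `noise` and `length` hyperparameters, the one-dimensional toy descriptors, and all function names are hypothetical, and hyperparameter fitting is omitted.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_chol(low, b):
    """Solve (L L^T) x = b by forward and back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(train_x, train_y, query, noise=1e-2, length=1.0):
    """Posterior mean and latent variance of a GP at a single query point."""
    n = len(train_x)
    offset = sum(train_y) / n  # center the activities around their mean
    centred = [y - offset for y in train_y]
    k = [[rbf(train_x[i], train_x[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    k_star = [rbf(x, query, length) for x in train_x]
    weights = solve_chol(low, centred)
    mean = offset + sum(a * b for a, b in zip(k_star, weights))
    v = solve_chol(low, k_star)
    var = rbf(query, query, length) - sum(a * b for a, b in zip(k_star, v))
    return mean, var


# Toy data: one descriptor per compound, activities on a pAffinity-like scale
X = [[0.0], [1.0], [2.0]]
Y = [5.0, 6.0, 7.0]
mean_near, var_near = gp_predict(X, Y, [1.0])
mean_far, var_far = gp_predict(X, Y, [10.0])
```

The variance is small close to known compounds and grows for queries far from the training data, where the prediction falls back to the mean activity. This per-compound uncertainty estimate is what makes a GP attractive for ranking designs post hoc.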

Figure 4. Active site model of hDAPK3 with bound de novo designed fragment-like inhibitor 4. Interacting side chains and hydrogen bonds (dotted lines) are highlighted. There is no observable ligand interaction with the hinge residues in this crystal structure (PDB-ID: 5a6n).

Using multi-dimensional target affinity scoring in combination with an ant colony system for compound selection (MAntA), we obtained the selective, ligand-efficient compounds 6 and 7 as minimalist sigma-1 receptor and dopamine D4 receptor ligands (Figure 3c) (48). The nanomolar potencies and the selectivity of the hits obtained were accurately predicted by GP QSAR models with an overall success rate of 90%. In another recent project, we used the MAntA molecular design software for combinatorial design, taking reductive amination as an example of a privileged tool reaction, to obtain 5-HT2B-selective ligands. Using both CATS (49) pharmacophores and Morgan-type substructure fingerprints (www.rdkit.org), the machine-learning algorithm suggested 5774 preferred products, from which we selected four for synthesis and biochemical testing. The predicted 5-HT2B selectivity over 5-HT2A/C served as the guiding criterion, with synthesizability as the second (implicit) objective. We obtained the perfectly 5-HT2B-selective compound 8 from this multi-objective study, relying again on a panel of quantitative bioactivity GP models.

Computationally generating promising compounds (positive design) is as important as eliminating the bad apples to avoid undesired effects (negative design). However, how do we determine which of the predicted “hits” merit follow-up? Which should be discarded? A gut feeling in medicinal chemistry will take us only so far. Importantly, for molecular design, compound library curation and the evaluation of the hits found by experimental screening and de novo design, one wishes to identify potential false positives as well as promiscuous ligands that interact with multiple targets. We originally introduced the term “frequent hitter” to designate compounds that turned up as hits in multiple experiments, independent of the particular assay type used (50). Although potentially reactive, poorly soluble, and aggregation-prone compounds should be avoided (unless aiming for covalent target binding, for example) (51), multi-target engagement can be valuable (52). In a pilot study, we pursued a computational approach that may help with identifying these promiscuous binders, as outlined in the following (9).

Following our previous work on frequent hitter prediction (50), we trained feed-forward artificial neural networks to distinguish between potential false-positives and promiscuous binders. We extracted the required training data from ChEMBL19. The false-positive set contained 13,468 compounds that received at least three flags indicating undesirable substructures. The promiscuous binder set contained 2,043 compounds that were annotated as potent (KD/i or IC/EC50 ≤ 1 µM) ligands of multiple targets stemming from at least three different target classes (GPCRs, proteases, kinases, other enzymes, nuclear receptors, ion channels). We represented all compounds in terms of their topological pharmacophores (CATS) so that the neural network could perform substructure-independent feature extraction from the training data. This preliminary neural network tool (single hidden layer, 10 hidden neurons) attained a Matthews correlation of 0.61±0.5 on cross-validation test data. Extensive compound library analysis using this model showed that 7-11% of the known drug-like compounds with annotated bioactivities might possess pronounced multi-target binding potential.

In an initial prospective design study, we analyzed a virtual combinatorial library of 2,469,832 reductive amination products, which we assembled from commercially available molecular building blocks. Compound 9 received a high promiscuous binder score (98% pseudo-probability), which we confirmed after synthesis and testing. In this project, we used our qualitative target prediction tool SPiDER (Self-organizing map–based Prediction of Drug Equivalence Relationships) (53). Six of the eight predicted targets were hit by compound 9. This result confirms the predictions made by the promiscuous binder and SPiDER models. This prototypic method for identifying multi-target ligands might guide the design of custom polypharmacological compound libraries and prioritize certain molecular fragments for drug discovery. Further improvement of the robustness of the model should be possible by considering other machine-learning algorithms and molecular representations.
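The Matthews correlation used above to assess the promiscuous-binder classifier is a useful figure of merit when the two classes differ in size, as they do here (13,468 versus 2,043 compounds). As a brief illustration, it can be computed from the confusion matrix as follows. The label coding (1 for promiscuous binder, 0 for potential false positive), the function name, and the toy predictions are assumptions made for this sketch.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: 3 promiscuous binders (1) and 5 potential false positives (0)
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(observed, predicted)  # (2*4 - 1*1) / sqrt(3*3*5*5) = 7/15
```

A value of 1 indicates perfect agreement, 0 corresponds to a random prediction, and −1 indicates total disagreement between predicted and observed classes.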

Compound Optimization

Iterative synthesize-and-test cycles are key to the optimization of compound properties (54). We recently demonstrated that there are optimal combinations of the size of a screening library and the number of iterative screening rounds when the goal is to minimize the experimental cost (55). Machine-learning methods can guide a molecular design process that constantly adapts to a dynamic structure-activity relationship model (“active learning” concept) (56). Consequently, a central idea of our approach to drug design is an adaptive fitness landscape as a mathematical model of the underlying structure-activity relationship for a given drug target or design objective. Such a model organizes parts of chemical space (that is, all compounds that can be synthesized with a given set of chemical reactions and molecular building blocks) into regions of high and low predicted bioactivity or any other property of interest. At the beginning of a drug discovery project, in the absence of (m)any known active and inactive compounds, this fitness landscape is largely unbiased (Figure 5). As increasing numbers of active and inactive compounds are generated during a lead discovery project, the machine-learning model incorporates the new knowledge into the adaptive landscape, and this guides the next round of compound synthesis and testing (57). The actual compound selection/optimization is performed in high-dimensional descriptor space, and the visualization of fitness landscapes can help supervise the process, as we have shown for the hit-to-lead progression of somatostatin receptor subtype 5 antagonists with our software tool LiSARD (Ligand Structure-Activity Relationship Display) (58). Several such methods have been developed, implemented and applied to medicinal chemistry projects (59, 60).

There still are only a few published hit and lead identification studies in which explicit active learning was performed (61, 62). This fact might be a consequence of the way hit-to-lead optimization is usually carried out in medicinal chemistry. Often, high-throughput screens provide the initial hits, only some of which are subsequently followed up by the project teams. Information on all tested compounds, including the negative data, is often ignored, and rapid iteration through multiple design-synthesize-test cycles is rarely pursued, which may in part simply be a consequence of the special type of compound and assay handling required for active learning.

In a preliminary prospective exercise, we sought to identify new antagonists of chemokine receptor CXCR4 using active learning (63). We trained an initial random forest QSAR model using public CXCR4 data from ChEMBL (287 curated ligands), and then performed two active learning cycles with 30 compounds each, which we purchased from compound vendors. The new compounds were tested for CXCR4 antagonism in cell-based assays. After completion of each testing round, we updated the random forest model with the newly obtained activity data. Importantly, the predictive uncertainty decreased with each learning cycle (Figure 6). The active learning process sampled the screening compound pool (1.5 million compounds) so that the 2 × 30 = 60 added compounds captured the CXCR4 structure-activity relationship. Although it is still in its early stages, the results of this preliminary study suggest the applicability of active machine-learning to drug discovery projects. The concept might be particularly helpful for projects that use demanding and resource-intensive assays that preclude high-throughput applications.
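The cycle of model training, compound selection, testing, and model update described above can be outlined as a short loop. The sketch below is only meant to show the control flow and rests on several assumptions: a bootstrap ensemble of nearest-neighbor predictors stands in for the random forest, compounds are reduced to a single numeric descriptor, selection picks the pool compounds with the largest ensemble spread (one possible acquisition rule, not necessarily the one used in the study), and `toy_assay` is a hypothetical stand-in for the cell-based assay.

```python
import math
import random
import statistics


def fit_ensemble(train, n_models, rng):
    """Bootstrap ensemble: each member is a resampled copy of the training set."""
    return [[rng.choice(train) for _ in train] for _ in range(n_models)]


def predict(ensemble, x):
    """Each member predicts via its nearest neighbor; return mean and spread."""
    votes = [min(member, key=lambda rec: abs(rec[0] - x))[1] for member in ensemble]
    return statistics.mean(votes), statistics.pstdev(votes)


def active_learning(train, pool, assay, cycles=2, batch=30, n_models=25, seed=0):
    """Run selection/test/update cycles; return data, remaining pool, picks."""
    rng = random.Random(seed)
    train, pool = list(train), list(pool)
    history = []
    for _ in range(cycles):
        ensemble = fit_ensemble(train, n_models, rng)
        # Rank untested compounds by predictive uncertainty (ensemble spread)
        pool.sort(key=lambda x: predict(ensemble, x)[1], reverse=True)
        picked, pool = pool[:batch], pool[batch:]
        # "Purchase and test" the picks, then add the results to the model data
        train += [(x, assay(x)) for x in picked]
        history.append(picked)
    return train, pool, history


def toy_assay(x):
    """Hypothetical activity readout for a one-dimensional descriptor."""
    return math.sin(x)


initial = [(x, toy_assay(x)) for x in (0.05, 10.05, 19.95)]
candidates = [i / 10 for i in range(200)]
data, remaining, picks = active_learning(initial, candidates, toy_assay)
```

With two cycles of 30 compounds each, the model data grow by 2 × 30 = 60 tested compounds while the rest of the pool remains untouched. Both active and inactive results enter the update, which is the point made in the text about not ignoring negative data.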

Figure 5. Evolving fitness landscape. Molecular de novo design can be controlled by autonomous software. P(x) is a computed pseudo-probability function, e.g., a QSAR machine-learning model. x’ and x” are the coordinates of the projected descriptor space X. Both active and inactive compounds contribute to the model. Multiple landscapes can be combined for “polypharmacological” compound design and optimization.


Figure 6. Estimation of QSAR model adaptation by active learning. The predictive uncertainty of the machine-learning function (here: a random forest model) decreases with each screening round.

Conclusions

The ligand-based discovery platform described herein can be widely used to quickly identify starting points for various drug target families, provided that templates for de novo design are available. Our target prediction models currently encompass several hundred human drug targets. Additional and improved models should become available with publicly accessible database updates. The active learning concept will enable continuous online training so that fully automated and updated panels of QSAR functions will be available for the medicinal chemist in the near future. The results of our studies suggest a feasible solution for fast fragment-based de novo design of compounds with accurately predicted designer polypharmacological or selectivity profiles. Together with the polygenic nature of several diseases, such platforms may even be suited to economically prototype efficacious tools for personalized medicine (64). The results obtained validate the combination of advanced machine-learning methods (65) with automated chemical synthesis and fast bioassay turnover as a general approach for rapid hit and lead discovery.

Acknowledgments

The authors thank all present and former members of the Computer-Assisted Drug Design group at ETH Zurich for their contributions and stimulating discussion.

References

1. Schneider, G., Ed. De Novo Molecular Design; Wiley-VCH: Weinheim, New York, 2013.
2. Johnson, N. F. Simply Complexity: A Clear Guide to Complexity Theory; Oneworld Publications: London, 2009.
3. Testa, B.; Bojarski, A. J. Molecules as complex adaptative systems: Constrained molecular properties and their biochemical significance. Eur. J. Pharm. Sci. 2000, 11, S3–S14.
4. Schneider, G. De novo design – hop(p)ing against hope. Drug Discovery Today Technol. 2013, 10, e453–e460.
5. Dimova, D.; Stumpfe, D.; Hu, Y.; Bajorath, J. Activity cliff clusters as a source of structure-activity relationship information. Expert Opin. Drug Discovery 2015, 10, 441–447.
6. Husby, J.; Bottegoni, G.; Kufareva, I.; Abagyan, R.; Cavalli, A. Structure-based predictions of activity cliffs. J. Chem. Inf. Model. 2015, 55, 1062–1076.
7. Nazaré, M.; Matter, H.; Will, D. W.; Wagner, M.; Urmann, M.; Czech, J.; Schreuder, H.; Bauer, A.; Ritter, K.; Wehner, V. Fragment deconstruction of small, potent factor Xa inhibitors: Exploring the superadditivity energetics of fragment linking in protein-ligand complexes. Angew. Chem., Int. Ed. 2012, 51, 905–911.
8. Zhao, H.; Dietrich, J. Privileged scaffolds in lead generation. Expert Opin. Drug Discovery 2015, 10, 781–790.
9. Schneider, P.; Röthlisberger, M.; Reker, D.; Schneider, G. Spotting and designing promiscuous ligands for drug discovery. Chem. Commun. 2016, 52, 681–684.
10. Koza, J. R. Genetic Programming; The MIT Press: Cambridge, 1992.
11. Schneider, G.; So, S.-S. Adaptive Systems in Drug Design; Landes Bioscience: Georgetown, 2001.
12. Dracopoulos, D. C. Evolutionary Learning Algorithms for Neural Adaptive Control; Springer: London, 1997.
13. Schneider, G. De novo design – hop(p)ing against hope. Drug Discovery Today Technol. 2012, 10, e453–e460.
14. Jorgensen, W. L. Efficient drug lead discovery and optimization. Acc. Chem. Res. 2009, 42, 724–733.
15. Schneider, G.; Hartenfeller, M.; Reutlinger, M.; Tanrikulu, Y.; Proschak, E.; Schneider, P. Voyages to the (un)known: Adaptive design of bioactive compounds. Trends Biotechnol. 2009, 27, 18–26.
16. Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discovery 2005, 4, 649–663.
17. Hartenfeller, M.; Schneider, G. De novo drug design. Methods Mol. Biol. 2011, 672, 299–323.
18. Boda, K.; Johnson, A. P. Molecular complexity analysis of de novo designed ligands. J. Med. Chem. 2006, 49, 5869–5879.
19. Bickerton, G. R.; Paolini, G. V.; Besnard, J.; Muresan, S.; Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 2012, 4, 90–98.
20. Jhoti, H.; Williams, G.; Rees, D. C.; Murray, C. W. The ‘rule of three’ for fragment-based drug discovery: Where are we now? Nat. Rev. Drug Discovery 2013, 12, 644–645.


21. Nilar, S. H.; Ma, N. L.; Keller, T. H. The importance of molecular complexity in the design of screening libraries. J. Comput. Aided Mol. Des. 2013, 27, 783–792.
22. Böhm, H.-J. The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput. Aided Mol. Des. 1992, 6, 61–78.
23. Barreiro, G.; Kim, J. T.; Guimarães, C. R. W.; Bailey, C. M.; Domaoal, R. A.; Wang, L.; Anderson, K. S.; Jorgensen, W. L. From docking false-positive to active anti-HIV agent. J. Med. Chem. 2007, 50, 5324–5329.
24. Schneider, G.; Lee, M.-L.; Stahl, M.; Schneider, P. De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks. J. Comput. Aided Mol. Des. 2000, 14, 487–494.
25. Lewell, X. Q.; Judd, D. B.; Watson, S. P.; Hann, M. M. RECAP – retrosynthetic combinatorial analysis procedure: A powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci. 1998, 38, 511–522.
26. Schneider, G.; Clément-Chomienne, O.; Hilfiger, L.; Schneider, P.; Kirsch, S.; Böhm, H.-J.; Neidhart, W. Virtual screening for bioactive molecules by evolutionary de novo design. Angew. Chem., Int. Ed. 2000, 39, 4130–4133.
27. Rogers-Evans, M.; Alanine, A. I.; Bleicher, K. H.; Kube, D.; Schneider, G. Identification of novel cannabinoid receptor ligands via evolutionary de novo design and rapid parallel synthesis. QSAR Comb. Sci. 2004, 23, 426–430.
28. Alig, L.; Alsenz, J.; Andjelkovic, M.; Bendels, S.; Bénardeau, A.; Bleicher, K.; Bourson, A.; David-Pierson, P.; Guba, W.; Hildbrand, S.; Kube, D.; Lübbers, T.; Mayweg, A. V.; Narquizian, R.; Neidhart, W.; Nettekoven, M.; Plancher, J. M.; Rocha, C.; Rogers-Evans, M.; Röver, S.; Schneider, G.; Taylor, S.; Waldmeier, P. Benzodioxoles: Novel cannabinoid-1 receptor inverse agonists for the treatment of obesity. J. Med. Chem. 2008, 51, 2115–2127.
29. Hartenfeller, M.; Zettl, H.; Walter, M.; Rupp, M.; Reisen, F.; Proschak, E.; Weggen, S.; Stark, H.; Schneider, G. DOGS: Reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 2012, 8, e1002380.
30. Reisen, F. H.; Schneider, G.; Proschak, E. Reaction-MQL: line notation for functional transformation. J. Chem. Inf. Model. 2009, 49, 6–12.
31. Hartenfeller, M.; Eberle, M.; Meier, P.; Nieto-Oberhuber, C.; Altmann, K.H.; Schneider, G.; Jacoby, E.; Renner, S. Probing the bioactivity-relevant chemical space of robust reactions and common molecular building blocks. J. Chem. Inf. Model. 2012, 52, 1167–1178.
32. Albert, J. S.; Blomberg, N.; Breeze, A. L.; Brown, A. J.; Burrows, J. N.; Edwards, P. D.; Folmer, R. H.; Geschwindner, S.; Griffen, E. J.; Kenny, P. W.; Nowak, T.; Olsson, L. L.; Sanganee, H.; Shapiro, A. B. An integrated approach to fragment-based lead generation: Philosophy, strategy and case studies from AstraZeneca’s drug discovery programmes. Curr. Top. Med. Chem. 2007, 7, 1600–1629.
33. Hillisch, A.; Heinrich, N.; Wild, H. Computational chemistry in the pharmaceutical industry: From childhood to adolescence. ChemMedChem 2015, 10, 1958–1962.

34. Lessel, U. Fragment-based design of focused compound libraries. In De Novo Molecular Design; Schneider, G., Ed.; Wiley-VCH: Weinheim, New York, 2013; pp 349–371.
35. Hiss, J. A.; Reutlinger, M.; Koch, C. P.; Perna, A. M.; Schneider, P.; Rodrigues, T.; Haller, S.; Folkers, G.; Weber, L.; Baleeiro, R. B.; Walden, P.; Wrede, P.; Schneider, G. Combinatorial chemistry by ant colony optimization. Future Med. Chem. 2014, 6, 267–280.
36. Hartenfeller, M.; Proschak, E.; Schüller, A.; Schneider, G. Concept of combinatorial de novo design of drug-like molecules by particle swarm optimization. Chem. Biol. Drug Des. 2008, 72, 16–26.
37. Dittrich, P. S.; Manz, A. Lab-on-a-chip: Microfluidics in drug discovery. Nat. Rev. Drug Discovery 2006, 5, 210–218.
38. Rodrigues, T.; Schneider, P.; Schneider, G. Accessing new chemical entities through microfluidic technology. Angew. Chem., Int. Ed. 2014, 53, 5750–5758.
39. King, R. D.; Rowland, J.; Oliver, S. G.; Young, M.; Aubrey, W.; Byrne, E.; Liakata, M.; Markham, M.; Pir, P.; Soldatova, L. N.; Sparkes, A.; Whelan, K. E.; Clare, A. The automation of science. Science 2009, 324, 85–89.
40. Rupp, M.; Schneider, G. Graph kernels for molecular similarity. Mol. Inf. 2010, 29, 266–273.
41. Rupp, M.; Proschak, E.; Schneider, G. Kernel approach to molecular similarity based on iterative graph similarity. J. Chem. Inf. Model. 2007, 47, 2280–2286.
42. Klenner, A.; Hartenfeller, M.; Schneider, P.; Schneider, G. ‘Fuzziness’ in pharmacophore-based virtual screening and de novo design. Drug Discovery Today Technol. 2010, 7, e237–e244.
43. Rodrigues, T.; Kudoh, T.; Roudnicky, F.; Lim, Y. F.; Lin, Y. C.; Koch, C. P.; Seno, M.; Detmar, M.; Schneider, G. Steering target selectivity and potency by fragment-based de novo drug design. Angew. Chem., Int. Ed. 2013, 52, 10006–10009.
44. Rasmussen, C.; Williams, C. Gaussian Processes for Machine Learning; MIT Press: Cambridge, 2006.
45. Rupp, M.; Schroeter, T.; Steri, R.; Zettl, H.; Proschak, E.; Hansen, K.; Rau, O.; Schwarz, O.; Müller-Kuhrt, L.; Schubert-Zsilavecz, M.; Müller, K. R.; Schneider, G. From machine learning to natural product derivatives that selectively activate transcription factor PPARgamma. ChemMedChem 2010, 5, 191–194.
46. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
47. Rodrigues, T.; Reker, D.; Welin, M.; Caldera, M.; Brunner, C.; Gabernet, G.; Schneider, P.; Walse, B.; Schneider, G. De novo fragment design for drug discovery and chemical biology. Angew. Chem., Int. Ed. 2015, 54, 15079–15083.

48. Reutlinger, M.; Rodrigues, T.; Schneider, P.; Schneider, G. Multi-objective molecular de novo design by adaptive fragment prioritization. Angew. Chem., Int. Ed. 2014, 53, 4244–4248.
49. Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Chemically advanced template search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’ molecules. Mol. Inf. 2013, 32, 133–138.
50. Roche, O.; Schneider, P.; Zuegge, J.; Guba, W.; Kansy, M.; Alanine, A.; Bleicher, K.; Danel, F.; Gutknecht, E. M.; Rogers-Evans, M.; Neidhart, W.; Stalder, H.; Dillon, M.; Sjögren, E.; Fotouhi, N.; Gillespie, P.; Goodnow, R.; Harris, W.; Jones, P.; Taniguchi, M.; Tsujii, S.; von der Saal, W.; Zimmermann, G.; Schneider, G. Development of a virtual screening method for identification of "frequent hitters" in compound libraries. J. Med. Chem. 2002, 45, 137–142.
51. Bauer, R. A. Covalent inhibitors in drug discovery: From accidental discoveries to avoided liabilities and designed therapies. Drug Discovery Today 2015, 20, 1061–1073.
52. Zhao, H.; Dietrich, J. Privileged scaffolds in lead generation. Expert Opin. Drug Discovery 2015, 10, 781–790.
53. Reker, D.; Rodrigues, T.; Schneider, P.; Schneider, G. Identifying the macromolecular targets of de novo designed chemical entities through self-organizing map consensus. Proc. Natl. Acad. Sci. U. S. A. 2014, 111, 4067–4072.
54. Muegge, I. Synergies of virtual screening approaches. Mini Rev. Med. Chem. 2008, 8, 927–933.
55. Schneider, G.; Schüller, A. Adaptive combinatorial design of focused compound libraries. Methods Mol. Biol. 2009, 572, 135–147.
56. Reker, D.; Schneider, G. Active-learning strategies in computer-assisted drug discovery. Drug Discovery Today 2015, 20, 458–465.
57. Schneider, G. Future de novo drug design. Mol. Inf. 2014, 33, 397–402.
58. Reutlinger, M.; Guba, W.; Martin, R. E.; Alanine, A. I.; Hoffmann, T.; Klenner, A.; Hiss, J. A.; Schneider, P.; Schneider, G. Neighborhood-preserving visualization of adaptive structure-activity landscapes: Application to drug discovery. Angew. Chem., Int. Ed. 2011, 50, 11633–11636.
59. Reutlinger, M.; Schneider, G. Nonlinear dimensionality reduction and mapping of compound libraries for drug discovery. J. Mol. Graph. Model. 2012, 34, 108–117.
60. Bajorath, J.; Peltason, L.; Wawer, M.; Guha, R.; Lajiness, M. S.; Van Drie, J. H. Navigating structure-activity landscapes. Drug Discovery Today 2009, 14, 698–705.
61. Besnard, J.; Ruda, G. F.; Setola, V.; Abecassis, K.; Rodriguiz, R. M.; Huang, X. P.; Norval, S.; Sassano, M. F.; Shin, A. I.; Webster, L. A.; Simeons, F. R.; Stojanovski, L.; Prat, A.; Seidah, N. G.; Constam, D. B.; Bickerton, G. R.; Read, K. D.; Wetsel, W. C.; Gilbert, I. H.; Roth, B. L.; Hopkins, A. L. Automated design of ligands to polypharmacological profiles. Nature 2012, 492, 215–220.

62. Desai, B.; Dixon, K.; Farrant, E.; Feng, Q.; Gibson, K. R.; van Hoorn, W. P.; Mills, J.; Morgan, T.; Parry, D. M.; Ramjee, M. K.; Selway, C. N.; Tarver, G. J.; Whitlock, G.; Wright, A. G. Rapid discovery of a novel series of Abl kinase inhibitors by application of an integrated microfluidic synthesis and screening platform. J. Med. Chem. 2013, 56, 3033–3047.
63. Reker, D.; Schneider, P.; Schneider, G. Multi-objective active machine learning rapidly improves structure-activity models and reveals new protein-protein interaction inhibitors. Chem. Sci. 2016, DOI: 10.1039/C5SC04272K.
64. Evans, W. E.; Johnson, J. A. Pharmacogenomics: The inherited basis for interindividual differences in drug response. Annu. Rev. Genomics Hum. Genet. 2001, 2, 9–39.
65. Gawehn, E.; Hiss, J. A.; Schneider, G. Deep learning in drug discovery. Mol. Inf. 2016, 35, 3–14.
