Perspective pubs.acs.org/jmc
De Novo Design at the Edge of Chaos
Miniperspective
Petra Schneider†,‡ and Gisbert Schneider*,†
† Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology (ETH), Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
‡ inSili.com LLC, Segantinisteig 3, 8049 Zürich, Switzerland
ABSTRACT: Computational medicinal chemistry offers viable strategies for finding, characterizing, and optimizing innovative pharmacologically active compounds. Technological advances in both computer hardware and software as well as biological chemistry have enabled a renaissance of computer-assisted “de novo” design of molecules with desired pharmacological properties. Here, we present our current perspective on the concept of automated molecule generation by highlighting chemocentric methods that may capture druglike chemical space, consider ligand promiscuity for hit and lead finding, and provide fresh ideas for the rational design of customized screening of compound libraries.
■ DRUG DESIGN IS A COMPLEX ADAPTIVE PROCESS
Drug design can be regarded as a sampling problem. Medicinal chemists must select a small set of promising candidates from an inconceivably large chemical compound pool. Any attempt at purely random selection will likely fail because the odds are highly unfavorable. For example, there are more than 10^13 possibilities to compile a subset of 10 compounds from a pool of 100. The cardinality (“size”) of a typical corporate screening compound collection often exceeds one million substances, and several million compounds are available from commercial vendor collections. However, the cardinality of the virtually accessible “de novo” druglike space can only be estimated. Excessively large search spaces prohibit exhaustive searching, even with massive increases in computational power and storage capabilities. High-throughput screening actually delivers hits because screening collections do not contain arbitrary molecules but instead have been carefully assembled, often over decades, based on explicit or implicit medicinal chemical knowledge. In this game of large numbers, knowledge-driven approaches provide a feasible, albeit not foolproof, way to avoid compounds with undesirable properties and adverse functions (also referred to as “negative design”).1 Once the bulk is eliminated, one can eventually focus on “activity islands” in chemical space that contain attractive chemical entities for a given drug discovery project (“positive design”). There are several knowledge areas that can be accessed to obtain guidelines for de novo drug design and optimization.
These include the following:
(i) individually acquired domain knowledge (e.g., chemical principles, heuristics, intuition);
(ii) macromolecular target structures (e.g., X-ray or NMR structures, free-electron laser spectroscopy, comparative modeling);
(iii) ligands with known effects (e.g., synthetic drugs and fragments, natural products, substrates);
(iv) structure−activity relationship (SAR) models including chemogenomics data.
However, rationalizing the actual design process is far from straightforward. In medicinal chemistry, we have to address partially unpredictable system behaviors that are often governed by chance events (“serendipity”),2 a situation one might consider to be at the edge of chaos.3 Just as a complex molecular structure, such as a structurally intricate natural product, is not necessarily complicated to understand or rationalize, the terms “chaotic” and “random” are not equivalent. Randomness denotes the lack of pattern or predictability in events, whereas a chaotic system or process is at least partially computationally accessible and predictable. Nevertheless, at least for now, there is no apparent deterministic shortcut to finding new molecules with certain desired pharmacological properties.4,5 There are numerous examples of a seemingly small structural modification having a hugely different effect (“activity cliff”)6,7 or a so-called “privileged scaffold”8 showing target-promiscuous binding behavior. With these observed exceptions to the rules, drug discovery essentially complies with many attributes of complex adaptive systems:9−12
• The number and types of parts in the system and the number of relations between the parts are nontrivial and/or nonlinear.
• The system has memory and the ability to forget.
• The system includes feedback.
• The system can adapt itself according to its memory and/or feedback.
• The system is sensitive to initial conditions.
Special Issue: Computational Methods for Medicinal Chemistry
Received: November 30, 2015
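The sampling argument that opens this section can be checked with a few lines of arithmetic. The following minimal sketch counts the distinct 10-compound subsets of a 100-compound pool using the binomial coefficient; the helper name `count_subsets` is introduced here for illustration only.

```python
import math


def count_subsets(pool_size: int, subset_size: int) -> int:
    """Number of distinct subsets of `subset_size` compounds from a pool of `pool_size`."""
    return math.comb(pool_size, subset_size)


n_subsets = count_subsets(100, 10)
print(n_subsets)           # 17310309456440
print(f"{n_subsets:.3e}")  # 1.731e+13
```

The count of roughly 1.7 × 10^13 is consistent with the "more than 10^13 possibilities" quoted in the text, and it illustrates why purely random subset selection offers poor odds even for a very small pool.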
DOI: 10.1021/acs.jmedchem.5b01849 J. Med. Chem. XXXX, XXX, XXX−XXX
Journal of Medicinal Chemistry
Perspective
“An adaptive system is a system which is provided with a means of continuously monitoring its own performance in relation to a given figure of merit or optimal condition and a means of modifying its own parameters by a closed loop action so as to approach this optimum. Adaptive systems are inherently nonlinear.”13 Complex systems show nonlinear emergent behavior that is unpredictable and cannot be simply inferred from the behavior of their components. For medicinal chemistry, the entire situation becomes even more precarious considering the resurgence of phenotypic “high-content” screening and the inherently multifaceted nonlinear relationships between the molecular architecture of a drug and the observed phenotypic effects of an effector molecule in sophisticated cellular assays and in vivo.14−16 Undoubtedly, the traditional rational drug design concept requires an overhaul. The ability to let go of old beliefs and traditions, and to approach transdisciplinary research with an open mind and a constructive attitude toward other scientific viewpoints and new ideas, may therefore be the best option we have for drug design that remains sustainable and successful in the future.17 There is a large body of scientific literature regarding the modeling of complex systems, and adopting some of these theories may be wise for drug discovery; however, the multifaceted nature of the underlying problem, including potency optimization, multitarget engagement and the resulting polypharmacology, pharmacokinetic liabilities, metabolism and toxicity, genomic variability, epigenetic disposition, and so on, must be considered.
To this end, computer-assisted molecular design approaches employing pattern recognition methods, among other techniques, have a long-standing history and offer a design concept that can help propel the discovery of useful new chemical entities (NCEs) as tools and leads for chemical biology and medicinal chemistry.18−20 For example, a recent publication by Bayer Healthcare revealed that computer-assisted drug design technologies decisively contributed to the discovery and optimization of at least half of their NCEs that were undergoing phase I clinical trials in 2015.21 Similarly, Roche emphasized their positive experience with computational methods in the context of the continuously increasing data availability for drug design and optimization.22 Where do new ideas for a new drug molecule come from? A project team of medicinal chemists learns from the data obtained during compound activity determination, often starting from a large-scale screening campaign, and modifies chemical structures accordingly for iterative property optimization. This behavior is called “active learning”.23 Figure 1 showcases a de novo design project that started from a template (1), and via hit generation (2, 3) by computational scaffold-hopping, medicinal chemistry optimization led to a clinical candidate (4).24 This project, like the large majority of published examples, relied on automated in silico design only during the first step of the innovation pipeline without performing explicit active machine learning. The most advanced de novo design tools mimic full active learning behavior by allowing the underlying mathematical model for hypothesis generation to adapt to the particular states of a discovery project. This means that such a model will be flexible and able to adjust parameters so that it can improve with each synthesize-and-test cycle.
Currently, there are few prospective applications of active learning in computational drug discovery, but the awareness of adaptive modeling is increasing and we expect more widespread use of this technique in the future.25
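The adaptive synthesize-and-test cycle described above can be sketched as a toy loop. Everything in this example is an assumption made for illustration: compounds are reduced to a single numeric descriptor, `assay` is a made-up activity landscape standing in for an experiment, the surrogate model is a simple nearest-neighbor lookup, and the `explore` weight is arbitrary. It is not the method of any published design tool; it only shows how a model can be updated after each round of testing and then used to choose the next compound.

```python
def assay(x: float) -> float:
    """Hypothetical experiment: hidden activity landscape peaking at x = 0.7."""
    return 1.0 - (x - 0.7) ** 2


def predict(x: float, tested: dict) -> tuple:
    """Nearest-neighbor surrogate: (predicted activity, distance to nearest tested compound)."""
    nearest_x, nearest_y = min(tested.items(), key=lambda kv: abs(kv[0] - x))
    return nearest_y, abs(nearest_x - x)


def active_learning(pool: list, n_cycles: int, explore: float = 0.5) -> dict:
    """Run n_cycles of select -> test -> update and return all measured compounds."""
    tested = {pool[0]: assay(pool[0])}  # seed the model with one measured compound
    for _ in range(n_cycles):
        untested = [x for x in pool if x not in tested]
        if not untested:
            break

        def acquisition(x: float) -> float:
            # Favor high predicted activity (exploitation) and
            # compounds far from anything tested so far (exploration).
            predicted, distance = predict(x, tested)
            return predicted + explore * distance

        pick = max(untested, key=acquisition)
        tested[pick] = assay(pick)  # "synthesize and test"; the surrogate now sees this result
    return tested


pool = [i / 10 for i in range(11)]  # eleven virtual compounds, descriptor 0.0 to 1.0
tested = active_learning(pool, n_cycles=6)
best = max(tested, key=tested.get)
print(f"tested {len(tested)} of {len(pool)} compounds, best descriptor so far: {best}")
```

The design choice worth noting is the acquisition function: without the exploration term the loop would only resample the neighborhood of the current best compound, whereas the distance bonus forces it to probe unexplored regions of the pool.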
Figure 1. Molecular design cycle. (a) Drug design means interplay between inductive modeling (formulation of a hypothesis based on observations and patterns) and deductive reasoning (testing new compounds that were generated from a model or hypothesis). (b) De novo design and optimization of cannabinoid-1 receptor (CB1) inverse agonists. The computer-generated structures 2 and 3 provided the scaffolds for the synthesis of focused compound libraries.
Generating better multidimensional design hypotheses will be a key to sustained success.26,27 In the following discussion, we will examine selected computational strategies to reach this goal. De novo design faces three challenges: (i) structure generation, (ii) scoring, and (iii) optimization.28 The problem of innovative structure generation has generally been solved. While some of the earlier design methods constructed overoptimized, synthetically challenging molecular structures, the current modeling toolbox contains receptor- and ligand-based algorithmic approaches for combining molecular building blocks (e.g., atoms, fragments) in a chemically meaningful way to obtain structurally novel lead- and druglike virtual structures.29−31 Sophisticated methods have been conceived for fragment growing and linking using chemical transformations, including the matched molecular pair approach.32,33 Recent advances in computer science (e.g., GPU and parallel computing) have simultaneously enabled brute-force structure enumeration of many billions of virtual compound structures for the fast prioritization of candidates.5,34,35 The truly tricky part of computational molecular design is no longer the actual structure generation process but picking the right compounds from the pool of NCEs generated.
Advances in computing increasingly reliable estimates of the binding free energy of a ligand−receptor complex will facilitate QSAR modeling and guide hit-to-lead progression.36−39 However, for quickly sifting through millions of virtual designer molecules, the scoring problem is as relevant as ever.40−42 Figure 2 illustrates the reaction-based molecule assembly and prioritization implemented in the DOGS (Design Of Genuine Structures) software.43 This entirely ligand-based de novo design approach comprises a set of 25 000 commercially available building blocks for rapid chemical space exploration by performing virtual organic syntheses of NCEs with up to 58 reaction schemes and variations thereof.44,45 By selecting the most promising virtual pseudo-synthesis path and repetitive comparison of the virtual intermediates with a given template, typically a known drug, the software generates NCEs that mimic the pharmacophore pattern and constitution of the template but feature different chemical scaffolds.46 This
designed compounds from recent publications (5;54 6, 10;55 7;56 8;57 9;58 11;59 12;60,61 13;62 14;63 15;64 16;65 1766).
■ PLAYING DIRTY
Good compounds are overlooked for various reasons. Computers can help by examining molecular features no chemist can see. Therefore, their full potential for drug discovery and design should be utilized. Identifying promising candidates (positive design) is as important as eliminating the bad apples to avoid undesired effects (negative design) as early as possible in the drug discovery process. While medicinal chemists excel in optimizing hits to eventually become lead structures and enter clinical trials, the computer’s domain is to rapidly sift through many millions of molecules to discard the bulk before any screening assay is performed with the selected hits that remain after thorough in silico scrutiny. In fact, computationally designing target-focused compound libraries can help increase in vitro hit rates while keeping the overall cost of experiments low. Thus, screening campaigns can benefit from tailored compound pools.67,68 It has been common practice to divide the complex drug design task, including structure generation and scoring, into manageable parts and address the issue of complexity by breaking it down into the following disparate optimization problems. First, a series of molecules are synthesized and tested until the desired potency or target affinity is obtained. Then, pharmacokinetic and pharmacodynamic properties are considered, and finally a lead structure is selected based on multiple more or less stringently defined criteria. While this divide-and-conquer concept has worked well many times with and without computational support, lead discovery and prioritization would certainly benefit from explicit multidimensional optimization toward several design goals in parallel instead of tackling them consecutively.67,69 The difficult lessons learned from numerous
Figure 2. Reaction-based de novo design. Iterative synthesis and testing of virtual reaction products enable molecule optimization by fragment growing. The software DOGS uses the pharmacophore and substructure similarity between the virtual reaction products and a template (reference drug) as criterion for the choice of the most appropriate reaction scheme. A new iteration starts with the selected intermediate product as input for the next round of fragment growing by pseudo-synthesis.
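The iterative grow-and-compare loop summarized in the Figure 2 caption can be illustrated with a deliberately simplified sketch. The assumptions are substantial and should be read as such: molecules are reduced to sets of pharmacophore feature labels, a "virtual reaction" is modeled as a plain set union, the similarity measure is the Tanimoto coefficient on those sets, and all building-block names and features are invented. The actual DOGS software operates on real building blocks and reaction schemes and uses its own similarity criteria; this toy only mirrors the control flow of selecting the intermediate most similar to the template and feeding it into the next round.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def grow(template: frozenset, building_blocks: dict, max_steps: int = 5):
    """Greedy fragment growing: keep the product most similar to the template each round."""
    current = frozenset()
    path = []
    for _ in range(max_steps):
        best_name, best_product = None, current
        best_score = tanimoto(current, template)
        for name, features in building_blocks.items():
            product = current | features  # stand-in for one virtual reaction step
            score = tanimoto(product, template)
            if score > best_score:
                best_name, best_product, best_score = name, product, score
        if best_name is None:  # no building block improves the similarity: stop
            break
        current = best_product
        path.append((best_name, best_score))
    return current, path


# Hypothetical template and building blocks, described by feature labels only.
template = frozenset({"donor", "acceptor", "aromatic", "cation"})
building_blocks = {
    "amine": frozenset({"donor", "cation"}),
    "phenyl": frozenset({"aromatic"}),
    "ester": frozenset({"acceptor", "lipophilic"}),
    "carbonyl": frozenset({"acceptor"}),
    "alkyl": frozenset({"lipophilic"}),
}

product, path = grow(template, building_blocks)
print(path)  # [('amine', 0.5), ('phenyl', 0.75), ('carbonyl', 1.0)]
```

In this toy run the ester is never chosen because its extra "lipophilic" feature lowers the similarity to the template, which is the same kind of trade-off a template-guided design has to resolve at every growing step.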
scaffold-hopping method has led to the discovery of NCEs with the desired biochemical and pharmacological effects.47 Moreover, the design concept is implemented in several related programs (e.g., TOPAS,48 BI Builder49) and together with other receptor- and ligand-based tools50−53 provides a broad choice of working methods for idea generation in drug discovery. Figure 3 shows selected examples of de novo
Figure 3. Examples of computationally designed de novo compounds.
set of ligand scaffolds for a given target or observable activity critically depends on the permissiveness of the descriptor used to represent the molecular structure.82,83 Figure 5 presents a
failures of target-selective clinical candidates and the insights gained from systems biology made it evident that [..] the vast majority of chronic diseases are multifactorial diseases with more than one signaling pathway and more than one protein involved in the pathology.70 Recent advances in multiparameter optimization methods for de novo design have enabled the consideration of both primary (on- and off-target binding) and secondary design constraints (synthetic accessibility, ADMET properties, etc.).71 Multiobjective de novo design has the potential to play a liberating role for stalled discovery projects by systematically addressing network pharmacology,72 which for the human mind is impossible to fully comprehend given the enormous interconnectivity of molecular interaction cascades. From our own theoretical studies, approximately 11 macromolecular targets on average are expected for a given druglike compound.73 Similar investigations have reported 3−10 targets per drug, depending on the target class and potency levels considered.74 The trouble frequently begins when the assay results arrive and the most promising reported active compounds (“hits”) must be selected. Which of them are “true positives” and merit follow-up? Which should be discarded? A gut feeling in medicinal chemistry will only take us so far. While there is often agreement between the software prediction and a chemist’s assessment of synthetic accessibility for a de novo designed compound,75 the decision is less clear for selecting the most promising candidate for hit-to-lead progression.
While contemporary de novo design algorithms suggest appealing compounds in the eye of a trained medicinal chemist,76 there are reports of the same molecule serving as the chosen lead structure in different drug discovery projects with different macromolecular targets and pharmacological indications, as well as often deviating opinions and context-dependent decisions among the scientists involved.77,78 We originally introduced the term “frequent hitter” to designate compounds that turned up as hits in multiple experiments, independent of the particular assay type used (Figure 4).79 While this class of
Figure 5. Schematic relationship between chemical structure space S and biological function space F. In S, the dots represent ligands, while in F, the dots are macromolecular targets (receptors). While (Q)SAR models link structure with function, de novo design associates function with structure (“inverse QSAR”). Molecule s* is an example of a “promiscuous compound”, and f* represents a “promiscuous target”.
scheme of the relationship between chemical structure and biological function. While this picture is certainly simplistic, it may serve to pinpoint the delicate issue of target selectivity. (Q)SAR models that employ “fuzzy” nonatomistic representations such as pharmacophores or flexible shape descriptors have the ability to map different chemical structures to the same function; i.e., different molecules receive a similar predicted target activity.84−87 The opposite approach means finding ligands for a given target or solving the “inverse QSAR” problem.88−90 The latter becomes equivalent to de novo design if the mapping function is combined with a molecular structure generator. Reymond et al. identified the combinatorial explosion that occurs when trying to computationally generate all chemically feasible organic molecules exceeding 17 atoms.91 As a potential solution to the problem of finding NCEs by inverse QSAR, Funatsu and co-workers suggested an exhaustive enumeration of all molecular architectures within the applicability domain of a predictive model, thereby drastically reducing the total number of virtual molecules to be generated and scored.92,93 These and related theoretical considerations now await practical realization.
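The idea of restricting structure enumeration to the applicability domain of a predictive model can be illustrated with a small sketch. All ingredients are assumptions chosen for brevity: "molecules" are multisets of three invented fragments, the descriptor is a single additive number, the applicability domain is a descriptor interval, and `model` is a made-up stand-in for a QSAR model. Because the descriptor only increases as fragments are added, any branch that exceeds the upper domain limit can be pruned during generation rather than filtered afterward. This is not the algorithm of the cited work, only a toy showing why a domain constraint shrinks the number of structures to be generated and scored.

```python
# Hypothetical fragments with an additive one-number descriptor contribution.
FRAGMENT_WEIGHT = {"A": 1.0, "B": 2.5, "C": 4.0}


def model(x: float) -> float:
    """Hypothetical QSAR model: predicted activity is highest at descriptor value 6.0."""
    return -((x - 6.0) ** 2)


def enumerate_in_domain(lo: float, hi: float, max_fragments: int) -> list:
    """Enumerate fragment multisets whose descriptor lies inside [lo, hi]."""
    names = sorted(FRAGMENT_WEIGHT)
    results = []

    def grow(mol: tuple, x: float, start: int) -> None:
        if mol and lo <= x <= hi:
            results.append((mol, x))
        if len(mol) == max_fragments:
            return
        for i in range(start, len(names)):
            nx = x + FRAGMENT_WEIGHT[names[i]]
            if nx > hi:
                # Prune: the descriptor can only grow, so this branch
                # can never re-enter the applicability domain.
                continue
            grow(mol + (names[i],), nx, i)

    grow((), 0.0, 0)
    return results


candidates = enumerate_in_domain(lo=4.0, hi=8.0, max_fragments=3)
ranked = sorted(candidates, key=lambda c: model(c[1]), reverse=True)
print(len(candidates), "structures inside the domain out of 19 possible")
print("top-ranked:", ranked[0])
```

Of the 19 possible fragment combinations of size one to three, only 10 fall inside the chosen domain, and four over-sized branches are never expanded at all.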
■ MULTIDIMENSIONAL DESIGN
Automated molecular design enables the systematic optimization of molecular architecture toward several design goals in parallel. Several multiobjective receptor- and ligand-based approaches and general optimization tactics have been developed for this purpose, often borrowed from the field of algorithmics and artificial intelligence research.94−96 Figure 6 shows recent examples of computationally obtained molecules that possess rationally designed multitarget effects (18;97 20;30 22;98 23, 24;99 25100). When the aim is to modulate cellular processes by simultaneously blocking or activating several key nodes of a biochemical pathway, multitarget engagement may actually be a desirable compound property.101,102 In fact, multiaction drugs have been intentionally developed with this exact goal in mind, e.g., central nervous system agents acting as nonselective GPCR modulators but with a clearly defined target profile.103,104 A prominent success story is the development of the dual-action drug tapentadol for the treatment of neuropathic pain (Figure 7).105 This compound, which received FDA
Figure 4. Measured “activity” of test compounds may have various causes. While promiscuous target engagement might be a coveted feature, it is desirable to identify false positives early in the drug discovery process to prevent hit-to-lead progression in medicinal chemistry programs.
potentially reactive, poorly soluble, and aggregation-prone compounds should be avoided (unless aiming for covalent target binding, for example),80 multitarget engagement by leadlike molecules and properly decorated “privileged” scaffolds can be valuable.81 Computational molecular design offers innovative chemical scaffolds and pharmacophore constellations that may serve as design templates for these “promiscuous” target engagements. The ability to find a diverse
Figure 6. Examples of a compound designed as a promiscuous ligand (18) and compounds that were generated by multiparameter de novo design strategies (20, 22−25). Dashed boxes in the structures of compounds 20 and 22 indicate the computed structural modifications.
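Optimizing toward several design goals in parallel, rather than consecutively, is commonly formalized as a search for nondominated (Pareto-optimal) candidates. The following minimal sketch shows that selection step only. The compound names, the three objectives, and all scores are invented for illustration, and every objective is assumed to be scaled so that larger is better; no particular multiobjective design program is being reproduced here.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of all candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


# Hypothetical scores: (predicted potency, synthetic accessibility, solubility).
designs = {
    "cpd_A": (0.9, 0.2, 0.5),
    "cpd_B": (0.6, 0.8, 0.6),
    "cpd_C": (0.5, 0.7, 0.5),  # worse than cpd_B in every objective
    "cpd_D": (0.9, 0.2, 0.4),  # matches cpd_A except for lower solubility
}

print(pareto_front(designs))  # ['cpd_A', 'cpd_B']
```

The two surviving designs represent different compromises (high potency versus better accessibility), and choosing between them remains a project-level decision that the Pareto filter does not make.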
pockets may accept similar ligands.109,110 Several methods have already been developed for prospective pocket deorphaning, and we can expect full-fledged expansion of this idea to the de novo design of ligands in situ in the foreseeable future.111,112 In a preliminary ligand-centric study we wanted to discern whether “fuzzy” molecular representations could also be useful for identifying promiscuous ligands. We compiled two sets of frequent hitters: (i) potentially undesirable compounds that receive at least three substructure alerts by the flag lists of Rishton,113 Hann et al.,114 and the PAINS lists A and B115 and (ii) ligands without substructure alerts but experimentally confirmed interaction (KD, Ki, KB, IC/EC50 values of