
Article pubs.acs.org/jcim

Multistep Reaction Based De Novo Drug Design: Generating Synthetically Feasible Design Ideas

Brian B. Masek,* David S. Baker, Roman J. Dorfman, Karen DuBrucq, Victoria C. Francis,† Stephan Nagy, Bree L. Richey,‡ and Farhad Soltanshahi

Certara, 210 N. Tucker Blvd, Suite 350, Saint Louis, Missouri 63101, United States

ABSTRACT: We describe a “multistep reaction driven” evolutionary algorithm approach to de novo molecular design. Structures generated by the approach include a proposed synthesis path intended to aid the chemist in assessing the synthetic feasibility of the ideas that are generated. The methodology is independent of how the design ideas are scored, allowing multicriteria drug design to address multiple issues including activity at one or more pharmacological targets, selectivity, physical and ADME properties, and off-target liabilities; the methods are compatible with common computer-aided drug discovery “scoring” methodologies such as 2D- and 3D-ligand similarity, docking, desirability functions based on physicochemical properties, and/or predictions from 2D/3D QSAR or machine learning models, and combinations thereof, to be used to guide design. We have performed experiments to assess the extent to which known drug space can be covered by our approach. Using a library of 88 generic reactions and a database of ∼20 000 reactants, we find that our methods can identify “close” analogs for ∼50% of the known small molecule drugs with molecular weight less than 300. To assess the quality of the in silico generated synthetic pathways, synthesis chemists were asked to rate the viability of synthesis pathways: both “real” and in silico generated. In silico reaction schemes generated by our methods were rated as very plausible with scores similar to known literature synthesis schemes.



INTRODUCTION

Successful drug discovery requires the consideration of many parameters that will ultimately impact success in the clinic. For example, a drug candidate will need to demonstrate sufficient potency against pharmacological target(s) relevant to the disease state and have selectivity against off targets consistent with a reasonable safety profile, ADME properties consistent with adequate exposure at the target tissue, physical properties conducive to good ADME and formulation, and sufficient novelty to ensure IP rights for commercial success, to name just a few of the challenges faced. Drug discovery’s challenge is to find compounds that balance the various SARs and meet all of the various requirements of a successful therapeutic agent.

De novo design is increasingly being recognized as an effective way to identify drug design ideas that balance multiple parameters.1−6 To highlight a few examples: Hasegawa et al. described the design of compounds selective for the D2 receptor using de novo design methods coupled to machine learning models for D2, D3, and D4.7 Hopkins et al. described successful drug designs that targeted multiple pharmacological end points (including avoidance of antitarget activity) and simultaneously considered a number of ADME end points.8 The recent review by Segall1 is comprehensive, and the reader is directed to this excellent review for further reading.

Synthesis of drug candidates is clearly a critical requirement for discovery; in silico (or human) generated design ideas must be translated into real compounds that can be made and tested.

© 2016 American Chemical Society

Received: November 20, 2015. Published: March 31, 2016.
Journal of Chemical Information and Modeling (J. Chem. Inf. Model.) 2016, 56, 605−620. DOI: 10.1021/acs.jcim.5b00697

In order for multicriteria drug design to reach its full potential, the synthetic feasibility of the design ideas needs to be addressed. Early de novo design methods focused on structure based design and optimizing ligand−protein interactions and tended to generate design ideas that were synthetically inaccessible. Addressing synthetic feasibility of de novo design methods has received increased attention and significant progress is being made. Two approaches have emerged: one involves scoring the synthetic feasibility9−13 of the chemical structures generated by de novo design in order to direct the results toward compounds that are predicted to be synthesizable; a second approach involves developing new methods for constructing chemical structures in silico such that the transformations correspond to those likely to be accessible in the lab. Reaction-driven de novo design14 is used to describe methods where the transformations used correspond to reactions that could be performed in the lab. This reaction-driven approach has the advantage of providing a proposed synthesis path for each idea to aid in the assessment of the quality of the design ideas generated. However, the two approaches are not mutually exclusive and could be used in combination.

The size of medicinally relevant chemistry space that needs to be explored by de novo design methods is huge. Even if the chemistry space is limited or focused to a synthetically accessible chemistry universe, if one considers a library of ∼80 generic bimolecular reactions, where 1000 reactants of each class are available, the size of the chemistry space generated by all possible reaction schemes up to five synthesis steps is on the order of $1000 \times (80 \times 1000)^5 \approx 3 \times 10^{27}$ (1000 starting materials, followed by 80 × 1000 reaction and reactant choices at each of five steps).

A number of methods for exploring synthetically accessible chemistry space have been reported. AllChem15 was capable of exhaustively searching very large virtual synthetic chemistry spaces but relied on a piecewise topomer similarity approach to make searching possible. A large number of hits were typically generated and required other criteria to be applied to filter down or prioritize the results and hence post processing could involve large computational expenses. A number of reaction driven de novo design methods have been reported including SYNOPSIS,16 DOGS,17 and Reaction Vectors.18 These approaches iteratively perform transformations corresponding to a single reaction step. To manage the potential combinatorial explosion of structures generated, the intermediate products are typically scored and a selection or filtering step is applied before the next reaction transformation is applied.

Our concern with this “single reaction step” approach is that the synthetic intermediates generated in the synthesis of a drug, either in silico or in the lab, often do not share the properties or characteristics of the drug itself. The intermediate products may have reactive or potentially toxic groups, will often have physical properties quite different from the final (drug) product, and will seldom have the same biological activity. If the properties of the intermediates are only loosely correlated with the properties of the final product of a synthesis, we hypothesize that this approach may lead to premature or unproductive filtering of intermediates that could go on to yield very promising drug candidates.

To address this concern, we have developed a full multistep approach to reaction based de novo design. In this paper, we will present novel methods for multistep reaction based de novo design. We then discuss experiments to assess the extent to which small molecule drug space can be covered by our approach. In addition, we assessed the quality of the in silico generated synthetic pathways by comparing the viability of synthesis pathways, both “real” and in silico generated, as judged by synthetic chemists.

METHODS

We had multiple goals which shaped our method development efforts. We sought to develop a true, multistep approach to reaction based de novo design that would be capable of traversing the universe of synthetically accessible chemistry to generate design ideas that fulfill multiple design criteria based strictly on the desirability of the final products of the reaction pathway. As with previous reaction based de novo design methods, we wanted to keep the approach generic and independent of the methods used to score the ideas generated. The methodology should allow any valid computer-aided drug discovery (CADD) modeling approach or a combination of multiple approaches to be used to score design ideas. As with previous methods, we saw great value in providing a proposed synthesis pathway for the designed structures to inform decision making on synthetic feasibility. We wanted a method that addresses the generation of whole molecules “from scratch.” In addition, we also wanted to address the optimization of R-groups attached to a scaffold in a synthetically general way (not constrained to a particular reaction scheme as in library design). The method should allow a chemist to provide a scaffold and direct the elaboration and design of one or more R-groups within the general constraints of synthetic feasibility without limitation to a particular synthetic path. Further, we hoped the same multistep reaction based de novo methodology would allow scaffold hopping, enabling the construction of scaffolds linking multiple “preserved” R-groups. In this scenario, the chemist would provide a list of reagents containing the preserved R-groups and the method would suggest products which met the design objectives, including the synthesis of the scaffold and the linking of the R-group reagents to the scaffold.

Overview of Multistep Reaction Based De Novo Design Process. De novo design is the application of an evolutionary algorithm19 for global optimization applied to the optimization of chemical structures. Figure 1 gives an overview of the evolutionary process in general terms. The process is stochastic; for each cycle of the de novo design, each “parent” undergoes a randomly selected transformation to generate a population of “children.” The children are then evaluated or scored according to some scoring scheme. Based on the scores, a selection of a new generation of parents is made from the combined population of parents and children. This new generation of parents is fed into the next cycle which is repeated until a termination criterion is met. Through this process, the chemical structures evolve to optimize the fitness or scoring scheme.

Figure 1. Overview of the evolutionary de novo design process.

As applied to multistep reaction based de novo design, an analogy can be drawn to biology. A multistep reaction scheme can be thought of as a gene which can be transcribed to a chemical structure corresponding to the product resulting from the reaction scheme. The resulting product or chemical structure in turn has observable pharmacological, physicochemical, ADME, and safety properties which together constitute the observable key characteristics of a drug, analogous to a biological phenotype. Predictive models transcribe or predict the “drug phenotype” from the chemical structure. In this way, the evolutionary principles of biology are applied to the optimization of the profile of key drug design properties (“the observable or predicted phenotype”) through modification of a population of individuals, each defined by a multistep reaction scheme genome. In the sections below, we address the various elements of the evolutionary process as applied to multistep reaction de novo design.
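The evolutionary cycle summarized in the overview (Figure 1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `evolve`, `mutate`, and `score` are hypothetical, individuals are arbitrary hashable objects standing in for reaction schemes, higher scores are taken to be better, and plain truncation selection stands in for the selection step.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple


def evolve(
    initial_population: List[Hashable],
    mutate: Callable[[Hashable, random.Random], Hashable],
    score: Callable[[Hashable], float],
    n_generations: int,
    population_size: int,
    rng: random.Random,
) -> Tuple[List[Hashable], Dict[Hashable, float]]:
    """Toy parent -> children -> score -> select cycle (higher score is better)."""
    parents = list(dict.fromkeys(initial_population))  # drop duplicates, keep order
    history = {p: score(p) for p in parents}  # every individual scored so far
    for _ in range(n_generations):
        # Each parent undergoes one randomly chosen transformation.
        children = [mutate(parent, rng) for parent in parents]
        for child in children:
            if child not in history:
                history[child] = score(child)
        # Assumption: truncation selection over the combined parent/child pool.
        pool = list(dict.fromkeys(parents + children))
        pool.sort(key=lambda individual: history[individual], reverse=True)
        parents = pool[:population_size]
    return parents, history
```

For example, with integers as stand-in individuals, `evolve([0, 5, 10, 90], lambda x, r: x + r.choice([-3, -1, 1, 3]), lambda x: -abs(x - 42), 200, 4, random.Random(0))` returns the surviving parents together with the full scoring history. Because parents compete with their children, the best score in the population never gets worse from one cycle to the next.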

Initial Population. For multistep reaction based de novo design, the initial population of genes corresponds to a population of reaction schemes. In our current implementation, we have chosen the simplest version of a reaction scheme as the initial population; we start with a population of reactants or starting materials. These starting materials can be randomly selected from a database, or can be provided by the user in order to bias or focus the chemistry. These starting materials could be simple building blocks pulled from a supplier catalog or elaborate intermediates that are the result of custom synthesis depending on the nature of the design problem at hand. The only requirement is that the starting materials must be capable of undergoing one or more of the available in silico reactions. If starting materials are not explicitly provided, the method automatically makes a random selection of starting materials from the reactant database (described below).

Multistep Synthesis Operators. To enable true multistep reaction based de novo design, one or more methods for transforming a current multistep reaction scheme (parent) to generate a new reaction scheme (child) are required. For each cycle of the de novo design, each “parent” reaction scheme undergoes a randomly selected transformation to generate a population of “children.” We will refer to the multistep reaction scheme transformations as “operators” which modify the reaction schemes which are stored by the method. The new reaction scheme that results can then be transcribed to generate the new product (chemical compound). We used the previously reported gensyn reaction engine15 for storing and transcribing reaction schemes.

From our previous work on “non-reaction based” chemical change operators for de novo design, we found several principles to be very useful. First, each operation should have an “inverse” operation which is capable of undoing it.
For example, if there is an operator for adding one or more reaction steps to a synthetic scheme, there needs to be a mechanism for removing one or more reaction step(s). This provides a way to remove nonproductive modifications to a synthesis scheme (and the resulting product). This also ensures that reaction schemes can grow or shrink depending on the feedback from scoring. Second, we have found that it is useful to have a diverse set of operators, including different operators that can make small or large scale changes to the chemical structure. Operators that make large scale changes are useful for efficiently exploring the vast chemistry space. Operators that make small scale changes are useful for efficiently optimizing a structure within the local area of chemistry space. We have developed a set of multistep reaction based operators which we describe below. We have performed a set of experiments to gauge the productivity of these operators in exploring synthetic chemistry space and these experiments have allowed us to refine the operators and optimize their utility.

Build Operator. Adds “M” synthetic steps. When the Build operator is selected, follow the steps below:
1. If the parent reaction scheme is already at the maximum number of reaction steps permitted by the user, the operator terminates. The roulette wheel is used to select another operator.
2. Randomly select the number of steps to be added such that the maximum number of steps permitted by the user is not exceeded.
3. For each step to be added:
a. Randomly select a reaction that is compatible with the parent (or current intermediate) structure. If there is no available reaction site, the operator terminates and another operator is randomly selected.
b. Randomly select a reactant compatible with the reaction selected in step 3.a.
c. Add the reaction to the reaction scheme and generate the structure of the (intermediate or final) product.

Demolish Operator. Removes “M” synthetic steps. When the Demolish operator is selected, follow the steps below:
1. If the parent reaction scheme has no reaction steps (is a starting material), the operator terminates. Another operator is randomly selected.
2. Randomly select the number of steps to be removed from the synthesis pathway up to a maximum that is the current number of steps in the reaction scheme.
3. Remove the selected number of synthesis steps from the end of the synthesis scheme and generate the structure of the product of the remaining steps. Removing all steps would regenerate the starting material.

The concept of Demolish was to create an operator that was the inverse of the Build operator. However, we found that the Demolish operator very often generated an intermediate product that had already been encountered in the course of the de novo design, returning again and again to the same intermediates. This is inefficient. We realized a more efficient set of operators would combine Demolish and Build capabilities. The Change Starting Material, Change Reaction, and Change Reactant operators listed below achieve this goal. Therefore, in practice we do not find the Demolish operator, by itself, to be very useful and we typically set its probability to zero so that it is never used.

Add a Single Reaction Operator. This operator is identical to the Build operator above but only adds a single reaction step. We found we needed to enhance the probability of making smaller scale changes to the reaction scheme to improve the efficiency of local optimizations. We created this operator as a means of separately controlling the probability of making a single step addition to the reaction scheme.

Change Starting Material Operator. Randomly select a new starting material and then apply the Build operator. As a starting material chosen for the initial population may not lead to interesting products, this operator provides a way to “start over” and replace a starting material with a new starting material.

Change a Reaction Operator. When the “Change a Reaction” operator is selected, follow the steps below:
1. The step which is to be replaced in the reaction scheme is randomly selected.
2. All steps forward from and including the selected step are demolished.
3. The last remaining intermediate is used as the starting point for a Build operation.

There is a debate over how many steps the Build operation should be allowed to add. One option is that the number of steps added should not exceed the number of steps in the original (parent) reaction scheme. In this case, the child synthetic scheme will have the same number of reaction steps as the parent or fewer. We chose this option.
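The Build and Change a Reaction operators described above can be sketched on a toy representation of a reaction scheme. This is a hedged illustration, not the gensyn engine: the `Scheme` tuple, the `LIBRARY` dictionary, and every reaction and reactant name are hypothetical, and the sketch assumes every reaction is compatible with every intermediate, so the "no available reaction site" exit in step 3.a never triggers. Returning `None` stands for "the operator terminates and another operator is selected."

```python
import random
from typing import Dict, List, NamedTuple, Optional, Tuple


class Scheme(NamedTuple):
    """A starting material plus an ordered list of (reaction, reactant) steps."""
    start: str
    steps: Tuple[Tuple[str, str], ...] = ()


# Hypothetical toy library: reaction name -> compatible reactants.
LIBRARY: Dict[str, List[str]] = {
    "amide_coupling": ["amine_A", "amine_B"],
    "suzuki_coupling": ["boronic_acid_A", "boronic_acid_B"],
    "reductive_amination": ["aldehyde_A"],
}


def build(parent: Scheme, max_steps: int, rng: random.Random) -> Optional[Scheme]:
    """Add a random number of steps without exceeding max_steps."""
    if len(parent.steps) >= max_steps:
        return None  # step 1: already at the maximum, operator terminates
    n_add = rng.randint(1, max_steps - len(parent.steps))  # step 2
    steps = list(parent.steps)
    for _ in range(n_add):  # step 3
        reaction = rng.choice(sorted(LIBRARY))    # 3.a (compatibility assumed)
        reactant = rng.choice(LIBRARY[reaction])  # 3.b
        steps.append((reaction, reactant))        # 3.c
    return Scheme(parent.start, tuple(steps))


def change_reaction(parent: Scheme, rng: random.Random) -> Optional[Scheme]:
    """Demolish from a random step onward, then rebuild up to the parent's length."""
    if not parent.steps:
        return None  # a bare starting material has no step to replace
    index = rng.randrange(len(parent.steps))
    truncated = Scheme(parent.start, parent.steps[:index])
    return build(truncated, len(parent.steps), rng)
```

Capping the rebuild at `len(parent.steps)` reflects the option chosen in the text: the child never has more steps than its parent.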

Change the Last Reaction Operator. This operator is similar to the Change a Reaction operator above but limited to changing the last step of a synthetic scheme. We found we needed to enhance the probability of making smaller scale changes to the reaction scheme to improve the efficiency of local optimizations. Therefore, we created this operator as a means of separately controlling the probability of changing the last reaction step.

Change a Reactant Operator. When the “Change a Reactant” operator is selected, follow the steps below:
1. The step in which to replace a reactant in the reaction scheme is randomly selected.
2. All steps forward from and including the selected step are demolished. However, the type of reaction performed in the selected step is preserved. A new reactant that is complementary to the selected step is randomly selected.
3. A Build operation is initiated with the total number of reaction steps limited to the number of steps in the original parent reaction scheme.

Change Reactant for Last Step Operator. This operator is similar to the Change a Reactant operator above but limited to changing the particular reactant added in the last step of a synthetic scheme. Again, we found we needed to enhance the probability of making smaller scale changes to the reaction scheme to improve the efficiency of local optimizations. Therefore, we created this operator as a means of separately controlling the probability of changing the reactant used in the last reaction step, and thereby making relatively minor changes to the resulting product.

Crossover Operator. A crossover operation is a process for taking more than one parent solution and generating children from them.20 The concept of convergent synthesis21 pathways is well established in synthetic chemistry. Applying these concepts to multistep reaction based de novo design, we envisioned reacting two parent structures to form a new product.
The idea is that we may have, in the course of the design, synthesized two parts of a drug molecule and all we need is to couple the two pieces to form the full drug molecule. Our implementation of a crossover operator is as follows:
1. All parents are used as reactants to be combined with each other in all possible combinations. All possible children that can be generated from the reaction of two parents are generated.
2. From this pool, we randomly select enough children to fill the generation, subject to the constraint that the number of steps in the reaction scheme does not exceed the maximum specified by the user.

In this way, two reaction schemes (parents) are combined to generate a new and convergent reaction scheme (child). Our implementation of crossover is exhaustive in that all possible products from the combination of parents are generated. This operator differs from the other operators in that an entire generation of children is produced when this operator is selected. We chose this approach as it was programmatically efficient based on our existing reaction “engine.” As suggested by a referee, an alternative crossover operator that would select two parents at random and attempt to couple them to form a single child could also be envisioned. We did not pursue this second approach to crossover, but believe it could be worth exploring.

Operator Probabilities: Process for Optimizing Operator Usage. At each step of the de novo design process, each parent reaction scheme undergoes a randomly chosen transformation according to one of the operators discussed above. A question naturally arises: what probability should be given to choosing a particular operator? We took an empirical approach to address this question. We performed a set of studies, described in the Results and Discussion section, to empirically determine how the probabilities affect the de novo process and determine a set of probabilities that optimizes the efficiency of the de novo process.

Reaction Library. The reaction library we used is focused on general reactions with high reliability and reasonable yield and involving a versatile and well-defined range of reactants. The reaction library we used consists of 88 generic reactions (see Supporting Information for detailed description). In developing this reaction library, we drew heavily from the work by Hartenfeller et al., who developed a library of 58 reactions for use in de novo design.22 These reactions were derived from an analysis of the types of chemical reactions most commonly used by industrial medicinal chemists in the synthesis of drug candidates.23 We incorporated these 58 reactions in the reaction library we used. We augmented this with an additional 30 reactions from ref 23 that we found necessary to duplicate known drugs as we performed validation studies. A list of the reactions employed in this work is given as Supporting Information. Reactions were encoded in a previously described ASCII file format using SLN substructural query patterns.15 As with the work of Hartenfeller,22 we did not attempt to address protection/deprotection chemistry in our reaction library. As they noted, incorporation of protection group chemistry would likely complicate the de novo process because of the considerable number of additional reaction steps which do not affect the final generated molecule.
In addition, we performed an informal survey of synthetic chemists where we assessed whether including protection/deprotection chemistry in the reaction schemes increased their ability to assess the quality of the scheme. The results indicated that protection/deprotection chemistry was “implicitly obvious” for most chemists and including it was not an aid to their decision making.

Reactant Database. For all of the studies discussed in the Results and Discussion section, we employed a reactant data set of ∼20 000 structures which were selected from the PubChem database.24 Reactants were first selected based on their ability to be used in at least one of the defined reactions (above). This list was then filtered to retain reactants with fewer than 15 heavy atoms. When possible, ∼500 reactants were selected for each defined reaction. If more than 500 reactants were available for a reaction, reactant selection was biased toward smaller more diverse reactants.25 This set of reactants is intended to be used as a reasonable starting point for de novo design where no custom reactant database is available.

Scoring. At each step of the de novo design process, a new generation of “children” is produced which must be evaluated according to some scoring or fitness criteria. The operators used to generate multistep synthetic schemes and transcribe them to products are independent of the methods for scoring the resulting structures. Therefore, any valid CADD modeling approach or a combination of multiple approaches could be used to score the design ideas. In practice, this is achieved through the architecture of the software; at the point in each de novo cycle when the structures are to be scored, the generation of children structures is written out to a file in a standard


Modified Fitness Selection. Selection is based on a modified fitness proportionate selection algorithm.33 The method we used is a hybrid mix of truncation selection and fitness proportionate selection. The top 20% of compounds were selected based on score and the remainder of the generation is then selected based on fitness proportionate selection where the probability of selection of any individual is proportionate to the fitness of that individual relative to the total fitness of the population. The probability of the ith compound being selected in the fitness selection process is given by

molecular format (SMILES, SLN, SYBYL-X.mol2, or SYMYX.sdf format), and a user specified process for scoring is forked. That process is monitored for completion at which point the software looks for an ASCII file containing the resulting scores and property data. This data is read in, and the de novo design proceeds to the selection process. Multicriteria Ligand Similarity Based Scoring. We have employed a ligand based scoring protocol for some of the studies reported in the Results and Discussion section. The objective of this scoring function is to generate drug design ideas that have 3D shape similarity to a known drug or reference compound and are also likely to share similar biological activity, have desirable physicochemical properties, and are novel in 2D chemical structure. The components of our scoring function are the 3D morphological similarity as computed by Surflex-Sim,26 UNITY 2D fingerprint Tanimoto similarity,27 and desirability functions28 based on set of chemical properties (ALogP, TPSA,29 hydrogen bond donor count, hydrogen bond acceptor count, rotatable bond count, molecular weight, number of Lipinski violations.) To prevent the evolutionary process from simply redesigning the reference compound being used for scoring, a score adjustment that imposes a penalty based on the UNITY 2D fingerprint similarity of the designed compound to the lead compounds being used for scoring, as previously reported by one of the authors.6 The penalty function is a linearly scaled multiplicative factor applied between similarities of 1.0, when the scaling factor is 0.0, causing an exact fingerprint match to receive a score of 0.0, to a similarity of 0.6, at which point the scaling factor reaches 1.0. This penalty has the effect of pushing designs away from the lead compound based on two-dimensional fingerprint similarity, while the Surflex-Sim similarity score rewards similarity to the lead compound based on threedimensional shape and alignment. 
Golden Needle Scoring. We have previously found that a combination of simple chemical properties can be used to identify a unique chemical structure or a set of closely related analogs in chemistry space.30 Therefore, we have assembled a “Golden Needle” scoring function based on hydrogen bond donor count, hydrogen bond acceptor count, rotatable bond count, number of rings, number of chiral atoms, molecular weight, AlogP, MlogP, and two 2D fingerprint Tanimoto similarities (MACCS keys31 and Tripos Atom-Type Pairs32). We use a simple function, given below, to compute the score. di , j =

(1 − TMACCS)2 + (1 − TTripos)2 +

∑ all properties, P

pi = Dist_Scorei / ∑ Dist_Scorej j = 1, n

The exact formula for Dist_Score depends on whether the score is to be maximized or minimized. If the score is to be maximized, Dist_Score is given by Dist_Scorei = C(Scorei − Median(Score))/STDEV(Score)

If the score is to be minimized, Dist_Score is given by Dist_Scorei = C(Median(Score) − Scorei)/STDEV(Score)

where • Median(Score) and STDEV(Score) are the median and standard deviation of the scores of the current pool of parents and children from which the selection is taking place. • C is a constant that adjusts the selection pressure. In practice, we found a value of C = 4 seems to give a reasonable level of selection pressure. Age Based Penalty during Selection. We have observed de novo designs may become trapped in local minima. In these cases, none of the children survive and same parents are passed forward for many cycles. When this happens, a set of compounds that score well come to dominate the population, and the design becomes stagnant. We sought a mechanism that would enable the design to productively explore fresh areas of chemistry space. To this end, we introduced into the selection process a fitness penalty based on the age of a parent. We define age in terms of the number of generations, or de novo cycles, since a parent was first generated. In this way, parents have a limited lifetime in which to generate productive children. We draw an analogy to biological systems, where individuals are not immortal, and eventually die, and therefore do not perpetually reproduce or produce offspring.34,35 We believe this concept is novel in the context of genetic or evolutionary based algorithms and may be generally useful for any genetic or evolutionary algorithm or process. Periodic “Mass Extinctions”. Another method we explored as a mechanism to encourage the exploration of chemistry space. In this approach, the lowest scoring set of compounds (e.g., lowest 25% of the current generation) are replaced with a completely new set of products generated by randomly selecting new starting materials and performing Build operations on them. This mass extinction would occur every N de novo cycles. We explored mass extinctions every 20 generations. We did not find this approach to be as useful as other approaches like fitness selection and age based penalties. 
Termination Criteria, Population Size, etc. We chose a simple termination criteria based on the number of de novo design cycles (the number of generations). We have examined the productivity of the de novo process as a function of the

(Pi − Pj)2 max(Pi , Pj)2

Where di,j is the score computed between a de novo generated structure and the target or reference compound. We have found the “Golden Needle” scoring function to be quite useful for methodological testing, as it is a very fast method that can pinpoint a structure in chemistry space. Selection Algorithms. A number of methods for selection were explored during the course of our experiments on the efficiency of traversing known drug space. The choice of selection algorithms can have a large impact on the balance between exploring the vastness of synthetically accessible chemistry space and the ability to optimize a solution within a local area of chemistry space. Truncation Selection. This method takes the population of parents and children and selects the “N” most fit as the next generation of parents. 609

DOI: 10.1021/acs.jcim.5b00697 J. Chem. Inf. Model. 2016, 56, 605−620

Scheme 1. Structures of Cymbalta, Darunavir, and Abilify Used for Validation Studies

number of de novo cycles in terms of both the ability to optimize the scoring function and the ability to continue to productively identify new areas of chemistry space that have good scores. We find that we can typically identify interesting solutions within 50−100 generations using a population size of 100. We normally retain a history of all structures generated during the de novo design process. We find this to be useful because the most interesting structures are not necessarily limited to those that survive to the last generation. High scoring structures may be generated at any time in the course of the de novo design, but due to the stochastic nature of selection, these structures may not survive into the final generation. At the end of a simulation, we typically sort the list of structures that have been generated by score and/or filter by the various properties in order to highlight the most interesting solutions.

Note on Filtering out Reactive or Toxic Groups. In our earlier work on de novo design methods, we found that removing or filtering structures with undesirable, reactive, or toxicophoric groups was important in producing medicinally relevant structures.6 For reaction based de novo design, filtering intermediate products based on the presence of reactive groups would be counterproductive. It is just such groups that may be needed to complete a reaction sequence leading to a desirable final product. However, excluding such reactive groups from the final products is also valuable. In order to balance these two goals, we have divided substructural filters into two types. Substructural filters that meet the definition of a reactant for any of the reactions in the reaction library are considered “reactive group filters.” Products (children) that contain reactive groups are allowed to remain in the breeding population and continue to evolve. The reactive group filters are applied only at the end of the de novo design, to remove structures that are reactive and would not be of interest. The remaining substructural filters operate as usual and are applied to all child structures generated by the operators above.
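The two-tier filtering scheme, together with the retained history of all generated structures, might be sketched as follows. This is a toy illustration: structures carry a set of group labels instead of real substructure matches, and the specific labels in `REACTIVE_GROUP_FILTERS` and `GENERAL_FILTERS` are hypothetical examples, not the filters used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    name: str
    score: float
    groups: frozenset  # labels of groups present (toy stand-in for substructure matching)


# Hypothetical labels. Reactive group filters correspond to groups that
# some reaction in the library can consume; the rest are general filters.
REACTIVE_GROUP_FILTERS = frozenset({"acyl_chloride", "aldehyde"})
GENERAL_FILTERS = frozenset({"nitroso", "peroxide"})


def passes_during_evolution(s):
    """Applied to every child: only the general filters reject."""
    return not (s.groups & GENERAL_FILTERS)


def passes_final(s):
    """Applied at the end of the run: reactive groups are rejected too."""
    return not (s.groups & (GENERAL_FILTERS | REACTIVE_GROUP_FILTERS))


def final_report(history, minimize=True):
    """Deduplicate the full history, apply the final filters, sort by score."""
    unique = {s.name: s for s in history}
    kept = [s for s in unique.values() if passes_final(s)]
    return sorted(kept, key=lambda s: s.score, reverse=not minimize)
```

Sorting the full history, not just the final generation, reflects the observation that high scoring structures may be lost to stochastic selection before the run ends.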

Figure 2. First and second “sterile” generation plotted as a function of the population size. A sterile generation is one where none of the children produced survive into the next generation. All simulations were run for 200 generations.

RESULTS AND DISCUSSION

In this section, we first present the results of studies performed to understand the influence of various methodological and algorithmic parameters; for example, the population size, the number of generations, the relative probabilities of the various operators, the choice of selection algorithm, etc. Based on these studies, we were able to arrive at a rational choice of parameters that achieves both global exploration of synthetically accessible chemistry space and the ability to optimize a solution within a local area of chemistry space. We then performed studies to assess the extent to which our methods were capable of addressing the chemistry space of small molecule drugs.

To assess the performance of our methodology, we require a scenario where we can be reasonably confident what “a right answer” should be. Therefore, we developed the Golden Needle validation experiment. For a Golden Needle experiment, a known drug structure is selected and used as a target or reference. We know the drug compound exists and has been successfully synthesized. We have previously reported a combination of simple chemical properties can be used to identify a unique chemical structure or a set of closely related analogs in chemistry space. In addition, we found de novo design methods (nonsynthesis driven) are capable of successfully rediscovering known drugs based on the information in a combination of simple chemical properties.30 We employ the Golden Needle scoring function based on a combination of simple chemical properties to guide the de novo process toward the target drug we have selected. Using this approach, we can assess the extent to which our new multistep reaction based de novo design methods can “rediscover” a known drug compound.


Table 1. Unique Children Structures Generated by Operator^a

^a The relative proportion of unique children generated is the ratio of the number of unique, nonduplicate structures generated by each operator divided by the maximum.
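The ratio described in the table footnote can be computed as in the sketch below. It assumes that each generated child is logged as an (operator, structure identifier) pair, for example with a canonical SMILES string as the identifier, and that "unique" means not previously seen anywhere in the run; both the logging format and that reading of "unique" are assumptions.

```python
from collections import defaultdict


def relative_unique_children(events):
    """events: iterable of (operator_name, structure_id) in generation order.

    Returns, per operator, the number of first-time structures it produced
    divided by the largest such count over all operators.
    """
    seen = set()
    counts = defaultdict(int)
    for op, sid in events:
        counts[op] += 0  # make sure every operator appears in the result
        if sid not in seen:
            seen.add(sid)
            counts[op] += 1
    top = max(counts.values(), default=0)
    if top == 0:
        return {op: 0.0 for op in counts}
    return {op: n / top for op, n in counts.items()}
```

An operator that only regenerates structures already encountered, as reported for Demolish, ends up with a relative proportion near zero under this measure.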

Initial Validation Studies: Cymbalta. In our first set of studies, we tested whether the methods were capable of “reinventing” Cymbalta. We employed the Golden Needle scoring function using Cymbalta (Scheme 1) as the reference structure. At this early stage in the method development, we had not yet recognized the need for all of the operators, nor had we recognized the need for improved selection algorithms. The Truncation selection method was used and the operators employed for this study were: Build, Demolish, Add a Single Reaction, Change Starting Material, Change a Reaction, Change a Reactant, and Crossover. Each operator had an equal probability of being selected as we had no information at this point to guide the assignment of operator probabilities. We performed a number of simulations using population sizes of 50, 100, 200, and 300 structures. All simulations were run for 200 generations. The first observation we made was that none of the simulations successfully yielded Cymbalta. The structures that were generated bore little resemblance to Cymbalta and the scores of the best structures we identified were significantly higher than zero, the optimal score possible. In order to understand how we might improve upon this result, we performed additional analyses. From examining the progress of the molecular evolutions, it became clear the simulations quickly converge to a set of structures which were nonoptimal. At that point, the simulation became sterile; very few child structures were able to outscore the existing generation of parents, and the same set of parent structures were carried over from generation to generation. Increasing the population size did have a slight impact on extending the number of productive generations, as shown in Figure 2. However, the end results were not improved. 
We examined the structures of the sterile parents and the children they generated and noted that the children were often fairly dissimilar from the parent that generated them. The multistep reaction based operators had made predominantly large scale changes to the chemical structure. We hypothesized that a better balance of large and small scale operators would improve the efficiency of local minimization. We therefore initiated efforts to introduce a set of “small change” operators, as mentioned in the Methods section and described below. In addition, we hypothesized that the simulations became too easily trapped in a nonoptimal set of structures. We suspected the truncation selection method was not ideal for global optimization, and the results suggested the need to explore alternative selection methods that would allow new areas of chemistry space to survive and be explored.
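The stagnation diagnostic used here can be expressed compactly. Following the definition in the Figure 2 caption (a sterile generation is one where none of the children survive into the next generation), the sketch below finds the first and second sterile generations from a per-generation log of surviving children. The log format and the 1-based generation numbering are assumptions.

```python
def sterile_generations(survivors_per_generation):
    """Return the 1-based generation numbers in which no child survived."""
    return [g for g, n in enumerate(survivors_per_generation, start=1) if n == 0]


def first_and_second_sterile(survivors_per_generation):
    """First and second sterile generation, or None where there is none."""
    sterile = sterile_generations(survivors_per_generation)
    first = sterile[0] if len(sterile) > 0 else None
    second = sterile[1] if len(sterile) > 1 else None
    return first, second
```

For example, a log of `[12, 5, 0, 3, 0, 0]` surviving children gives generations 3 and 5 as the first and second sterile generations.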

We also assessed the ability of the operators to generate new, as yet unseen, structures. The results are summarized in Table 1. We found that, with the exception of the Demolish operator, all operators were reasonably likely to create novel chemical products. We found the Demolish operator very often generated an intermediate product that had already been encountered in the course of the de novo design. As a result, its probability was set to zero in all further studies. From Table 1, it can be seen that the relative number of children produced by each operator changes with population size; in particular, the crossover operator becomes more dominant in producing children as the population size increases. We believe this may be a result of the exhaustive nature of the current crossover operator. These results suggest that an alternative crossover operator that operates on two parents to produce a single child may be less sensitive to population size with respect to the relative number of children generated per operator. We also examined the number of unique children generated by each of the operators as a function of the generation number (see Figure 3). As can be seen, the number of children produced by each individual operator varies by generation but stays within a consistent band throughout the simulation.

Small Change Operators and Improving Local Optimization. In our initial study, we observed the multistep reaction based operators described above had made predominantly large scale changes to the chemical structure. We hypothesized a better balance of large and small scale operators would improve the efficiency of exploring a local region of chemistry space. We therefore introduced two small change operators: the Change the Last Reaction Operator and the Change Reactant for Last Step Operator. In order to assess the effects of these operators, a set of simulations was performed using the multicriteria ligand similarity based scoring function.
Abilify (Scheme 1) was used as the template. We chose this scoring function because it is driven primarily by 3D shape and physical chemical property criteria rather than 2D structural similarity. The Golden Needle scoring function, on the other hand, is based on 2D structural similarity and biased to produce close analogs of the template. We hoped the multicriteria ligand similarity based scoring function would give a more representative indication of the balance of local vs global sampling of chemistry space. A set of six simulations was run; three simulations were run including these two new operators and three simulations were run without. As mentioned above, because the Demolish operator was found to be unproductive, its probability was set to zero. All simulations were run for 100 generations using a

DOI: 10.1021/acs.jcim.5b00697 J. Chem. Inf. Model. 2016, 56, 605−620

Article

Journal of Chemical Information and Modeling


Figure 3. Number of unique children generated by each operator during the course of a de novo design simulation. Simulation employed the Golden Needle scoring function with Cymbalta as the reference structure; the population size was 300 structures and simulations. The operators all had an equal probability of being selected.

Table 2. Unique Structures Generated by Operator^a

^a The number given is the ratio of the number of unique, nonduplicate structures generated by each operator divided by the maximum.

Global Optimization: Effects of Selection Methods. As mentioned above, we suspected the truncation selection method was not ideal for global optimization, and we explored alternative methods that would improve sampling of chemistry space. We began by comparing the Truncation and Modified Fitness selection methods. A set of nine simulations was run for each selection method. The Golden Needle scoring function was employed; three known drugs, Cymbalta, Darunavir, and Abilify (see Scheme 1), were used as targets, and three simulations were performed for each drug. All simulations were run for 100 generations with a population of 100. The operators employed for this study were: Build, Demolish, Add a Single Reaction, Change Starting Material, Change a Reaction, Change the Last Reaction, Change a Reactant, Change the Last Reactant, and Crossover. Operator probabilities were as given in Table 3. An analysis of the distribution of scores generated by each method is given in Figure 5. Our overall conclusion from this plot was that, for these simulations, there is no statistically significant difference between the two selection methods with regard to optimizing the score; the scores of the best scoring ideas

population size of 100; the Truncation selection method was used. As in the first validation study, we assessed the ability of the operators to generate new, as yet unseen, structures. The results are summarized in Table 2. As can be seen, the new operators were reasonably effective at creating novel chemical products. Visual comparison of structure similarity maps36 (Figure 4) of the structures generated with and without small change operators showed an enrichment of structures similar to high scoring solutions in the simulations with the new operators, demonstrating an increased sampling in the chemistry space near high scoring structures as intended. The scores of the best scoring structures identified by simulations with the small change operators (average score = 12.94 ± 0.07) and without them (average score = 12.83 ± 0.55) did not differ significantly. For comparison, Abilify generates a score of 13.84, the maximum possible score using this scoring function. From these results, we conclude that the small change operators are useful in enhancing sampling in the local chemistry space around a solution, but did not have an impact on global searching/optimization.


Figure 4. Structure similarity maps of de novo generated structures with and without small change operators. Three simulations included small change operators (right) and three simulations did not (left). Each point represents a compound from the de novo design. Similar structures (based on Unity 2D fingerprints) will be near each other in the map. Points have been colored from dark (best score) to light (low score) with a color gradient. The same mapping space was used to generate the plots for all six similarity maps.

In Figure 6, the number of children from each generation that survived into the next generation is plotted as a function of the generation number. Here, a comparison clearly shows a difference between the two selection methods; simulations using the Truncation selection method quickly become stagnant; few new children are able to survive after the first ∼30 generations. In contrast, simulations using Modified Fitness based selection continued to allow new children to be added to the population throughout the life of the simulation. As seen in Figure 6, ∼30% of the population is replaced with new children each generation. We examined the longer simulations (1000 generations) using the Modified Fitness selection method and observed no decline in the number of surviving children as a function of the generation (data not shown). These results suggested that simulations based on the Modified Fitness selection method allow a larger number of children to survive, providing an opportunity for more regions of chemistry space to be explored. However, for simulations of the size we report, this does not lead to solutions with better scores; both selection methods performed similarly in this regard. Comments on the Crossover Operator. In the course of the validation studies above, we made the empirical observation

Table 3. Listing of the Multistep Reaction Based Operators and Their Probabilities

operator                          probability
Build                             13.6%
Add a Single Reaction             13.6%
Change Starting Material          18.1%
Change a Reaction                 18.1%
Change the Last Reaction Step     18.1%
Change a Reactant                 9.1%
Change Reactant for Last Step     9.1%
Crossover                         0.2%
Demolish                          0.0%
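One simple way to draw an operator according to the Table 3 probabilities is weighted random sampling, sketched below with Python's standard library. The dictionary and function names are illustrative and not part of the authors' software. The listed percentages sum to 99.9% because of rounding; `random.choices` normalizes the weights, so this needs no special handling.

```python
import random

# Probabilities (in percent) as listed in Table 3.
OPERATOR_PROBABILITIES = {
    "Build": 13.6,
    "Add a Single Reaction": 13.6,
    "Change Starting Material": 18.1,
    "Change a Reaction": 18.1,
    "Change the Last Reaction Step": 18.1,
    "Change a Reactant": 9.1,
    "Change Reactant for Last Step": 9.1,
    "Crossover": 0.2,
    "Demolish": 0.0,
}


def pick_operator(rng):
    """Draw one operator name with probability proportional to its weight.

    `rng` is a random.Random instance, so runs can be made reproducible.
    """
    names = list(OPERATOR_PROBABILITIES)
    weights = list(OPERATOR_PROBABILITIES.values())
    return rng.choices(names, weights=weights, k=1)[0]
```

Calling `pick_operator(random.Random(42))` returns one operator name; an operator with zero weight, such as Demolish, is effectively disabled without being removed from the list.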

were generally similar. We examined longer simulations (1000 generations with a population of 100) using Abilify as the target to see if longer simulations might lead to differences in the top scores achieved. Five simulations were performed for each selection method; the average score of the top ranked structure using Modified Fitness selection (0.58 ± 0.10) was statistically indistinguishable from that obtained using Truncation selection (0.56 ± 0.08). We concluded that both methods generate solutions of similar quality as judged by the score.
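As a rough check on the comparison above, a Welch t statistic can be computed directly from the reported summary values. This sketch assumes that the ± values are sample standard deviations over the five simulations per method; the text does not state this, and the test itself is our illustration, not necessarily the analysis the authors performed.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (means, sample SDs, sample sizes)."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Modified Fitness (0.58 +/- 0.10) vs Truncation (0.56 +/- 0.08), n = 5 each.
t, df = welch_t(0.58, 0.10, 5, 0.56, 0.08, 5)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions, t is about 0.35 with roughly 7.6 degrees of freedom, far below the value of about 2.3 needed for significance at the two-sided 5% level, which is consistent with the conclusion that the two methods are indistinguishable by score.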


Figure 5. Distribution of the Golden Needle scores of the 10 best solutions found in simulations based on Abilify, Cymbalta, and Darunavir. Three simulations were performed with Modified Fitness selection, and three simulations were performed with Truncation selection.

Figure 6. Comparison of the number of surviving children for truncation selection method (left side) vs modified fitness selection method (right side).


Macrocycles were excluded as we first wished to assess how the reaction based methods performed on more traditional small molecule drugs, which we expect to be less complex synthetically. For each of the 795 approved drugs which passed these filters, a set of three de novo design simulations (population of 100, run for 100 generations) was performed using the Golden Needle scoring function with the drug as the target or reference compound. Operator probabilities were as listed in Table 3. The modified fitness based selection algorithm was used with an age based penalty. A history of all structures generated during the de novo design process was retained and we examined these results to assess whether we had successfully rediscovered the known drug or a close analog. The results are shown in Figure 7. The results suggest a number of conclusions. First, finding a single unique chemical structure in the vastness of synthetically accessible chemistry space remains a challenging task: we were able to exactly reproduce 2.3% (18 of 795) of the approved drugs. For comparison, Hartenfeller et al.37 were able to recover 14% (96 of 871) of approved drugs38 extracted from DrugBank, a data set similar to the data set we used above. Their study was based on an exhaustive reconstruction routine, using a library of 58 generic reactions and a database of ∼26 000 reactants. As the reaction library we used in our de novo studies included the same 58 generic reactions, we hypothesize the main difference in the results is due to the more exhaustive nature of the search process Hartenfeller et al. used; the search process was specifically directed to reconstructing known compounds, although we cannot rule out differences in the reactant libraries being a factor. For idea generation, finding a close analog will often be sufficient. We were encouraged that within a similarity of 75−80%, our de novo simulations were able to successfully identify analogs much more frequently.
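The similarity thresholds used to judge whether a close analog was found can be made concrete with the Tanimoto coefficient on fingerprint bit sets. The sketch below represents a fingerprint as a set of "on" bit indices, which is a toy stand-in and not the UNITY 2D fingerprint; the function names and the default 0.75 threshold are chosen for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def is_close_analog(bits_a, bits_b, threshold=0.75):
    """True when the similarity meets or exceeds the chosen threshold."""
    return tanimoto(bits_a, bits_b) >= threshold
```

For instance, fingerprints `{1, 2, 3, 4}` and `{1, 2, 3, 4, 5}` share four of five distinct bits, giving a similarity of 0.8, which counts as a close analog at a 75% threshold but not at the traditional 85% cutoff.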
A Tanimoto similarity based on UNITY 2D fingerprints of 85% has been shown to be biologically significant39,40 and enrichment of actives has been found at similarity thresholds below the traditional 85% cutoff.41 These results suggest that, using a set of 88 generic reactions and a database of ∼20 000 reactants, the multistep reaction based de novo methodology is able to successfully (re)cover biologically meaningful analogs for approximately 50% of approved drug chemistry space.

Computational Expense. The above simulations were run on the Amazon cloud using m4.4xlarge instances (16 core Intel Xeon processors) running under CentOS 6, and runtime statistics for each simulation were logged. The total CPU time reported includes both the de novo method and the Golden Needle scoring and varied between 17.8 and 75.0 s per simulation, with an average CPU time consumed of 30.8 s per simulation. Simulations based on other, more computationally intensive scoring methods will be dominated by the time required for scoring the invented compounds.

Are the in Silico Reaction Schemes Useful? A small, blinded survey was performed to assess the viability of the synthetic chemistry schemes generated by our methods. The survey consisted of 18 synthesis pathways: 8 of the synthesis pathways were taken from the literature synthesis of known drugs; 10 of the synthesis pathways were generated in silico and taken from the results of several de novo design simulations. Three practicing industrial synthetic chemists were asked to rate each of the 18 synthesis pathways on a scale of 1−10. A rating of 10 indicated the chemist felt the reaction scheme would likely work in the lab with little/no

Figure 7. Approved small molecule drugs rediscovered by de novo design.

Figure 8. Comparing the viability of virtual and literature synthesis pathways.

that very few children of the crossover operator survived to the next generation. The low survival rate for children of the crossover operator becomes evident in simulations using the modified fitness selection method and can be seen in Figure 6 (right side); negative peaks in the plots of surviving children as a function of generation indicate a generation where no children are selected to survive into the next generation. These negative peaks correspond to generations where the population of children was generated entirely by the crossover operation. While the modified fitness selection algorithm does give even poor solutions some small probability of surviving, we observe these children are seldom selected. We observed similar behavior with both the Golden Needle and the Multicriteria Ligand Similarity Based scoring functions. Because of the generally low probability of survival of children from the Crossover operator, we typically set the probability of using the Crossover operator very low relative to other operators (see Table 3).

How Well Is the Chemistry Space of Known Drugs Covered? To assess the extent to which our methods were capable of addressing the chemistry space of small molecule drugs, we looked at the ability of our methods to “rediscover” approved drugs as extracted from DrugBank. To focus on small molecule drugs, we filtered the approved drug set as follows: 30 < MW < 300; contains only C, H, N, O, S, F, Cl, Br, I, or B; no more than 3 aromatic rings; and max ring size less than 8.
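The filter criteria listed above translate directly into a predicate. In the sketch below, the `DrugRecord` fields are assumed to have been computed beforehand by a cheminformatics toolkit; the record type and field names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = frozenset({"C", "H", "N", "O", "S", "F", "Cl", "Br", "I", "B"})


@dataclass(frozen=True)
class DrugRecord:
    name: str
    mol_weight: float     # molecular weight
    elements: frozenset   # element symbols present in the structure
    aromatic_rings: int   # number of aromatic rings
    max_ring_size: int    # size of the largest ring, 0 if acyclic


def is_small_molecule(d):
    """Apply the four criteria: 30 < MW < 300, allowed elements only,
    at most 3 aromatic rings, and largest ring smaller than 8 atoms."""
    return (
        30 < d.mol_weight < 300
        and d.elements <= ALLOWED_ELEMENTS
        and d.aromatic_rings <= 3
        and d.max_ring_size < 8
    )
```

The ring size limit is what excludes macrocycles from the validation set.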

Scheme 2. Synthesis Pathways for Darunavir


general constraints of synthetic feasibility without limitation to a particular synthetic path. Further, we hoped the same multistep reaction based de novo methodology would allow scaffold hopping, enabling the construction of scaffolds linking multiple “preserved” R groups. We have already extended the methodology reported in this paper to both of these areas and are working to validate the utility of this approach. If a set of leads is known, it is possible to bias the set of reactants and reactions used for de novo design to focus on chemistry that leads to structures similar to the lead structures. We have implemented a methodology, similar to that reported by Hartenfeller,37 for biasing the selection of reactions and reactants based on one or more “interesting” structures the user provides. Methodology and results will be presented in a future publication. Other topics we believe will be important for future development of reaction driven de novo design include the treatment of stereochemistry and tautomerism. Our current synthesis engine, gensyn,15 is stereochemically naïve, and the products it produces are stereochemically ambiguous and contain no stereochemical information. Enhancements to the reaction engine to allow appropriate treatment of stereochemistry will be important as the field continues to develop. We envision several potential alternatives to address tautomerization; integration of tautomer generation engines44 to generate plausible tautomers prior to scoring would be one approach. Alternatively, the reaction library could be augmented with common tautomeric transformations so that tautomer reactions occur as part of the overall evolution of the reaction schemes. Both stereochemistry and tautomerism will be important areas for future research.

modification; a rating of 1 indicated the chemist felt the reaction scheme was highly implausible and could not be modified to work in the lab. The results are summarized in Figure 8. The average score of the virtual synthesis pathways was 7.5, indicating the chemists viewed these schemes as highly plausible. This average score was slightly lower than the average score for the literature pathways (8.2). Given the subjective nature of the scoring system, it is not surprising that the standard deviation of average scores (virtual or literature) was 1.6 units. From this survey, we conclude that the in silico reaction schemes generated by our methods were rated by synthetic chemists as very plausible, with scores similar to those of known literature synthesis schemes. We have also analyzed the synthetic pathway generated from the Golden Needle simulations of Darunavir, one of the approved drugs we were able to exactly reproduce. The synthesis pathway generated by the de novo design is shown in Scheme 2 along with two actual synthetic routes: the original synthesis reported by Arun42 and a later patent43 for an improved manufacturing route. The route generated in silico is similar to the route reported in the patent; the final step is the same in both cases. However, the in silico generated synthesis eliminates the second step of the route from the patent and instead uses a starting material where the isopropyl amine functionality has already been incorporated, eliminating the need for this coupling. The in silico synthesis potentially suffers from a regiochemistry problem in the first step, where the primary amine must compete with the secondary amine for reaction with the hexahydrofurofuran reactant. For Darunavir, we would suggest that while the in silico generated synthesis scheme does not replicate a known synthesis, it does present a useful starting point for consideration of the synthetic accessibility of the design idea.





ASSOCIATED CONTENT

Supporting Information

CONCLUSIONS

A “multistep reaction driven” evolutionary algorithm approach to de novo molecular design has been explored. We found it was critical to “tune” the method to balance between exploring the vastness of synthetically accessible chemistry space and the ability to optimize a solution within a local area of chemistry space. Two aspects of the method were found to be particularly important: the choice of the selection method and the inclusion of operators that make both small and large scale changes to the synthesis pathways and resulting products. To assess the extent to which the methods were capable of addressing the chemistry space of small molecule drugs, we looked at the ability of our methods to rediscover approved drugs. We find that our methods can successfully (re)cover biologically meaningful analogs for approximately 50% of approved drug chemistry space using a modest library of reactions and reactants. Design ideas generated by the approach include a proposed synthesis path intended to aid the chemist in assessing the synthetic feasibility of the ideas that are generated. For one case, Darunavir, the in silico generated synthesis scheme is similar to a known synthesis and serves as a useful starting point for consideration of the synthetic accessibility of the design idea. The in silico reaction schemes generated by our methods were rated by synthetic chemists as very plausible, with scores similar to those of known literature synthesis schemes.

Future Directions. As mentioned in the Introduction, one of our goals was to develop a multistep reaction based de novo approach to the design or elaboration of R groups within the

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.5b00697. Reactions defined in the reaction library used in this work (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel.: +1-314-951-3409.

Present Addresses

†V.C.F.: Oakland University William Beaumont School of Medicine, Rochester, MI.
‡B.L.R.: Eli Lilly and Company, Indianapolis, IN.

Notes

The authors declare the following competing financial interest(s): This work was fully funded by Certara, Inc.



ACKNOWLEDGMENTS

The authors would like to acknowledge Lei Wang, Alex Steudle, and Bernd Wendt for their helpful discussions in the preparation of this manuscript.



ABBREVIATIONS

ADME, absorption, distribution, metabolism, and excretion; CADD, computer-aided drug discovery; QSAR, quantitative structure−activity relationship; SAR, structure−activity relationship; TPSA, topological polar surface area




REFERENCES

(1) Segall, M. Advances in Multiparameter Optimization Methods for De Novo Drug Design. Expert Opin. Drug Discovery 2014, 9, 803−817.
(2) Ekins, S.; Honeycutt, J. D.; Metz, J. T. Evolving Molecules Using Multi-objective Optimization: Applying to ADME/Tox. Drug Discovery Today 2010, 15, 451−460.
(3) Gillet, V. J.; Bodkin, M. J.; Hristozov, D. Multiobjective De Novo Design of Synthetically Accessible Compounds. In De novo Molecular Design; Schneider, G., Ed.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2013; Chapter 11.
(4) Besnard, J.; Ruda, G. F.; Setola, V.; Abecassis, K.; Rodriguiz, R. M.; Huang, X. P.; Norval, S.; Sassano, M. F.; Shin, A. I.; Webster, L. A.; Simeons, R. C.; Stojanovski, L.; Prat, A.; Seidah, N. G.; Constam, D. B.; Bickerton, G. R.; Read, K. D.; Wetsel, W. C.; Gilbert, I. H.; Roth, B. L.; Hopkins, A. L. De Novo Design of Ligands Against Multitarget Profiles. In De novo Molecular Design; Schneider, G., Ed.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2013; Chapter 12.
(5) Schneider, G. Future De Novo Drug Design. Mol. Inf. 2014, 33, 397−402.
(6) Damewood, J. R., Jr; Lerman, C. L.; Masek, B. B. NovoFLAP: A Ligand-based De Novo Design Approach for the Generation of Medicinally Relevant Ideas. J. Chem. Inf. Model. 2010, 50, 1296−1303.
(7) Hasegawa, K.; Keiya, M.; Funatsu, K. Visualization of Molecular Selectivity and Structure Generation for Selective Dopamine Inhibitors. Mol. Inf. 2010, 29, 793−800.
(8) Besnard, J.; Ruda, G. F.; Setola, V.; Abecassis, K.; Rodriguiz, R. M.; Huang, X. P.; Hopkins, A. L.; et al. Automated Design of Ligands to Polypharmacological Profiles. Nature 2012, 492, 215−220.
(9) Baber, J. C.; Feher, M. Predicting Synthetic Accessibility: Application in Drug Discovery and Development. Mini-Rev. Med. Chem. 2004, 4, 681−692.
(10) Allu, T. K.; Oprea, T. I. Rapid Evaluation of Synthetic and Molecular Complexity for In Silico Chemistry. J. Chem. Inf. Model. 2005, 45, 1237−1243.
(11) Boda, K.; Seidel, T.; Gasteiger, J. Structure and Reaction Based Evaluation of Synthetic Accessibility. J. Comput.-Aided Mol. Des. 2007, 21, 311−325.
(12) Zaliani, A.; Boda, K.; Seidel, T.; Herwig, A.; Schwab, C. H.; Gasteiger, J.; Claussen, H.; Lemmen, C.; Degen, J.; Paern, J.; Rarey, M. Second-generation De Novo Design: A View from a Medicinal Chemist Perspective. J. Comput.-Aided Mol. Des. 2009, 23, 593−602.
(13) Gasteiger, J. De Novo Design and Synthetic Accessibility. J. Comput.-Aided Mol. Des. 2007, 21, 307−309.
(14) Hartenfeller, M.; Renner, S.; Jacoby, E. Reaction-Driven De Novo Design: A Keystone for Automated Design of Target Family-Oriented Libraries. In De novo Molecular Design; Schneider, G., Ed.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2013; Chapter 10.
(15) Cramer, R. D.; Soltanshahi, F.; Jilek, R.; Campbell, B. AllChem: Generating and Searching 10^20 Synthetically Accessible Structures. J. Comput.-Aided Mol. Des. 2007, 21, 341−350.
(16) Vinkers, H. M.; de Jonge, M. R.; Daeyaert, F. F.; Heeres, J.; Koymans, L. M.; van Lenthe, J. H.; Lewi, P. J.; Timmerman, H.; Van Aken, K.; Janssen, P. A. J. SYNOPSIS: SYNthesize and OPtimize System In Silico. J. Med. Chem. 2003, 46, 2765−2773.
(17) Hartenfeller, M.; Zettl, H.; Walter, M.; Rupp, M.; Reisen, F.; Proschak, E.; Weggen, S.; Stark, H.; Schneider, G. DOGS: Reaction-Driven De Novo Design of Bioactive Compounds. PLoS Comput. Biol. 2012, 8, e1002380.
(18) Patel, H.; Bodkin, M. J.; Chen, B.; Gillet, V. J. Knowledge-based Approach to De Novo Design Using Reaction Vectors. J. Chem. Inf. Model. 2009, 49, 1163−1184.
(19) Wikipedia. http://en.wikipedia.org/wiki/Evolutionary_algorithm (accessed Mar 28, 2016).
(20) Wikipedia. http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm) (accessed Mar 28, 2016).
(21) Wikipedia. http://en.wikipedia.org/wiki/Convergent_synthesis (accessed Mar 28, 2016).
(22) Hartenfeller, M.; Eberle, M.; Meier, P.; Nieto-Oberhuber, C.; Altmann, K. H.; Schneider, G.; Jacoby, E.; Renner, S. A Collection of Robust Organic Synthesis Reactions for In Silico Molecule Design. J. Chem. Inf. Model. 2011, 51, 3093−3098.
(23) Roughley, S. D.; Jordan, A. M. The Medicinal Chemist’s Toolbox: An Analysis of Reactions used in the Pursuit of Drug Candidates. J. Med. Chem. 2011, 54, 3451−3479.
(24) Bolton, E.; Wang, Y.; Thiessen, P. A.; Bryant, S. H. PubChem: Integrated Platform of Small Molecules and Biological Activities. In Annual Reports in Computational Chemistry; American Chemical Society: Washington, DC, 2008, Vol. 4, Chapter 12.
(25) The Selector module of SYBYL-X was used for diversity selection. SYBYL-X is available from Certara L.P., Princeton, NJ, USA.
(26) Jain, A. N. Morphological Similarity: A 3D Molecular Similarity Method Correlated with Protein-Ligand Recognition. J. Comput.-Aided Mol. Des. 2000, 14, 199−213.
(27) Clark, R. D. Relative and Absolute Diversity Analysis of Combinatorial Libraries. In Combinatorial Library Design and Evaluation; Ghose, A. K., Viswanadhan, V. N., Eds.; Marcel Dekker: New York, 2001; pp 337−362.
(28) See, for example: Wager, T. T.; Hou, X.; Verhoest, P. R.; Villalobos, A. Moving Beyond Rules: The Development of a Central Nervous System Multiparameter Optimization (CNS MPO) Approach to Enable Alignment of Druglike Properties. ACS Chem. Neurosci. 2010, 1, 435−449.
(29) Ertl, P.; Rohde, B.; Selzer, P. Fast Calculation of Molecular Polar Surface Area as a sum of Fragment-based Contributions and its Application to the Prediction of Drug Transport Properties. J. Med. Chem. 2000, 43, 3714−3717.
(30) Masek, B. B.; Shen, L.; Smith, K. M.; Pearlman, R. S. Sharing Chemical Information Without Sharing Chemical Structure. J. Chem. Inf. Model. 2008, 48, 256−261.
(31) Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Model. 2002, 42, 1273−80.
(32) Patterson, D. E.; Cramer, R. D.; Ferguson, A. M.; Clark, R. D.; Weinberger, L. E. Neighborhood Behavior: A Useful Concept for Validation of “Molecular Diversity” Descriptors. J. Med. Chem. 1996, 39, 3049−3059.
(33) Wikipedia. https://en.wikipedia.org/wiki/Fitness_proportionate_selection (accessed Mar 28, 2016).
(34) The analogy between evolutionary algorithms and evolutionary biology may be superfluous. For a discussion of the role of aging in evolutionary biology, see: Kirkwood, T. B.; Austad, S. N. Why Do We Age? Nature 2000, 408, 233−238.
(35) Lee, R. D. Rethinking the Evolutionary Theory of Aging: Transfers, Not Births, Shape Senescence in Social Species. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 9637−9642.
(36) Medina-Franco, J. L.; Martínez-Mayorga, K.; Giulianotti, M. A.; Houghten, R. A.; Pinilla, C. Visualization of the Chemical Space in Drug Discovery. Curr. Comput.-Aided Drug Des. 2008, 4, 322−333.
(37) Hartenfeller, M.; Eberle, M.; Meier, P.; Nieto-Oberhuber, C.; Altmann, K. H.; Schneider, G.; Jacoby, E.; Renner, S. Probing the Bioactivity-Relevant Chemical Space of Robust Reactions and Common Molecular Building Blocks. J. Chem. Inf. Model. 2012, 52, 1167−1178.
(38) Excludes 173 drugs recovered by “zero step synthesis” because the drug was included in the reactant database.
(39) Taylor, R. Simulation Analysis of Experimental Design Strategies for Screening Random Compounds as Potential New Drugs and Agrochemicals. J. Chem. Inf. Model. 1995, 35, 59−67.
(40) Matter, H. Selecting Optimally Diverse Compounds from Structure Databases: A Validation Study of Two-dimensional and Three-dimensional Molecular Descriptors. J. Med. Chem. 1997, 40, 1219−1229.
(41) Martin, Y. C.; Kofron, J. L.; Traphagen, L. M. Do Structurally Similar Molecules have Similar Biological Activity? J. Med. Chem. 2002, 45, 4350−4358.

DOI: 10.1021/acs.jcim.5b00697 J. Chem. Inf. Model. 2016, 56, 605−620

Article

Journal of Chemical Information and Modeling (42) Ghosh, A. K.; Kincaid, J. F.; Cho, W.; Walters, D. E.; Krishnan, K.; Hussain, K. A.; Koo, Y.; Cho, H.; Rudall, C.; Holland, L.; Buthod, J. Potent HIV Protease Inhibitors Incorporating High-affinity P2 ligands and (R)-(Hydroxyethylamino) Sulfonamide Isostere. Bioorg. Med. Chem. Lett. 1998, 8, 687−690. (43) Mizhiritskii, M.; Marom, E. Process for the Preparation of Darunavir and Darunavir Intermediates. U.S. Patent No. 8,829,208, Sep 9, 2014. (44) Madhavi Sastry, G. M.; Adzhigirey, M.; Day, T.; Annabhimoju, R.; Sherman, W. Protein and Ligand Preparation: Parameters, Protocols, and Influence on Virtual Screening Enrichments. J. Comput.-Aided Mol. Des. 2013, 27, 221−234.
