Chapter 13

Downloaded by UNIV OF CALIFORNIA SAN DIEGO on December 12, 2016 | http://pubs.acs.org Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch013

The Application of Cheminformatics in the Analysis of High-Throughput Screening Data

W. Patrick Walters,* Alexander Aronov, Brian Goldman, Brian McClain, Emanuele Perola, and Jonathan Weiss

Modeling & Informatics, Vertex Pharmaceuticals Incorporated, Boston, Massachusetts 02210, United States

*E-mail: [email protected]

Although high-throughput screening (HTS) has become a common method of identifying chemical starting points for drug discovery programs, the evaluation of hit sets and ultimate selection of one or more chemical series to be optimized is often a labor intensive, and somewhat arbitrary, process. In this chapter, we will outline some of the techniques we have adopted for the computational analysis of HTS results, and demonstrate how these techniques have been integrated into an internally developed software tool, the HTS Viewer.

Introduction

Over the last 20 years, high-throughput screening (HTS) has become an essential component of pharmaceutical drug discovery (1–5). HTS provides an efficient means of identifying chemical starting points for drug discovery, and has become the predominant means of lead identification. Through the use of automation, drug discovery teams are now able to routinely screen millions of compounds as part of a hit identification effort. Hit rates for high-throughput screens typically range between 0.1 and 1%, resulting in hundreds to thousands of hits. Once a screen has been performed, a drug discovery team must select one or more hit series that will form the basis of an optimization effort. If a chemistry team is large enough, multiple series may be selected and pursued in parallel. Given the chemistry resources available for a drug discovery project, however, no more than 2 or 3 series are typically pursued at once.

© 2016 American Chemical Society Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.


Sorting through HTS hits and selecting an appropriate lead series creates a data analysis challenge: a 0.1% hit rate on a million-compound screening collection yields 1,000 hits. Examining such a large number of chemical structures, and the associated biological data, is both cumbersome and beyond the memory and patience limits of a typical scientist. The dearth of software tools available for HTS data analysis has often led discovery teams to resort to less than optimal methods for selecting a series to pursue. Many criteria can be used to prioritize HTS hits.

* Historical precedent can have either a positive or a negative impact on the perception of a hit series. A series that demonstrated a good PK profile may be considered more attractive, while a series that failed in a toxicology study may be viewed in a more negative light. Without suitable access to the relevant data, this aspect of the analysis can be highly subjective and particularly problematic. Decisions can be highly influenced by team membership and the projects with which these individuals have been associated. In addition, there is often a tendency to deprioritize a series based on observations from a small number of compounds.
* Chemists on a team may find a particular series more attractive if they are familiar with synthetic routes that would facilitate analog synthesis.
* There is typically a tendency to pursue the most active compounds. While this is a logical approach to selecting a series to invest in, teams can sometimes overlook compound liabilities such as poor physical properties. It may be that an alternate series that is less active but has better physical properties would provide a superior starting point for optimization.
* Teams tend to pursue a series where multiple similar compounds are active, or even better, show a range of activities. This may enable a preliminary understanding of the structure-activity relationships (SAR) that can inform the design of libraries for early exploration. While the presence of preliminary SAR is an attractive feature of a series, bias towards widely explored chemotypes may lead a company to pursue multiple drug discovery programs with similar chemical series. Since the majority of the compounds synthesized in a drug discovery program will ultimately end up in a company’s screening collection, a lack of cross-program diversity can negatively impact subsequent drug discovery programs.

In later sections of this chapter, we will review a number of key tasks that comprise the analysis of HTS hit sets. We will first describe our approach to organizing a set of chemical structures based on chemical scaffolds. Once a set of chemical structures has been organized, we must be able to visualize the biological data and physical properties associated with these molecules. We will describe a number of ways that data visualization can be used to understand the distributions of on-target and off-target activity, as well as physical properties. Finally, we will examine ways in which automated literature searches can be used to better characterize a hit series identified through HTS.

Identifying Scaffold Classes

As mentioned above, a high-throughput screen can often generate hundreds to thousands of hits. As such, it can be difficult to navigate these large hit sets and identify trends that could indicate promising preliminary SAR. A number of methods for organizing large sets of molecules currently exist. One of the most common methods for organizing sets of chemical structures is clustering (6–10). Clustering is typically performed by calculating a set of molecular descriptors for each molecule, and grouping sets of molecules with similar descriptors. Molecules with similarity values that meet a pre-defined threshold are placed into the same cluster. The output of a clustering method may simply be a set of molecules and associated cluster identifiers, or can be a more elaborate representation. In a technique known as hierarchical clustering (11), the output is a tree, known as a dendrogram, that highlights the association between molecules.

While clustering is a powerful technique, it does present a few drawbacks. Clustering tends to be sensitive to the threshold values used and can sometimes provide results that are less chemically intuitive. A small change in the threshold for cluster membership can sometimes create a significant change in the resulting clusters. In addition, while hierarchical clustering methods may enable visualization of intra-cluster relationships, dendrograms are only practical for smaller datasets.

Another promising approach to organizing large sets of chemical structures is the Scaffold Tree method described by Schuffenhauer (12). In this approach, a set of molecules is initially reduced to a set of simple rings and ring systems. These rings and ring systems are then expanded into larger substructures in a hierarchical fashion. This hierarchy is then used to create a series of “trees” with simple substructures at the root.
These simple substructures are then linked to a series of more complex substructures as one moves progressively through the tree. While this approach is compelling, a limited number of implementations are currently available. Hopefully, as additional groups implement this method, it will become an effective means of exploring larger HTS datasets.

Our process for navigating the chemical data generated in an HTS builds upon the work of Bemis and Murcko (13), who developed a widely used method for reducing chemical structures to representative molecular scaffolds. In brief, the Bemis and Murcko method proceeds by successively removing monovalent atoms from a chemical structure until no monovalent atoms remain. The atoms and bonds that remain comprise ring systems and acyclic linkers, which form the molecular scaffold. In order to preserve the hybridization states of atoms, exocyclic bonds to heteroatoms (e.g. carbonyl groups) are retained. In our method, scaffolds are further reduced by removing single, non-ring bonds, to arrive at scaffolds containing a total of three rings. All combinations of non-ring bonds are removed so that the method is not dependent on the order in which rings are removed. If a structure contains three or fewer rings, then all rings are retained. Once all of the chemical structures have been reduced to a set of three or fewer ring scaffolds, the scaffold frequencies are recorded and the scaffolds are used to organize the molecules.

Figure 1 provides a schematic description of the method. The three molecules, 1, 2, and 3, on the left can be reduced to the five scaffolds in the center. This set of five consists of three different scaffolds, A, B, and C. By examining the frequency of occurrence of the scaffolds, we can see that scaffold A on the right occurs in all of the molecules. In order to provide a broader view of the hit set, the method allows a molecule to belong to multiple scaffold classes. In Figure 1, molecule 2 can belong to both scaffold classes A and B, while molecule 3 can belong to scaffold classes A and C.
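The first step of the Bemis and Murcko reduction, stripping monovalent atoms until none remain, can be illustrated on a plain bond graph. The following is a minimal sketch rather than the authors' implementation: the function names, the integer atom indices, and the example molecule are all hypothetical, hydrogens are assumed to be implicit, and the retention of exocyclic bonds to heteroatoms and the later reduction to three rings are deliberately left out.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (atom, atom) bond pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def prune_monovalent_atoms(bonds):
    """Repeatedly remove atoms with at most one neighbor; return what remains."""
    # Work on a copy so the caller's graph is left untouched.
    graph = {atom: set(nbrs) for atom, nbrs in bonds.items()}
    pending = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while pending:
        atom = pending.pop()
        if atom not in graph:
            continue  # already removed via another entry in the work list
        for neighbor in graph.pop(atom):  # zero or one neighbor
            graph[neighbor].discard(atom)
            if len(graph[neighbor]) <= 1:
                pending.append(neighbor)
    return set(graph)


# Hypothetical molecule: ring 0-5, one-atom linker 6, ring 7-12,
# plus a two-atom side chain (13, 14) hanging off ring atom 2.
ring_a = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
ring_b = [(7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7)]
edges = ring_a + ring_b + [(5, 6), (6, 7), (2, 13), (13, 14)]

scaffold_atoms = prune_monovalent_atoms(build_graph(edges))
print(sorted(scaffold_atoms))  # ring atoms and linker remain; side chain is gone
```

Because ring atoms and the atoms linking two rings always keep at least two neighbors, they survive the pruning, while side chains are consumed from the tip inward. A purely acyclic structure is reduced to nothing.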

Figure 1. Reducing a set of molecules to three-ring scaffolds. The three molecules on the left are reduced to scaffolds A, B, and C. Scaffold A is chosen because it is common to all three molecules.
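The bookkeeping behind Figure 1 can be sketched in a few lines. This is an illustrative sketch only: the scaffold labels stand in for real structures, the function names are hypothetical, and the assignment of molecule 1 to scaffold A alone is inferred from the statement that the three molecules yield five scaffolds in total.

```python
from collections import Counter, defaultdict

# Hypothetical assignment mirroring Figure 1: each molecule maps to the set
# of reduced scaffolds it contains (five molecule-scaffold pairs in total).
molecule_scaffolds = {
    "1": {"A"},
    "2": {"A", "B"},
    "3": {"A", "C"},
}


def scaffold_frequencies(assignments):
    """Count how many molecules contain each scaffold."""
    counts = Counter()
    for scaffolds in assignments.values():
        counts.update(scaffolds)
    return counts


def group_by_scaffold(assignments):
    """Invert the mapping; a molecule may appear in several scaffold classes."""
    classes = defaultdict(list)
    for molecule, scaffolds in sorted(assignments.items()):
        for scaffold in sorted(scaffolds):
            classes[scaffold].append(molecule)
    return dict(classes)


frequencies = scaffold_frequencies(molecule_scaffolds)
classes = group_by_scaffold(molecule_scaffolds)
print(frequencies.most_common(1))  # the most frequent scaffold and its count
print(classes)
```

Keeping the molecule-to-scaffold relation many-to-many is what allows molecule 2 to be listed under both A and B instead of being forced into a single class.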


The resulting set of scaffolds can then be used to organize a set of molecules for display. A two-panel user interface, as shown in Figure 2, can be set up to enable navigation of chemical structures. The scaffolds are shown in the panel at the left. Selecting one of these scaffolds displays the corresponding structures in the panel on the right.

While the interface shown in Figure 2 can provide a relatively efficient means of navigating a large set of chemical structures, it does not provide a simple means of examining a set of related scaffolds. Upon identifying an “interesting” scaffold, one often wants to examine the activities of related scaffolds. One way of identifying related scaffolds is to simply sort the set of scaffolds based on their chemical similarity to a scaffold of interest. This process is illustrated in Figure 3. In this case, we are interested in compounds from the scaffold at the top and want to identify compounds from related scaffolds. In the HTS Viewer, a right mouse click on any scaffold brings up a menu option that sorts the scaffolds in order of decreasing similarity to the selected scaffold. In principle, any molecular similarity metric could be used for the comparison. We have found that a simple path-based fingerprint similarity measure provides an effective means of comparing and sorting scaffolds.
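Sorting scaffolds by similarity to a selected scaffold can be sketched as follows. The chapter does not specify the fingerprint or the coefficient, so this sketch makes two assumptions: each fingerprint is represented as a set of "on" bit positions, and the Tanimoto coefficient is used for the comparison. The scaffold identifiers and bit sets are invented, and fingerprint generation itself is not shown.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def sort_by_similarity(query_id, fingerprints):
    """Return scaffold ids ordered by decreasing similarity to the query."""
    query = fingerprints[query_id]
    return sorted(
        fingerprints,
        key=lambda scaffold_id: tanimoto(query, fingerprints[scaffold_id]),
        reverse=True,
    )


# Hypothetical fingerprints: scaffold id -> set of on-bit positions.
fingerprints = {
    "S1": {1, 2, 3, 4},
    "S2": {1, 2, 3, 5},
    "S3": {1, 7, 8},
    "S4": {9, 10},
}

ranking = sort_by_similarity("S1", fingerprints)
print(ranking)
```

The query scaffold always sorts first because its similarity to itself is 1.0, which matches the layout described for Figure 3, where the scaffold of interest appears at the top.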

Figure 2. Browsing scaffolds in the HTS Viewer. The scaffolds are shown in the panel on the left. Selecting a scaffold will display molecules containing that scaffold, and associated biological activity values, in the panel on the right.


Figure 3. Identifying related scaffolds. Once similar scaffolds have been identified, scaffolds can be sorted based on similarity to a scaffold of interest.

Visualizing On-Target Activity Distributions

Once scaffolds have been identified and molecules have been grouped by scaffold, one typically wants to compare the activity distributions between scaffolds. Ideally, we would like to identify scaffolds with a range of activity values. A series with a range of activity values may possess some preliminary SAR that would provide direction for an exploratory chemistry effort. There are many techniques for comparing data distributions. Four of the most common methods are shown in Figure 4.

In Figure 4, the histogram in Graph A is the most commonly used representation. While histograms can provide a simple means of visualizing a distribution, they do not provide an ideal method for comparing distributions. It is not possible to cleanly superimpose histograms, and even side-by-side comparison can sometimes be difficult. These comparisons become even more complex when more than two histograms are to be compared.

In Figure 4, the density plot in Graph B shows another common method for comparing distributions. A density plot is based on a mathematical method known as a kernel density estimate, and can be thought of as a curve fit to a histogram. Different functional forms and bandwidths can be applied to adjust the fit of the curve to the distribution described by the histogram. Density plots provide an advantage in that they can be overlaid. Additionally, each distribution can be shown in a different color or line style to facilitate comparison. While density plots can be easily compared when superimposed, comparing side-by-side distributions presents challenges similar to those faced when comparing histograms.

In Figure 4, Graph C shows the same distributions compared as box plots. In a box plot, the box represents the middle 50% of the distribution, with a line drawn inside the box representing the median of the distribution. The whiskers outside the box represent the extents of the distribution, with outliers drawn as dots.
Box plots provide a convenient method of comparing multiple distributions. Multiple box plots can be arrayed horizontally or vertically to facilitate comparisons.
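The kernel density estimate underlying a density plot can be written out directly. The sketch below assumes a Gaussian kernel, which is only one of the functional forms mentioned above, and uses made-up activity values on a log scale; the function name and bandwidth values are illustrative choices, not settings from the HTS Viewer.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centered on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in samples
        )

    return density


# Hypothetical log-scale activities for the compounds in one scaffold class.
samples = [5.1, 5.4, 5.6, 6.0, 6.2, 7.1]

density = gaussian_kde(samples, bandwidth=0.3)
narrow = gaussian_kde(samples, bandwidth=0.1)
wide = gaussian_kde(samples, bandwidth=1.0)

# Evaluate on a grid to obtain the curve that would be drawn.
grid = [4.0 + 0.1 * i for i in range(41)]
curve = [density(x) for x in grid]
```

A small bandwidth follows individual observations closely and produces a spiky curve, while a large bandwidth smooths the estimate and can hide structure, which is why the choice of bandwidth changes how well the curve tracks the histogram.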


Figure 4. Different ways of graphically representing distributions. Clockwise from the upper left are histogram, density plot, box plot, violin plot. Note that the x axes in the plots on the bottom row are on a log scale.

Figure 4 Graph D shows a violin plot, a representation that combines many of the features of the density plot and the box plot. A violin plot can be thought of as a mirrored representation of the density plot. If one draws a density plot, places a mirror image below, and fills the region between the two curves, a violin plot is produced. The violin plot has the advantage that it can show the shape of a distribution and can be used to readily compare multiple distributions.

As shown in Figure 4, each representation provides a different view and can highlight a new aspect of a distribution. In many cases it can be useful to look at multiple representations of the same distribution. In the HTS Viewer, the activity distribution for each scaffold is represented as a box plot displayed in a table cell adjacent to the scaffold structure. A vertical red line can be adjusted to display a user-defined activity threshold. We have found that this representation, coupled with the interactive scaffold display described above, makes it relatively easy to browse HTS data and to quickly identify scaffolds with interesting SAR.
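The numbers needed to draw one box plot per scaffold can be computed with the Python standard library. This is a sketch under stated assumptions: the activity values are invented, the function name is hypothetical, and outliers are defined with the common 1.5 × IQR convention, which the chapter does not specify.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers for one set of activities."""
    data = sorted(values)
    # Requires at least two data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outlier values
        "outliers": outliers,
    }


# Hypothetical log-scale activities for the compounds in one scaffold class.
activities = [4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 8.9]
summary = box_plot_summary(activities)
print(summary)
```

A scaffold whose box spans a wide range of activities is the kind of series the text describes as likely to carry preliminary SAR, so the interquartile range is a reasonable quantity to sort on when browsing.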

Avoiding Problematic Compounds

In some cases, screening collections may contain compounds that possess known liabilities. These compounds may contain functionality that will interfere in some way with a biological assay, be known aggregators, or contain potentially toxic functionalities. Over the last 20 years, numerous groups have developed extensive lists of structural filters designed to highlight molecules containing potentially problematic functionalities (14–16). Models have also been developed to identify molecules that may aggregate under typical biological assay conditions (17). While the presence of functionalities that trigger one of these alerts may not provide a sufficient reason to ignore a hit, it is always best to consider all available evidence when selecting a lead series.

Another area of concern in HTS is the physical properties of the compounds being screened. Extremely lipophilic compounds may precipitate or adhere to a screening plate. As a result, even if active, these compounds will not show a response in an assay and would be considered “false negatives”. In other cases, lipophilic compounds may interfere with an assay and produce what appears to be a positive response. In these cases, the compounds would be considered “false positives”. In their highly cited 1997 paper (18), Lipinski and coworkers point out a number of downsides to lipophilic compounds in HTS. These authors proposed a number of computational and experimental filters that could be used to potentially avoid these problems. More recently, some have questioned these filters on the basis that they may be steering the field away from productive areas of chemical space (19, 20). As with any computational filters, calculated properties should be viewed as guides rather than gates. It is often useful to sort a set of screening hits by a calculated lipophilicity value (CLogP). This sorting can provide additional insight into which hits may be false positives.

In the HTS Viewer, we provide a number of visual cues to potentially problematic functionality. We allow the user to highlight substructures that trigger any of the REOS (14) alerts developed in our group or any of the now popular pan-assay interference (PAINS) substructure filters (21) published by Baell and coworkers. Additionally, and as mentioned earlier, we provide plots to highlight the molecular weight and lipophilicity distributions of the compounds.
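Treating calculated properties as guides rather than gates amounts to annotating hits instead of deleting them. The sketch below sorts hits by decreasing CLogP and attaches a flag; the record layout, the function name, and the hit data are hypothetical, and the default cutoff of 5 is borrowed from the lipophilicity criterion in Lipinski's rule of five purely as an example value.

```python
def annotate_hits(hits, clogp_limit=5.0):
    """Sort hits by decreasing CLogP and flag, rather than discard, lipophilic ones."""
    ranked = sorted(hits, key=lambda hit: hit["clogp"], reverse=True)
    # Copy each record so the input list is left unchanged.
    return [dict(hit, lipophilic=hit["clogp"] > clogp_limit) for hit in ranked]


# Hypothetical screening hits with precomputed CLogP values.
hits = [
    {"id": "H1", "clogp": 2.1},
    {"id": "H2", "clogp": 6.3},
    {"id": "H3", "clogp": 4.7},
    {"id": "H4", "clogp": 5.4},
]

annotated = annotate_hits(hits)
for hit in annotated:
    print(hit["id"], hit["clogp"], "FLAG" if hit["lipophilic"] else "")
```

Every hit survives the annotation, so a flagged compound can still be selected if the rest of the evidence supports it.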

Visualizing Off-Target Activity Distributions

In addition to being able to identify scaffolds with the desired activity, it is also necessary to consider the activity of HTS hits against other targets. Compounds that are part of pharmaceutical screening collections have often been run in dozens or even hundreds of assays. While having extensive data on each compound is helpful, investigating the activity profiles for dozens or even hundreds of hits can quickly become cumbersome. In order to utilize this wealth of screening data and to facilitate the analysis of off-target activity, we created a simple but effective representation known as a promiscuity plot. This plot can provide an easy to interpret representation of the off-target activity of a group of compounds. In the HTS Viewer, these plots are typically used to view the off-target activity of all of the compounds from a particular scaffold.

The promiscuity plot is a scatter plot with the number of times a compound has been assayed on the x-axis and the number of times the compound has been active on the y-axis. A more selective series will have the majority of the points clustered toward the x-axis, while a more promiscuous set of compounds will have points clustered closer to the diagonal. We can contrast the on-target and off-target activities by coloring the points in the plot according to their activity in the on-target assay. In Figure 5, we have colored the compounds active in the on-target assay green and the inactive compounds red. A similar approach was developed independently by Oprea and coworkers, and is available as part of their BADAPPLE (22) software tool.

Figure 5. A scatterplot showing compound promiscuity. The plot shows the number of assays in which a compound was active along the y-axis and the number of assays in which the compound was tested along the x-axis. Compounds are colored based on activity in the primary assay (green = active, red = inactive). Promiscuous compounds that hit frequently in assays will be represented by points closer to the diagonal.
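The coordinates and colors for a promiscuity plot follow directly from a table of assay outcomes. The following sketch assumes a simple in-memory layout, one dictionary of assay name to active/inactive flag per compound; the compound names, assay names, and function name are hypothetical, and the counts here include the primary assay itself.

```python
def promiscuity_points(assay_results, primary_assay):
    """Return (compound, n_tested, n_active, color) tuples for plotting."""
    points = []
    for compound, results in sorted(assay_results.items()):
        n_tested = len(results)            # x-axis: assays run
        n_active = sum(results.values())   # y-axis: assays in which active
        color = "green" if results.get(primary_assay, False) else "red"
        points.append((compound, n_tested, n_active, color))
    return points


# Hypothetical outcomes: compound -> {assay name: active?}
assay_results = {
    "cpd-1": {"primary": True, "assay-a": False, "assay-b": False, "assay-c": False},
    "cpd-2": {"primary": True, "assay-a": True, "assay-b": True},
    "cpd-3": {"primary": False, "assay-a": False},
}

points = promiscuity_points(assay_results, primary_assay="primary")
for compound, n_tested, n_active, color in points:
    print(compound, n_tested, n_active, color, round(n_active / n_tested, 2))
```

The ratio printed in the last column is 1.0 for a compound sitting on the diagonal (active in every assay in which it was tested) and close to 0 for a compound near the x-axis, so it gives a one-number summary of where a point falls in the plot.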

Visualizing ADME and Property Distributions

Another important consideration in selecting hits from an HTS is their physical properties and their behavior in ADME (Absorption, Distribution, Metabolism, and Excretion) assays (23–25). While the physical and ADME properties within a series may be improved over the course of a lead optimization program, it can be challenging to remove a liability yet maintain activity. In addition, there is a tendency to synthesize increasingly lipophilic compounds over the course of an optimization program, so given the choice, it is typically better to start with a more soluble hit.

As with other aspects of HTS data analysis, we need to provide a simple representation that will enable one to quickly assess the properties of a set of compounds (typically from a single scaffold). As a default in the HTS Viewer, we provide “thermometer plots” that present an overview of three calculated (molecular weight, CLogP, polar surface area) and three experimentally determined (aqueous solubility, inhibition of the hERG channel, inhibition of cytochrome P450s) properties. In each bar in the plot, the fractions of compounds falling within limits set as “good”, “fair”, and “poor” are depicted in green, yellow, and red. For instance, if we set the following criteria for molecular weight

* Good (green): < 400
* Fair (yellow): > 400 and < 500
* Poor (red): > 500

the colors in the bar labeled “MW” would reflect the fraction of compounds in that particular scaffold class falling into the designated ranges. To simplify the analysis of large datasets, labels are kept to a minimum. The idea here is to provide a capability to rapidly browse the data and get an overview of the properties of the compounds. Additional detail on criteria defining the ranges for each property can be provided through tool tips that are available when the user hovers the mouse over a plot.
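The fraction calculation behind one bar of a thermometer plot can be sketched as follows. This is not the HTS Viewer's implementation: the function names are hypothetical, the molecular weights are made up, and the sketch assumes cutoffs of 400 and 500 with lower molecular weight treated as better and values falling exactly on a cutoff counted as "fair".

```python
def classify(value, good_below, poor_above):
    """Assign a value to the good, fair, or poor range."""
    if value < good_below:
        return "good"
    if value > poor_above:
        return "poor"
    return "fair"


def thermometer_fractions(values, good_below=400.0, poor_above=500.0):
    """Return the fraction of values in each range (the green/yellow/red segments)."""
    counts = {"good": 0, "fair": 0, "poor": 0}
    for value in values:
        counts[classify(value, good_below, poor_above)] += 1
    total = len(values)
    return {label: (n / total if total else 0.0) for label, n in counts.items()}


# Hypothetical molecular weights for the compounds in one scaffold class.
molecular_weights = [350, 380, 420, 480, 510, 390, 610, 450]
fractions = thermometer_fractions(molecular_weights)
print(fractions)
```

For a property where larger values are preferable, such as aqueous solubility, the comparisons would need to be reversed; keeping the cutoffs as parameters makes it straightforward to reuse the same routine with different ranges for each calculated property.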

Streamlining Literature Searches

The chemical and pharmaceutical literature is another key component in the prioritization of screening hits. It is important for a drug discovery team to understand the biological activity of compounds similar to a hit. While chemical similarity is not a perfect indicator of biological activity, it can provide some clues to potential on-target or off-target activities. Another important aspect that must be considered in pharmaceutical drug discovery is intellectual property. Ideally, a new drug must not be covered by another company’s patent.

Numerous databases containing chemical and biological data are available. Some of these databases, such as ChEMBL (26, 27) and PubChem (28, 29), are freely available, while others, such as SciFinder (30), Integrity (31), and Reaxys (32), are available through commercial licenses. While all of the databases mentioned previously contain chemical structures and associated biological activity data, the commercial databases also contain references to issued patents and published patent applications. Recently, the same group that curates and publishes the ChEMBL database released a new database, SureChEMBL (33), which contains more than 10 million chemical structures that were generated through automated extraction of chemical structures and chemical names from published patents. There are many subtleties in the interpretation of claims from chemical patents that are beyond the scope of this chapter. The interested reader can consult the review by Downs (34) for more information.

While it is possible to perform a thorough literature search for each of hundreds of screening hits, doing so would be cumbersome and very time consuming. In an effort to streamline these searches and provide a preliminary overview of available information, we have integrated a number of publicly available and commercial databases into the HTS Viewer.
For each scaffold identified in the HTS dataset, we perform substructure searches in each of the available databases to identify molecules containing that scaffold. This integrated information can then be viewed in context with the associated on-target, off-target, and property data. The data from the freely available databases can be hosted and searched internally to facilitate performance and interactivity. While many of the commercial databases do not allow direct access to the data, they do offer application programming interfaces (APIs) that allow programmatic access to the underlying data.


Figure 6. The display of literature data from the ChEMBL database for a selected scaffold. The pie chart at the top shows the distribution of activities for molecules containing the selected scaffold. Clicking on a wedge in the pie chart shows molecules and activity data (linked to ChEMBL) in the table below the pie chart.

Figure 6 provides an example of how literature data is displayed in the HTS Viewer. In the panel on the left, the scaffolds and associated data discussed earlier are displayed. Clicking on a row in the table on the left will show the associated literature data on the right. At the top of the panel is a pie chart with breakdowns of the known biological targets for compounds with the selected scaffold. The table below the pie chart shows the chemical structures of the literature compounds as well as additional details on targets and clinical progression. Each chemical structure is linked to one or more primary data sources (Integrity, ChEMBL, etc.). Clicking on the structure will redirect the user to the primary source that contains additional information. While the facilities provided here are not a substitute for a thorough literature search, they do provide an overview that can be used to facilitate hit prioritization.
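The target breakdown shown in the pie chart reduces to a frequency count over the literature records returned for a scaffold. The sketch below assumes the substructure searches have already been run and their results collected into a list of dictionaries; the record fields, compound identifiers, target names, and function name are all hypothetical and do not reflect the schema of any of the databases mentioned.

```python
from collections import Counter


def target_breakdown(records):
    """Return (target, count, fraction) tuples, most frequent target first."""
    counts = Counter(record["target"] for record in records)
    total = sum(counts.values())
    return [(target, n, n / total) for target, n in counts.most_common()]


# Hypothetical literature hits for one scaffold, pooled from several sources.
records = [
    {"compound": "LIT-001", "target": "Kinase A", "source": "ChEMBL"},
    {"compound": "LIT-002", "target": "Kinase A", "source": "ChEMBL"},
    {"compound": "LIT-003", "target": "GPCR B", "source": "Integrity"},
    {"compound": "LIT-004", "target": "Kinase A", "source": "Integrity"},
]

wedges = target_breakdown(records)
for target, count, fraction in wedges:
    print(f"{target}: {count} compounds ({fraction:.0%})")
```

Keeping the source with each record is what makes it possible to link a displayed structure back to the primary database it came from.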


Conclusion

Although high-throughput screening has become a mainstay of pharmaceutical drug discovery, the analysis and prioritization of hit sets is still a cumbersome, manual process. The majority of available informatics systems are not amenable to large datasets, and do not provide the degree of integration necessary to rapidly triage a set of screening hits. In order to make the appropriate choices, we must be able to carry out a number of key tasks in an integrated fashion.

* Organize chemical structures in a manner that facilitates navigation
* Evaluate activity trends for chemical series
* Compare both on- and off-target activities
* Readily access information on compound properties and ADME
* Leverage prior knowledge from the scientific and patent literature

The HTS Viewer software tool described here is an initial foray into the development of a tool that will streamline the evaluation of HTS data and enable drug discovery teams to select appropriate series for investigation. Figure 7 shows a view of a single table row from the HTS Viewer. This row provides information on the on- and off-target activity of molecules in that scaffold class as well as an overview of compound properties. Each plot and structure is linked to additional experimental details allowing a rapid triage of an HTS hit set. Ideally, some of the ideas described here will motivate others to develop new systems and expand this important field.

Figure 7. An integrated display showing a scaffold, on- and off-target activity distributions, and properties. In the HTS Viewer each scaffold is represented by a row in a sortable table. Clicking on the row provides additional detail on the selected scaffold.

References

1. Pereira, D. A.; Williams, J. A. Br. J. Pharmacol. 2007, 152, 53–61.
2. Walters, W. P.; Namchuk, M. Nat. Rev. Drug Discovery 2003, 2, 259–266.
3. Yu, H.-B.; Li, M.; Wang, W.-P.; Wang, X.-L. Acta Pharmacol. Sin. 2016, 37, 34–43.
4. Macarron, R.; Banks, M. N.; Bojanic, D.; Burns, D. J.; Cirovic, D. A.; Garyantes, T.; Green, D. V. S.; Hertzberg, R. P.; Janzen, W. P.; Paslay, J. W.; Schopfer, U.; Sittampalam, G. S. Nat. Rev. Drug Discovery 2011, 10, 188–195.
5. Bakken, G. A.; Bell, A. S.; Boehm, M.; Everett, J. R.; Gonzales, R.; Hepworth, D.; Klug-McLeod, J. L.; Lanfear, J.; Loesel, J.; Mathias, J.; Wood, T. P. J. Chem. Inf. Model. 2012, 52, 2937–2949.
6. Holliday, J. D.; Rodgers, S. L.; Willett, P.; Chen, M. Y.; Mahfouf, M.; Lawson, K.; Mullier, G. J. Chem. Inf. Model. 2004, 44 (3), 894–902.
7. Varin, T.; Bureau, R.; Mueller, C.; Willett, P. J. Mol. Graph. Model. 2009, 28, 187–195.
8. McGregor, M. J.; Pallai, P. V. J. Chem. Inf. Model. 1997, 37, 443–448.
9. Stahl, M.; Mauser, H.; Tsui, M.; Taylor, N. R. J. Med. Chem. 2005, 48, 4358–4366.
10. Butina, D. J. Chem. Inf. Model. 1999, 39, 747–750.
11. Barnard, J. M.; Downs, G. M. J. Chem. Inf. Model. 1992, 32, 644–649.
12. Schuffenhauer, A.; Ertl, P.; Roggo, S.; Wetzel, S.; Koch, M. A.; Waldmann, H. J. Chem. Inf. Model. 2007, 47, 47–58.
13. Bemis, G. W.; Murcko, M. A. J. Med. Chem. 1996, 39, 2887–2893.
14. Walters, W. P.; Murcko, M. A. Adv. Drug Delivery Rev. 2002, 54, 255–271.
15. Rishton, G. M. Drug Discovery Today 1997, 2, 382–384.
16. Bruns, R. F.; Watson, I. A. J. Med. Chem. 2012, 55, 9763–9772.
17. Irwin, J. J.; Duan, D.; Torosyan, H.; Doak, A. K.; Ziebart, K. T.; Sterling, T.; Tumanian, G.; Shoichet, B. K. J. Med. Chem. 2015, 58, 7076–7087.
18. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug Delivery Rev. 1997, 23, 3–25.
19. Zhang, M.-Q.; Wilkinson, B. Curr. Opin. Biotechnol. 2007, 18, 478–488.
20. Walters, W. P. Expert Opin. Drug Discovery 2012, 7, 99–107.
21. Baell, J. B.; Holloway, G. A. J. Med. Chem. 2010, 53, 2719–2740.
22. Bologa, C. G.; Oprea, T. I. In Computational Drug Discovery and Design; Baron, R., Ed.; Methods in Molecular Biology; Humana Press: Totowa, NJ, 2012; Vol. 910, pp 125–143.
23. Kassel, D. B. Curr. Opin. Chem. Biol. 2004, 8, 339–345.
24. Obach, R. S.; Lombardo, F.; Waters, N. J. Drug Metab. Dispos. 2008, 36, 1385–1405.
25. Lipinski, C. A. J. Pharmacol. Toxicol. Methods 2000, 44, 235–249.
26. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. Nucleic Acids Res. 2011, 40, D1100–D1107.
27. Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Krüger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. Nucleic Acids Res. 2013, 42, D1083–D1090.
28. Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. Nucleic Acids Res. 2009, 37, W623–W633 (Web Server).
29. Li, Q.; Cheng, T.; Wang, Y.; Bryant, S. H. Drug Discovery Today 2010, 15, 1052–1057.
30. SciFinder database, Chemical Abstracts Service. http://scifinder.cas.org.
31. Integrity database, Thomson Reuters. http://thomsonreuters.com/en/products-services/pharma-life-sciences/pharmaceutical-research/integrity.html.
32. Reaxys database, Elsevier. https://www.elsevier.com/solutions/reaxys.
33. Papadatos, G.; Davies, M.; Dedman, N.; Chambers, J.; Gaulton, A.; Siddle, J.; Koks, R.; Irvine, S. A.; Pettersson, J.; Goncharoff, N.; Hersey, A.; Overington, J. P. Nucleic Acids Res. 2016, 44, D1220–D1228.
34. Downs, G. M.; Barnard, J. M. WIREs Comput. Mol. Sci. 2011, 1, 727–741.