The Many Facets of Screening Library Design - ACS Symposium

Chapter 16

The Many Facets of Screening Library Design

Downloaded by CORNELL UNIV on October 27, 2016 | http://pubs.acs.org Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch016

Markus Boehm, Liying Zhang, Nicole Bodycombe, Mateusz Maciejewski, and Anne Mai Wassermann* Pfizer Inc., 610 Main Street, Cambridge, Massachusetts 02139, United States *E-mail: [email protected]

Many screening approaches for the discovery of leads active against a target or phenotype co-exist in drug discovery, ranging from the use of low molecular weight fragments with biophysical methods to the evaluation of highly complex natural products in cell-based phenotypic assays. Each screening strategy imposes different requirements on the molecules that are being tested. In this chapter, we discuss design rules for various screening sets routinely used by pharmaceutical companies and/or academic screening facilities. Orthogonal approaches, such as chemically diverse versus biologically diverse libraries or pre-plated versus customized compound sets, are contrasted. Additionally, the goal of a screen can greatly influence the selection of compounds. For example, lead and tool compounds may have fundamentally different molecular properties. A common theme for the design of all compound libraries is their high dependence on computational data analysis and algorithms, making screening set design a chemoinformatics task.

© 2016 American Chemical Society
Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.

Introduction

The likelihood of finding a safe, efficacious drug in synthetically accessible chemical space is often compared to the chance of finding a needle in a haystack. The requirements a small molecule must fulfill to reach the market go far beyond demonstrating activity against a disease-relevant target. A drug should preferably be orally available, be distributed to the diseased tissue, remain at the site of action long enough to be efficacious, and avoid interactions with unwanted off-targets that could cause safety liabilities. These are just a few of the many parameters that need to be considered and optimized across the development stages of a drug discovery project (1). The quality of the initial lead for compound optimization is often crucial for later success, so it is not surprising that pharmaceutical companies have invested significantly in building high-quality screening collections (2–4). Together with the physiological relevance and robustness of the assay used in a screening campaign, the composition of the screened compound set ultimately determines the quality of any active compounds identified. Designing a screening library is in many ways equivalent to navigating the vast expanse of chemical space and selecting molecules from individual, hopefully bioactive, islands; it is therefore a task that involves comparing millions of compounds. Rules for the “chemical beauty” and the drug- and lead-likeness of small molecules (5, 6), which play a decisive role in determining which compounds are included in a screening library, could only be formulated through computational analyses of compound data sets. In this chapter, we put more emphasis on the principles and strategic drivers of screening set development than on algorithmic details.

Traditionally, most pharmaceutical companies have maintained pre-plated screening decks covering millions of small molecules that are routinely tested in high-throughput screens (HTS) across a variety of targets and indications (7). These screening libraries are mostly filtered to cover physicochemically desirable regions of chemical space and to be structurally diverse (8).
This follows the assumption that structural diversity is a good surrogate for biological diversity and that a library comprising a variety of chemotypes will yield at least a few active compounds for each screening project. More recently, chemical diversity-based approaches have been complemented by biodiversity methods that use profiling experiments or historical activity data to create sets of molecules with diverse biological mechanisms (9, 10). Fragment libraries have been designed to cover chemical space more efficiently (11). The properties of biologically active natural products (NPs) have been studied and have inspired the design of NP-like screening collections, while the emergence of diversity-oriented synthesis (DOS) has enabled the rapid generation of compound libraries with rich functional substitution patterns (12). With the increasing availability of biological annotations for small molecules in pharmaceutical screening collections and the paradigm shift from target-focused to phenotypic screening (13), pharmacologically annotated screening subsets have been created that use small molecules as probes in cell-based assay systems to elucidate the mechanisms of action underlying the phenotypic readout (14). In this chapter, we describe all of these screening sets and the (computational) design principles used to create them. Library design always depends on the purpose of the library: the purpose determines the methodologies used, the types of compounds selected, and the number of compounds selected. Hence, we discuss not only the different types of libraries but also the reasons why they are created and used in the first place.

Structural Diversity and Physicochemical Property Considerations in Screening Library Design

Owing to advances in high-throughput screening, it has become routine for pharmaceutical companies to identify novel bioactive molecules by screening large chemical collections against various biological targets in an unbiased fashion (15). In theory, a desired lead compound can reside anywhere in the vast expanse of chemical space, whose size has been estimated to range from 10¹³ to 10¹⁰⁰ molecules (16). Despite advances in throughput, it is not feasible to enumerate, let alone synthesize or screen, the entirety of chemical space, even under the most conservative estimate of 10¹³ (16). To date, the PubChem Compound database (17), the most comprehensive public resource of synthesized molecules, contains around 70 million molecules, and the screening collections of pharmaceutical companies typically contain 1–10 million compounds; both represent only a small fraction of chemical space. Therefore, the composition of a screening library determines the bias and potential limitations of a screening campaign (10). By and large, an ideal screening collection should be enriched for bioactive compounds with high diversity in both structures and biological profiles (10). Designing such screening libraries is undoubtedly an extremely difficult challenge. Here we discuss two approaches that are routinely used in screening library design: maximizing structural diversity and optimizing physicochemical parameters (18). Diversity-based library design is derived from the “similarity property principle” (19), which states that structurally similar compounds tend to have similar properties, including biological activity. Applying this principle in practice efficiently reduces the size of screening libraries, because a few representative molecules can be sampled from each group of structurally similar compounds (16).
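To make the idea of sampling representatives from groups of similar compounds concrete, the following is a minimal sketch under stated assumptions. Binary fingerprints are encoded as Python sets of their "on" bit positions, similarity is measured with the Tanimoto coefficient (one common choice for binary fingerprints), and representatives are picked with a simple greedy, leader-style form of sphere exclusion using an arbitrary threshold of 0.7. The function names, the threshold, and the toy fingerprints are hypothetical; this is not the specific algorithm of any work cited in this chapter.

```python
from typing import Dict, FrozenSet, List

# Hypothetical encoding: a binary fingerprint is stored as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: number of shared on-bits divided by the size of their union."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / union


def sphere_exclusion(fps: Dict[str, Fingerprint], threshold: float = 0.7) -> List[str]:
    """Greedy sphere-exclusion picking (a simple leader-style variant).

    Compounds are visited in input order. A compound becomes a new representative
    only if its similarity to every representative chosen so far is below
    `threshold`; otherwise it lies inside an existing "sphere" and is skipped.
    """
    representatives: List[str] = []
    for cid, fp in fps.items():
        if all(tanimoto(fp, fps[rep]) < threshold for rep in representatives):
            representatives.append(cid)
    return representatives


if __name__ == "__main__":
    # Toy fingerprints (hypothetical): cpd2 and cpd4 are close analogs of cpd1 and cpd3.
    library = {
        "cpd1": frozenset({1, 2, 3, 4, 5, 6, 7, 8}),
        "cpd2": frozenset({1, 2, 3, 4, 5, 6, 7, 9}),
        "cpd3": frozenset({10, 11, 12, 13, 14, 15, 16, 17}),
        "cpd4": frozenset({10, 11, 12, 13, 14, 15, 16, 18}),
        "cpd5": frozenset({20, 21, 22, 23}),
    }
    print(sphere_exclusion(library))  # ['cpd1', 'cpd3', 'cpd5']
```

Each analog pair shares 7 of 9 distinct bits (Tanimoto ≈ 0.78), so only one member of each pair is kept. The result depends on both the visiting order and the threshold; raising the threshold to 0.8 would keep all five compounds, which illustrates how strongly such parameter choices shape the resulting library.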
A number of methods for choosing structurally diverse compound sets have been developed. A widely used approach optimizes the distribution of the selected compounds in a chemical space defined by sets of chemical descriptors. These descriptors are often summarized into binary fingerprints that represent 2D and 3D structural features (20), which can be calculated with public and commercial software packages (21, 22). Algorithms used in diversity selection include pairwise similarity minimization (23), nearest-neighbor-based sphere exclusion (24), and clustering and partitioning (25, 26); detailed reviews of these algorithms can be found elsewhere (16, 27). For all of these algorithms, the choice of descriptors is subjective but can have a profound impact on the resulting library (28). Alternatively, rule-based compound classification approaches have been developed that group compounds by chemotype, such as the molecular framework (Murcko framework) (29). However, these methods usually classify molecules based on ring systems and are therefore unsuitable for acyclic molecules.

In diversity-based libraries, the activities of the selected molecules, each of which represents a group of structurally similar compounds, should be measured with little experimental error. If only one compound is selected from a series and it tests as a false negative in a screen, the entire series will be lost to further study, which is a concern given the intrinsic false-negative rates of HTS campaigns. Given these challenges, methods to quantify the chemical coverage of a screening library are critical, and several studies have addressed this question (30, 31). For example, Harper et al. presented a quantitative framework to optimize the composition of a screening collection (31). This diversity-based framework can be used to determine the minimal size of a chemical collection needed to identify at least one lead compound with a given probability. It can also be used to calculate how many compounds per chemical series should be sampled and screened to strike an optimal balance between “focus” and “diversity”. Their method predicts that, without any knowledge of the activity distributions of different chemical series against the target of interest, the optimal solution is to select equal numbers of compounds from all series. Similarly, Nilakantan et al. (30) analyzed 18 historical HTS assays and proposed that a screening library should be uniformly composed, with equal representation of different medicinally relevant ring scaffolds and about 100 analogs per scaffold. In addition, using activity probabilities and Belief Theory, Bakken et al. (2) introduced the concept of “Redundancy”, which helps determine how many structurally similar compounds are needed to represent a region of active compounds so that at least one of them is identified as active in a screening campaign with greater than 95% confidence.

It is important to understand that the best selection of compounds for a screening library cannot be achieved by maximizing structural diversity alone. Physicochemical properties are equally important, and the concept of “drug-likeness” is widely accepted in screening library design. While chemical space is enormous, the space of “drug-like” compounds is much smaller.
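The false-negative and redundancy arguments made earlier in this section can be illustrated with a simple probability calculation. This is a simplified sketch, not the Belief-Theory method of Bakken et al. (2) nor the framework of Harper et al. (31). It assumes that each analog sampled from a truly active series is independently detected as a hit with the same probability q, a single number that folds together the chance that the analog is itself active and the chance that the assay does not miss it. Under that assumption, the probability of detecting the series with n analogs is P = 1 − (1 − q)^n, and the smallest n reaching a target confidence follows directly. The function names and the values of q are hypothetical.

```python
def p_detect_series(q: float, n: int) -> float:
    """Probability that at least one of n analogs from an active series is detected.

    Simplified, assumed model: each analog is independently detected with probability q.
    """
    return 1.0 - (1.0 - q) ** n


def analogs_needed(q: float, confidence: float = 0.95) -> int:
    """Smallest number of analogs n with p_detect_series(q, n) >= confidence.

    Uses a short loop rather than ceil(log(1 - c) / log(1 - q)) to avoid
    floating-point rounding at exact boundaries.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    n = 1
    while p_detect_series(q, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for q in (0.9, 0.5, 0.3, 0.1):
        print(q, analogs_needed(q))
    # prints one pair per line: 0.9 2, 0.5 5, 0.3 9, 0.1 29
```

Under this toy model, a series represented by a single compound with q = 0.5 is missed half the time, while five analogs push the detection probability above 95%. This mirrors the qualitative point that several analogs per active region, rather than a single diverse representative, protect against losing a series to assay noise.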
In the past two decades, the Lipinski “rule of five” (Ro5), which was derived from a set of clinical candidates reaching phase II clinical trials or further, has been extensively used due to its conceptual simplicity and ease of calculation (18, 32). This rule states that, in general, an orally active drug has no more than one violation of the following criteria: 1) H-bond donors