Chapter 1
Frontiers in Molecular Design and Chemical Information Science: Introduction
Downloaded by 91.239.24.146 on October 16, 2016 | http://pubs.acs.org
Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch001
Rachelle J. Bienstock
RJB Computational Modeling LLC, 300 Pitch Pine Lane, Chapel Hill, North Carolina 27514
*E-mail: [email protected]
This introductory chapter is an overview of the material presented in this volume and at the ACS Division of Chemical Information (CINF) Herman Skolnik Award Symposium, held in Boston at the ACS National Meeting in Fall 2015. Dr. Jürgen Bajorath was the awardee for his contributions in the areas of molecular fingerprinting and similarity analysis, virtual screening methodologies, QSAR, and the visualization and graphical analysis of large chemical data sets, and for the application of these methods to drug discovery. The symposium included presentations covering these topical areas, and this volume is a compilation of the material presented and a summary of contributions to this field by Dr. Bajorath and his colleagues.
© 2016 American Chemical Society Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.
This book is a collection of papers based on a series of talks presented as a tribute to Dr. Jürgen Bajorath at the ACS Division of Chemical Information (CINF) Herman Skolnik Award Symposium, Fall 2015, in Boston. The Skolnik Award, presented by the division at every fall national ACS meeting, was established to recognize outstanding contributions to the fields of chemical information and cheminformatics, and is named in honor of Dr. Herman Skolnik, the first awardee (1976). A complete list of awardees can be found on the ACS CINF division website (http://www.acscinf.org/content/herman-skolnik-award). Dr. Bajorath is a world leader in the development and application of cheminformatics and computational solutions to research problems in medicinal chemistry, chemical biology, and the life sciences, and has done pioneering work
in the area of big data analysis in chemistry. He is widely recognized for his seminal and prolific research in molecular similarity analysis and ligand-based virtual screening, fingerprint engineering and the application of advanced machine learning techniques, the application of information-theoretic concepts to cheminformatics, large-scale graphical analysis and visualization of structure-activity relationships, and novel ways of evaluating SAR content as medicinal chemistry projects progress. The Herman Skolnik Award was given to Dr. Bajorath in recognition of these seminal contributions to the field.
Dr. Bajorath obtained M.S. and Ph.D. degrees from the Free University, West Berlin (Ph.D. adviser: Wolfram Saenger). He worked in the U.S. in the pharmaceutical industry (Bristol-Myers Squibb), in the biotech industry (AMRI), and in academia before returning to Germany as Full Professor and Chair of Life Science Informatics at the University of Bonn. He continues to hold appointments as Affiliate Professor at the University of Washington (U.S.) and Guest Professor at the University of Strasbourg (France). Dr. Bajorath currently serves as an Associate Editor of the Journal of Medicinal Chemistry and has served on the editorial boards of several major research journals.
The speakers, selected by Dr. Bajorath, whose talks are represented among the chapters in this volume are mostly coworkers and collaborators with whom he has worked throughout his career. Some were his mentors, some his peers and collaborators, and some his former students.
The computational and cheminformatics methods discussed, and their application to drug discovery, are essential for sustaining a viable drug development pipeline. It is increasingly challenging to identify new chemical entities, and the money and time invested in research to develop a new drug have greatly increased over the past 50 years. Joseph A.
DiMasi, Director of Economic Analysis at the Tufts Center for the Study of Drug Development (1), reported in 2014 that R&D expenditures for new drugs and biologics had increased to roughly $50 billion (2013), while the number of new compound approvals has not increased significantly, fluctuating between 15 and 30 per year over the past 50 years despite the dramatic increase in R&D expenditure. The average time to take a drug from clinical testing to approval is currently 7.2 years. There is therefore a significant need for predictive computational techniques that drive research more efficiently toward the compounds and molecules most likely to be developed into successful drugs for a target.
New methods such as high-throughput screening (HTS) and techniques for the computational analysis of hits have contributed to improvements in drug discovery efficiency. Millions of compounds can be routinely screened in bioassays to identify active lead compounds. New types of HTS libraries can be designed. When there is prior knowledge regarding active compounds, specific (focused) compound libraries are often used instead of chemically diverse libraries. Fragment libraries are often used to increase diversity and coverage of chemical space, and lead compounds are sometimes identified from natural-product-inspired compound libraries.
Computational methods such as SAR (structure-activity relationship) analysis, identification of scaffold classes, clustering using molecular descriptors and in silico ADME (absorption, distribution, metabolism, and excretion) properties, and fingerprint similarity are all standard aids in the modern drug discovery process. The novel ways of visualizing and displaying chemical information, and the use of graph theory to display scaffold-based, target-selective networks, presented in this volume support informed analysis of molecules and structure-activity relationship information. Novel SAR visualization and analysis methods include heterogeneous SAR analysis employing activity versus selectivity cliffs, SAR monitoring using activity cliffs, and SAR matrices based on compound neighborhoods.
Specialized databases also play an important role in supporting new cheminformatics methods for drug development. PharmMapper DB is a database of pharmacophores based on solved protein-ligand structures, and PharmMapper also provides a web service that aligns pharmacophores associated with specific targets. Other available drug-target interaction databases include STITCH, Drug2Gene, PROMISCUOUS, DrugBank, ChEMBL, BindingDB, and PubChem BioAssay.
Dimensionality reduction techniques, such as nonlinear SOMs (self-organizing maps) or linear PCA (principal component analysis), are often applied to complex high-dimensional data to simplify its interpretation. Molecular descriptors are calculated to characterize physicochemical properties such as lipophilicity, electronic properties, or ionization potential. 2D descriptors are based on molecular graphs and account for a variety of topological or chemical features, while 3D descriptors capture conformation-dependent characteristics such as molecular volume and shape. A variety of these methods and their applications were presented at this symposium.
In this symposium volume, Dr. W.
Patrick Walters (Vertex) discusses the HTS Visualizer tool in use for drug discovery at Vertex. High-throughput virtual screening has been an invaluable method for hit identification in modern pharmaceutical research, and it has been used to decide which series of lead compounds to pursue in drug development. "Big data" now presents many challenges to drug discovery in pharmaceutical companies, since libraries contain millions of compounds with associated bioassay data. Hit prioritization is also an issue. Molecular descriptors are often used to cluster compounds for analysis, and the choice of cluster threshold can be critical. The approach discussed is based upon conventional molecular frameworks (scaffolds): the HTS Visualizer method uses ring scaffolds and relies on the examination of scaffold frequency. Fingerprint similarity can also be used to compare and sort scaffolds. Scaffolds can then be associated with activity data to produce SARs. Histograms, density plots, box plots, and violin plots can be used to compare data distributions, along with REOS (rapid elimination of swill) alerts and PAINS (pan-assay interference compounds) filters. Promiscuity plots can be used to cluster on-target and off-target compounds, and ADME assays to classify compounds into good, fair, and poor categories. A successful drug design workflow partitions actives into scaffold classes and profile classes, correlates these classes with SAR and ADME properties, and then prioritizes scaffolds. Vertex uses substructure searches linked to SciFinder
via the SciFinder API, internal Reaxys databases, ChEMBL, and Thomson Reuters Integrity, with their own HTS viewer linked to display the information.
Dr. Veerabahu (Veer) Shanmugasundaram discusses SARMs (Structure-Activity Relationship Matrices) and their application in monitoring changes in SAR information content as a project team progresses. Spotfire visualization can be used to create a time-dependent picture of structure-activity relationships. SAR matrices were developed by Jürgen's group to extract SAR patterns from data sets so that they can be easily clustered and organized for analysis. Veer used the SAR matrix method inside Pfizer in conjunction with TIBCO Spotfire for easy visualization of SAR results. SAR patterns are automatically extracted from data sets using a matched molecular pair algorithm, and the information and properties in a SAR matrix can be color-coded. In SAR matrices that are color-coded according to molecular properties, privileged R groups and activity cliffs can be easily identified. A matrix prioritization scheme can be used to predict the potency of virtual compounds, prior to synthesis and testing, based on the core structure and the substituents of the surrounding neighbors (a neighborhood-based analysis). Veer discusses the successful application of this method to Pfizer's neurodegenerative and inflammation targets.
Drs. Ye Hu and Bajorath present a large-scale interaction analysis of ligands based on activity data and target annotations rather than structural information. Graphical methods help because chemical space is too large to explore experimentally. In modeling, chemical space representations that combine descriptor-based molecular similarity with biological activity can be used to create activity landscapes. In target-ligand spaces, targets are organized by structure or sequence and linked to their active ligands, yielding target-ligand networks in which targets are nodes connected whenever they share an active ligand.
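The network construction just described can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input format (a list of ligand/target pairs), and the toy identifiers are all hypothetical, chosen only to show how shared active ligands give rise to edges between target nodes.

```python
from collections import defaultdict
from itertools import combinations


def build_target_network(annotations):
    """Build a target network from (ligand, target) activity annotations.

    Returns a dict mapping a sorted target pair (t1, t2) to the set of
    ligands annotated as active against both targets. Each key is an edge;
    targets that share no ligand with any other target get no edge.
    """
    targets_by_ligand = defaultdict(set)
    for ligand, target in annotations:
        targets_by_ligand[ligand].add(target)

    edges = defaultdict(set)
    for ligand, targets in targets_by_ligand.items():
        # Every pair of targets hit by the same ligand becomes an edge.
        for pair in combinations(sorted(targets), 2):
            edges[pair].add(ligand)
    return dict(edges)


def promiscuous_ligands(annotations, min_targets=2):
    """Return ligands annotated against at least `min_targets` distinct targets."""
    targets_by_ligand = defaultdict(set)
    for ligand, target in annotations:
        targets_by_ligand[ligand].add(target)
    return sorted(l for l, t in targets_by_ligand.items() if len(t) >= min_targets)


# Toy annotations (invented identifiers, not real assay data).
toy = [
    ("lig1", "T1"), ("lig1", "T2"),
    ("lig2", "T2"), ("lig2", "T3"),
    ("lig3", "T1"), ("lig3", "T2"),
    ("lig4", "T4"),
]
for (t1, t2), shared in sorted(build_target_network(toy).items()):
    print(t1, "--", t2, "via", sorted(shared))
```

In this toy set, T4 remains an isolated node because its only ligand hits no other target, while ligands annotated against two or more targets are the ones that create edges.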
Target-ligand interactions can also incorporate hierarchies of compound skeletons (scaffolds). Certain core structures are selective for certain targets; these are considered privileged structures and generate a target-compound-based network. The method can be used to identify promiscuous ligands, scaffolds, and chemotypes, as well as structurally similar ligands with vastly different activity. It has been applied to scaffold hopping, that is, to identifying different scaffolds effective against the same target, offering a new approach to mining compound activity by mapping target-ligand interactions. Ye Hu has developed Analog Explorer, a graphical approach to exploring SAR in which maximum common substructures (MCS) are represented as graph nodes, which is useful for identifying activity cliffs. These reduced graphical representations identify structurally related scaffolds.
Dr. Jane Tseng discusses predictive models for ligand-receptor binding. QSAR models are commonly used, and in these the contributions of the receptor are neglected. CoMFA (comparative molecular field analysis) is used when ligand-receptor structures are not available. 4D-QSAR uses a conformational ensemble, unlike CoMFA, which relies on descriptors calculated as grid-point interactions between the target molecule and a probe atom. The fourth dimension in 4D-QSAR refers to the sampling of spatial features. The receptor-dependent mode applies when the structure of the receptor is known. In this mode, models are derived from the 3D structures of multiple conformations of the ligand-receptor complex. Explicit simulation of the induced-fit
process, 4D-RD-QSAR, gathers binding interaction energies as descriptors and is a novel method for processing ligands.
Dr. Gerald Maggiora, University of Arizona, presents "Non-specificity of Drug-Target Interactions: Consequences for Drug Discovery", focusing on polypharmacology. Polypharmacology databases include STITCH4, Drug2Gene, and PROMISCUOUS. Twenty-eight percent of FDA-approved drugs have polypharmacology as their mechanism of action. Polypharmacology is relevant to drug repurposing, to adverse drug reactions, and to addressing drug discovery from a biological systems approach. Analyses of the many drug and drug-target databases, and of drug-target networks, reveal that the quality of data in databases is always suspect, indicating that drug target discovery requires well-validated targets.
Dr. Peter Willett discusses molecular similarity approaches in cheminformatics from a historical perspective. In the 1990s, graph theory and similarity were used (Johnson and Maggiora's book, "Concepts and Applications of Molecular Similarity", based on an ACS 1998 symposium). Harrison at ICI, in 1968, developed clustering of chemical databases. Adamson and Bush, in 1973 and 1975, were the first to use 2D substructure-search features as a measure of similarity, for clustering similar compounds in databases, for fragment-based similarity, and for fingerprinting methods. The Tanimoto coefficient, a measure for assessing molecular similarity that is useful in searching and clustering molecular databases, was adopted, along with the Jarvis-Patrick nearest-neighbor method. The use of substructure-search fragments began in the mid-1980s at Lederle Labs, Upjohn, and Pfizer. In the 1990s, combinatorial chemistry and HTS emerged, along with interest in increased compound diversity. Cluster-based selection and dissimilarity-based selection using the MaxMin and Kennard-Stone algorithms were employed.
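To make the two similarity ideas above concrete, the following is a small sketch of the Tanimoto coefficient and of MaxMin dissimilarity-based selection. It rests on several assumptions that are not taken from the chapter: fingerprints are represented as Python sets of "on" bit indices, distance is taken as 1 minus Tanimoto similarity, the first compound is fixed by the caller rather than chosen at random, and two empty fingerprints are given similarity 0.0 by convention. For fingerprints with a and b bits set and c bits in common, the coefficient is c / (a + b - c). The Kennard-Stone variant, which seeds the selection differently, is not shown.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two fingerprints given as sets of on-bit indices."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return common / union if union else 0.0


def maxmin_pick(fingerprints, n_pick, first=0):
    """Pick up to `n_pick` mutually dissimilar compounds with the MaxMin heuristic.

    Starting from index `first`, repeatedly add the compound whose minimum
    distance (1 - Tanimoto) to the already selected compounds is largest.
    Returns the selected indices in the order they were picked.
    """
    if not fingerprints or n_pick <= 0:
        return []

    picked = [first]
    # min_dist[i] = distance from compound i to its nearest picked compound.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]

    while len(picked) < min(n_pick, len(fingerprints)):
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        # Update nearest-picked distances with the newly selected compound.
        for i, fp in enumerate(fingerprints):
            dist = 1.0 - tanimoto(fp, fingerprints[nxt])
            if dist < min_dist[i]:
                min_dist[i] = dist
    return picked


# Toy fingerprints (invented bit sets, not derived from real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {1, 2, 10, 11}]
print(tanimoto(fps[0], fps[1]))   # 3 shared bits out of 5 distinct bits
print(maxmin_pick(fps, 3))        # indices of a diverse subset
```

The near-duplicate of the starting compound (index 1) is the last to be chosen, which is the behavior that makes MaxMin attractive for assembling diverse screening subsets.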
Peter Willett pointed out that a Web of Science search of the molecular similarity literature identified 86,663 citations (Wendy Warr Report), with Jürgen Bajorath and Peter Willett identified as the most prolific authors.
Dr. Alexandre Varnek presents tools for analyzing and visualizing chemical space based upon similarity and self-organizing Kohonen maps (SOMs). Generative Topographic Mapping (GTM) is an improved extension of SOMs. GTM can be used to determine the probability of finding each molecule at each point of a grid; these probabilities can be used to develop activity landscapes or serve as molecular descriptors for making predictions. Dr. Varnek and colleagues developed the ISIDA (In Silico Design and Data Analysis) descriptors. QSAR models can be created with GTM, and molecular activity can be mapped. These chemical space maps can be used for virtual screening. Stargate GTM (S-GTM) is a method in which GTM connects activity space and descriptor space, and it can be used to predict pharmacological profiles.
Dr. Kimito Funatsu discusses a unifying knowledge-based platform for pipeline drug discovery, which uses information to correlate a virtual library and interaction data between drug targets and candidates with product quality data and production data. The objective is automation of the virtual library, compound synthesis, and process monitoring and control, using an automated soft sensor for process monitoring. Mathematical models (e.g., partial least squares or support vector machines) can be used for chemical-target-phenotype drug modeling
to predict compound-protein interactions. Using information about chemical structures and protein sequences, deep-learning techniques can be trained on these interactions to predict protein-phenotype associations.
Dr. Gisbert Schneider discusses de novo drug design and target prediction using computer-based, fragment-based design. Dr. Schneider and his group developed software for compound design called DOGS (Design of Genuine Structures). This is a ligand-based, automated, in silico method for designing novel bioactive compounds that takes the synthesizability of the compounds into account. The compounds are assessed with a graph kernel that measures similarity to known bioactives. The program SPIDER is used to predict drug targets for the newly designed compounds. SPIDER is based on SOMs (self-organizing maps), consensus scoring, and statistical analysis.
Dr. Eugene Lounkine presents applications of three different types of fingerprints. These include fingerprints that project bioactivity onto chemical fingerprints through the use of molecular similarity and activity-aware Bayesian models, and fingerprints built from chemical-biological descriptors, called high-throughput screening fingerprints. Bioturbo similarity searching uses chemical similarity to map the biological activity of molecules, which is useful for target prediction. "How many fingers does a compound have?" Dr. Lounkine asks. Projecting bioactivity onto chemical fingerprints and using biological molecular fingerprints provide heterogeneous similarity methods. In translating Bayesian weights to molecular fingerprints, the fingerprints are weighted using a naive Bayesian model. Dr. Lounkine discusses how high-throughput screening fingerprint expansion can be used to find novel compounds.
Dr. Anne Mai Wassermann asks, "Could inactive compounds be good starting points for drug discovery?" In sampling bioactive chemical space, she looks at inactive compounds and finds good candidates among "dark compounds" (i.e.,
inactive compounds). Analysis of the NIH molecular libraries revealed differences between actives and dark compounds, the latter being less hydrophobic, smaller, and more soluble, with fewer rings. Could dark matter compounds prove to be valuable leads? She demonstrated that when a dark compound is active, it is often more selective.
The symposium closed with Dr. Jürgen Bajorath's presentation on the ligand-centric view of promiscuity. Tabulating correlated chemical data from huge data sets is a significant problem. Compound promiscuity, defined as the ability of small molecules to interact specifically with multiple targets, is the molecular basis of polypharmacology. It can lead to new drug development strategies involving multiple targets. Compound promiscuity is increasing slightly over time. Many drug compounds are said to be "promiscuous" in that they have more than one target. The ability of drugs to interact with multiple targets can be exploited: drugs originally directed at one target can be successfully repurposed as a starting template for a secondary target. Dr. Bajorath and colleagues have applied computational methods, such as matched molecular pairs (MMPs), and graphical methods to the analysis of multi-target activity and polypharmacology. MMPs are pairs of compounds that differ at only one site, so they represent small structural changes, which might nevertheless lead to large changes in apparent promiscuity (i.e., "promiscuity cliffs"). MMPs can be grouped together and developed into a compound series matrix. This
serves to organize compounds with similar structures and correlate them with their multi-target activity. The degree of promiscuity is higher among final drug products than among initial screening hits and bioactive compounds. SARMs developed by Jürgen and colleagues have enabled SAR data to be displayed in a more transparent scaffold/functional-group SAR table. Many tools and databases are available for applied drug discovery techniques based on polypharmacology.
The cheminformatics approaches and methodologies presented in this volume and at the Skolnik Award Symposium will pave the way for improved efficiency in drug discovery. The lectures and chapters also reflect the various aspects of scientific inquiry and the research interests of the 2015 Herman Skolnik Award recipient.
References
1. DiMasi, J. A.; Grabowski, H. G.; Hansen, R. W. Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs; R&D Cost Study Briefing; Tufts Center for the Study of Drug Development: Boston, MA, November 18, 2014.