Application Note pubs.acs.org/jcim
Connection Map for Compounds (CMC): A Server for Combinatorial Drug Toxicity and Efficacy Analysis

Lei Liu,† Maria Tsompana,‡ Yong Wang,§ Dingfeng Wu,† Lixin Zhu,*,∥,⊥,# and Ruixin Zhu*,†

† Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, People’s Republic of China
‡ Center of Excellence in Bioinformatics and Life Sciences, the State University of New York at Buffalo, Buffalo, New York 14203, United States
§ Basic Medical College, Beijing University of Chinese Medicine, Beijing 100029, People’s Republic of China
∥ Digestive Diseases and Nutrition Center, Department of Pediatrics, The State University of New York at Buffalo, Buffalo, New York 14260, United States
⊥ Genome, Environment, and Microbiome Community of Excellence, The State University of New York at Buffalo, Buffalo, New York 14214, United States
# Institute of Digestive Diseases, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People’s Republic of China

Supporting Information
ABSTRACT: Drug discovery and development is a costly and time-consuming process with a high risk of failure, resulting primarily from problems with a drug’s clinical safety and efficacy. Identifying and eliminating unsuitable candidate drugs as early as possible is an effective way to reduce unnecessary costs, but few analytical tools are currently available for this purpose. Recent growth in toxicogenomics and pharmacogenomics has provided a vast amount of drug expression microarray data. Web servers such as CMap and LTMap have used this information to evaluate drug toxicity and mechanisms of action independently; however, their wider applicability has been limited by the lack of a combined toxicity and efficacy analysis. Using available genome-wide drug transcriptional expression profiles, we developed the first web server for combinatorial evaluation of the toxicity and efficacy of candidate drugs, named “Connection Map for Compounds” (CMC). Using CMC, researchers can first compare their query drug gene signatures with prebuilt gene profiles generated from two large-scale toxicogenomics databases, and then perform a drug efficacy analysis to identify known mechanisms of drug action or generate new predictions. CMC provides a novel approach for drug repositioning and early evaluation in drug discovery through its unique combination of toxicity and efficacy analyses, expansibility of data and algorithms, and customization of reference gene profiles. CMC can be freely accessed at http://cadd.tongji.edu.cn/webserver/CMCbp.jsp.
1. INTRODUCTION
Drug discovery and development is a costly and time-consuming process. Bringing a single drug to the market takes an average of 10 to 17 years, with an approximate cost of $868 million to $1.24 billion including the cost of failed attempts.1−3 The success rate of promising drug candidates from phase I clinical trials to registration is less than 10%, and failures have many causes, primarily associated with a drug’s clinical safety and efficacy.4 Thus, eliminating drugs destined to fail as early as possible in the drug development process (i.e., “earlier detection, earlier elimination”) can effectively save the effort and cost of fruitless pursuits. The advancement of toxicogenomics and pharmacogenomics benefits drug discovery as well as drug repositioning. Also, high-throughput technologies, such as microarrays, have proved for more than a decade to be reliable methods for identifying differentially expressed genes as molecular biomarkers. On the basis of the accumulated microarray data, several web servers such as the Connection Map (CMap)5 and LTMap6 were developed to identify potential mechanisms of drug action or toxicity independently of each other. CMap was first developed as a database of genome-wide transcriptional expression profiles of bioactive small molecules from cultured human cell lines and utilizes a pattern-matching algorithm to detect similarities among genomic signatures. It has been

Received: July 7, 2016
Published: August 10, 2016
© 2016 American Chemical Society
DOI: 10.1021/acs.jcim.6b00397 J. Chem. Inf. Model. 2016, 56, 1615−1621
Journal of Chemical Information and Modeling
the R package affy.20 Probes in each reference instance were ranked in ascending order of the logarithm of the fold-change, i.e., the ratio of the average value of the treatment group to that of the corresponding controls. P-values for the RMA-preprocessed data were calculated using the R package limma.21 P-values for the MAS5-preprocessed data were generated with the Wilcoxon rank sum test. To speed up the calculation, CMC offers two additional types of prebuilt reference database besides the full probe lists: (1) the top N and bottom N probes, sorted by fold-change (where N = 500, 1000, 1500, or 2000), and (2) the differentially expressed probes (where |log(fold-change)| > 1 and P-value < 0.05).

2.2.2. Pattern-Matching Algorithms in Toxicity and Efficacy Analysis. Two pattern-matching algorithms are available in CMC: (1) a nonparametric, rank-based Kolmogorov−Smirnov statistic5 and (2) a simple, robust method proposed by Zhang and Gant.18 In contrast to CMap, which treats upregulated and downregulated genes separately, the method of Zhang and Gant sorts the genes in a reference instance by the absolute value of their fold-change, thus placing the most differentially expressed genes at the top of the list.

2.2.3. Toxicity Score. For each reference instance with histopathological information, the severity grades of all pathological changes were sorted in ascending order. With N(x) being the number of pathological findings whose histopathological grade is x, we define the following T-score.
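The prebuilt reference subsets and the absolute-fold-change ordering described in Sections 2.2.1 and 2.2.2 can be illustrated with a minimal Python sketch. It is not the CMC implementation: all function names, the toy probe IDs, and the toy values are hypothetical, and the connection score is a simplified signed-rank score in the spirit of the Zhang and Gant approach, assuming an unordered query signature. The exact scoring used by CMC is not specified in this section.

```python
import math
from typing import Dict, List


def rank_probes(log_fc: Dict[str, float]) -> List[str]:
    """Return probe IDs sorted by log fold-change in ascending order."""
    return sorted(log_fc, key=lambda p: log_fc[p])


def top_bottom_subset(ranked: List[str], n: int) -> List[str]:
    """Keep the n lowest- and n highest-ranked probes (bottom N and top N)."""
    if 2 * n >= len(ranked):
        return list(ranked)  # list too short to trim; keep everything
    return ranked[:n] + ranked[-n:]


def differentially_expressed(log_fc: Dict[str, float],
                             p_values: Dict[str, float],
                             fc_cut: float = 1.0,
                             p_cut: float = 0.05) -> List[str]:
    """Keep probes with |log(fold-change)| > fc_cut and P-value < p_cut."""
    return [p for p in log_fc
            if abs(log_fc[p]) > fc_cut and p_values[p] < p_cut]


def signed_ranks_by_abs_fc(log_fc: Dict[str, float]) -> Dict[str, float]:
    """Rank probes by |log fold-change|; the most changed probe gets the
    largest rank, and each rank carries the sign of its fold-change."""
    ordered = sorted(log_fc, key=lambda p: abs(log_fc[p]))
    return {p: math.copysign(i + 1, log_fc[p]) for i, p in enumerate(ordered)}


def connection_score(signed_ranks: Dict[str, float],
                     query_up: List[str],
                     query_down: List[str]) -> float:
    """Simplified, assumed score: sum of reference signed ranks times the
    query sign (+1 up, -1 down), scaled by the maximum achievable sum."""
    n = len(signed_ranks)
    hits = [(p, 1) for p in query_up if p in signed_ranks]
    hits += [(p, -1) for p in query_down if p in signed_ranks]
    raw = sum(signed_ranks[p] * s for p, s in hits)
    max_raw = sum(n - i for i in range(len(hits)))
    return raw / max_raw if max_raw else 0.0


# Toy reference instance (hypothetical values, for illustration only).
log_fc = {"p1": -2.5, "p2": -0.3, "p3": 0.1, "p4": 1.8, "p5": 3.0}
p_values = {"p1": 0.001, "p2": 0.40, "p3": 0.90, "p4": 0.03, "p5": 0.20}

ranked = rank_probes(log_fc)
subset = top_bottom_subset(ranked, 1)
de_probes = differentially_expressed(log_fc, p_values)
ranks = signed_ranks_by_abs_fc(log_fc)
same_direction = connection_score(ranks, query_up=["p5"], query_down=["p1"])
opposite_direction = connection_score(ranks, query_up=["p1"], query_down=["p5"])
```

In this toy case a query that agrees with the reference on its two most strongly changed probes scores at the positive extreme, and a query with the opposite direction scores at the negative extreme. Restricting the reference to the top and bottom N probes, or to the differentially expressed probes, shortens the list that each query must be matched against, which is the speed-up the prebuilt subsets aim for.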
widely used to predict novel drug indications, for example by discovering repurposed drug activities against common diseases, including diabetes,7,8 Alzheimer’s disease,9 solid tumors such as breast cancer,10,11 and inflammatory bowel diseases.12 Built on CMap, LTMap focuses on drug safety assessment, especially liver toxicity. LTMap compares signatures of a biological state induced by a query compound with reference signatures derived from Open TG-GATEs,13,14 a toxicogenomics database developed by the Japanese Toxicogenomics Project, to identify similar toxicological changes between drugs. Though such web servers have achieved a good degree of success, they can be improved in various respects. Recent toxicogenomics consortia and resources, such as the Japanese Toxicogenomics Project,14 the InnoMed PredTox project of the EU,15 the Tox21 project,16 and the DrugMatrix database,17 have provided a massive amount of data that could be used as a reference database for drug safety assessment. In particular, DrugMatrix, a large-scale, high-quality, and well-designed toxicogenomics database hosted by the National Toxicology Program in the United States, contains profiles derived from over 600 compounds and up to 7 different tissues, and offers histopathological information essential for determining the degree of drug toxicity. Also, since the development of CMap, several studies have been conducted using pattern-matching algorithms for the independent assessment of drug efficacy, providing a wider variety of choices.18,19 However, performing toxicity or efficacy analysis alone can no longer meet most users’ demands. As an example, the increasing availability of genomic data in traditional Chinese medicine (TCM) has made it possible to analyze both the potential efficacy and the toxicity of a TCM.
Currently available analytical tools cannot meet this demand: LTMap cannot perform efficacy predictions because of its limited data, and CMap is unable to provide concise toxicity predictions. To address this limitation, we built a web server that can perform integrated drug toxicity assessment and efficacy analysis based on the principle of “earlier detection, earlier elimination”.
T ‐score =
∑ 0≤x