Platform for Unified Molecular Analysis: PUMA - Journal of Chemical

Application Note pubs.acs.org/jcim

Platform for Unified Molecular Analysis: PUMA

Mariana González-Medina* and José L. Medina-Franco*

School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico

ABSTRACT: We introduce a free platform for chemoinformatic-based diversity analysis and visualization of the chemical space of user-supplied data sets. Platform for Unified Molecular Analysis (PUMA) integrates metrics used to characterize compound databases, including visualization of chemical space, scaffold content, and analysis of chemical diversity. The user’s input is a file with SMILES, database names, and compound IDs. PUMA computes molecular properties of pharmaceutical relevance and Murcko scaffolds, and performs diversity analysis. The user can interactively navigate through the graphs and export image files and the raw data of the diversity calculations. The platform links two public online resources: Consensus Diversity Plots for the assessment of global diversity and Activity Landscape Plotter to analyze structure−activity relationships. Herein, we describe the functionalities of PUMA and exemplify its use through the analysis of compound databases of general interest. PUMA is freely accessible at the authors’ website https://www.difacquim.com/d-tools/.



© 2017 American Chemical Society

INTRODUCTION

Analyses of chemical space, scaffold content, and structural diversity are performed frequently to guide compound acquisition and to select databases for screening and lead optimization.1 Since chemical space and molecular diversity depend on molecular representation,2 different criteria should be considered, including molecular properties, molecular fingerprints, and scaffolds. Several software and standalone packages are available to perform cheminformatic analysis of compound databases.3 However, standalone applications may be difficult to install or use, and robust cheminformatics software can be expensive. Online resources have advantages over software and standalone applications. For instance, web servers are often freely available, relatively easy to use, and do not require installation. A limitation of open online resources is that they are not generally suited to analyze sensitive data or information with potential intellectual property issues. However, some free online tools, such as the one described in this work, do not store any information provided by the user: the files uploaded and generated are deleted from the server after the user closes the tool. Most web services developed so far have been implemented to perform molecular format interconversions, activity and ADMET predictions,4 and ligand binding analyses. These tools are reviewed elsewhere.5 For instance, ChemMine6 and ChemBioServer7 are used for structure comparisons, similarity searching, compound clustering, and prediction of chemical properties. However, these resources are mostly focused on ligand-based virtual screening rather than diversity analysis. To our knowledge, free online tools that bundle visualization of chemical space and diversity analysis using molecular properties, fingerprints, and scaffolds are scarce. The objective of this work is to describe the implementation and features of Platform for Unified Molecular Analysis, PUMA.

This is a free web server for comprehensive diversity analysis of compound databases. No prior programming experience is required to use it. The major features of PUMA are illustrated with a diverse public screening database with compounds for neglected diseases8−10 and three additional databases of current interest in several projects: approved drugs to treat cancer, compounds focused on epigenetic targets, and compounds Generally Recognized as Safe (GRAS), used in the food industry.11



Received: May 5, 2017. Published: July 24, 2017.

IMPLEMENTATION AND FEATURES

PUMA characterizes compound data sets using molecular properties, scaffolds, and structural fingerprints. The platform performs six major analyses: Chemical Space, Properties Statistics, Properties Similarity or Distance, Cyclic System Recovery (CSR) Curves, Scaled Shannon Entropy (SSE) with scaffolds, and Cumulative Distribution Function (CDF) of pairwise similarity values computed with fingerprints. Table 1 summarizes the major functions in the current version of PUMA. The user interface was implemented with the R package Shiny.12 All fingerprints, molecular properties, and scaffolds are computed using the R package rcdk.13 The data are plotted using the R package ggplot2, and all plots are made interactive with the R package plotly.14 The user can zoom in and out, select and unselect databases, and use tooltips to identify data points on a plot. The input uploaded to PUMA must be a comma-delimited (csv) file with three columns: SMILES, database names, and compound IDs. The major functions of PUMA are explained below and described further in the User’s Guide in the Supporting Information.
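To make the three-column input format concrete, the following Python sketch reads such a csv file and checks that every row carries a SMILES string, a database name, and a compound ID. It is only an illustration: PUMA itself is written in R, and the header names (`SMILES`, `DATABASE`, `ID`) and the function `read_puma_input` are assumptions made for this example; the header actually expected by the server is given in the User’s Guide.

```python
import csv
import io

# Hypothetical header names; the real ones are defined in PUMA's User's Guide.
REQUIRED_COLUMNS = ("SMILES", "DATABASE", "ID")


def read_puma_input(handle):
    """Return a list of (smiles, database, compound_id) tuples from a csv handle."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows yield None for absent fields, hence the "or ''" fallback.
        values = tuple((row[c] or "").strip() for c in REQUIRED_COLUMNS)
        if not all(values):
            raise ValueError(f"empty field on line {line_no}")
        rows.append(values)
    return rows


# Made-up example input with two compounds in one database.
example = io.StringIO(
    "SMILES,DATABASE,ID\n"
    "CCO,GRAS,cmpd_1\n"
    "c1ccccc1O,GRAS,cmpd_2\n"
)
records = read_puma_input(example)
```

Checking the file locally before uploading it can save a round trip to the server when a column is missing or a field is empty.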

DOI: 10.1021/acs.jcim.7b00253 J. Chem. Inf. Model. 2017, 57, 1735−1740


Journal of Chemical Information and Modeling

Table 1. Main Functions Available in the Current Version of PUMA

| function | features | description of output(a) | ref |
|---|---|---|---|
| chemical space | principal component analysis, 2 and 3 dimension visualizations | three files: (1) six principal components (PCs) and molecular properties; (2) contribution of each property to each PC; (3) covariance recovered by the six PCs. 2D plot as .tiff image | 18 |
| properties of pharmaceutical relevance | statistics | three files: (1) each property statistics; (2) compounds that fulfill Lipinski and Veber’s rules; (3) compounds that violate no more than one of Lipinski’s rules. Density plot, boxplot, or histogram for each molecular property | 19 |
| | similarity and distance | three files: (1) pairwise Euclidean distance or Tanimoto similarity; (2) inter- and intradatabase comparisons; (3) diverse subset with compounds that had the largest distance to the other compounds. Image with similarity/distance matrix | 2 |
| scaffold diversity | cyclic system recovery (CSR) curves | three files: (1) Murcko scaffolds and the IDs assigned to each scaffold; (2) metrics obtained from the CSR curves; (3) CSR curve image | 20 |
| | scaled Shannon entropy (SSE) | three files: (1) Murcko scaffolds and the IDs assigned to each scaffold; (2) SSEn, where n takes a value from 10 to 60; (3) diverse subset with unique scaffolds. Frequency plot with the n most populated scaffolds | 21 |
| fingerprint similarity | cumulative distribution function (CDF) | file with intradatabase similarity statistics; image with CDF curves | 22 |
| D-Tools (integration with online tools) | consensus diversity plots | 2D plots to simultaneously visualize up to three diversity metrics | 11 |
| | activity landscape plotter | structure−activity relationship analysis, SAS and DAD maps | 23 |

(a) All plots can be exported as 800 × 800, 900 × 800, or 900 × 900 pixel tiff images. All data files can be downloaded as csv.

Molecular Properties. Molecular properties are often used to evaluate drug-likeness and filter compounds.15,16 PUMA computes six molecular properties of pharmaceutical relevance:17 molecular weight (MW), hydrogen bond donors (nHBDon), hydrogen bond acceptors (nHBAcc), topological polar surface area (TopoPSA), number of rotatable bonds (nRotB), and the octanol−water partition coefficient (ALogP).

Chemical Space. Principal component analysis (PCA) facilitates data visualization.18 PUMA uses the R function “prcomp” to scale the six molecular properties to mean zero and variance one and to compute six principal components (PCs). The user can choose which PCs to plot in two or three dimensions (2D/3D) and download the plot as a 900 × 800 tiff image using the “Download image” button. The PCs and molecular properties computed for each compound can be downloaded as a data file using the “Download PCs and properties” button. The “Download PCA loadings” and “Download PCA summary” buttons download the data with the contribution of each property to each PC and the covariance of the PCs, respectively.

Figure 1. 3D representation of the chemical space visualized on PUMA using the “Chemical Space” tab. On the website, the user can rotate the 3D image and select points (compounds).

Properties: Statistics. In this tab the user can visualize the distribution of each property with a density plot, boxplot, or histogram. All plots can be downloaded as 900 × 900 tiff images, and the statistics are downloaded using the “Download properties stats” button. In addition, the user can filter the compounds in a database and download only those that fulfill Lipinski’s and Veber’s rules, as well as those that violate no more than one of Lipinski’s rules. These two filters are applied using the buttons “Download drug-like compounds” and “Download subset”.

Properties: Similarity and Distance. PUMA computes the pairwise inter- and intradatabase chemical diversity of the six molecular properties with Euclidean distance and Tanimoto similarity. A distance/similarity matrix (depending on the user’s selection) is generated with the mean inter- and intradatabase values. The matrix is color coded from gray (low values) to red (high values). Matrices can be downloaded as 800 × 800 tiff images. The user can download the raw data using “Download the pairwise results” and the inter- and intradatabase results with “Download data sets results”. Using the button “Download diverse subset”, the user can download a diverse subset, i.e., those compounds that had the largest distances to the other compounds in the databases. Further details are in the User’s Guide.

Scaffolds: Cyclic System Recovery (CSR) Curves. Scaffold diversity analysis has multiple applications in drug discovery, including library design, compound acquisition, virtual screening, and assessment of structure−activity relationships (SAR). PUMA uses the rcdk function “get.murcko.fragments” to compute the Murcko ring or largest framework of all the cyclic compounds. To analyze the scaffold diversity, compounds that share a scaffold are given the same scaffold_id, and all acyclic systems are grouped under a single scaffold_id. To generate the CSR curves, the fraction of chemotypes is plotted on the x-axis and the fraction of compounds that contain those chemotypes is plotted on the y-axis. Acyclic compounds are included to compute the CSR curves. With the “Download scaffolds” button the user generates a data file with the SMILES, database name, and ID as well as the scaffolds and the scaffold_id. The area under the curve (AUC) and the fraction of chemotypes required to retrieve 50% of the molecules (F50) can be obtained from the CSR curves to quantify the scaffold diversity of databases.19,20,24 Using “Download the CSR curves data”, the user accesses a file with the number of compounds in each database (M), the number of different chemotypes (N), the fraction of chemotypes over the number of compounds (FNM), the number of chemotypes containing only one compound, i.e., singletons (NSING), and the fraction of singletons over the number of compounds and over the number of different chemotypes (FNSING.M and FNSING.N, respectively). The plot can be downloaded as a 900 × 800 tiff.

Scaffolds: Scaled Shannon Entropy (SSE). SSE is used to measure the scaffold diversity of the compounds in the n most populated scaffolds.21 SSE values range from 0, when all the molecules in the database contain only one chemotype, to 1, indicating maximum diversity within the n chemotypes (see also the User’s Guide). The “Download the SSE data” button generates a data file with the SSE of the n most populated scaffolds (n goes from 10 to 60). The file downloaded with “Download the scaffolds and ID” includes the original SMILES, database name, compound ID, scaffold, and scaffold_id. The plots depict the frequency of the n most populated scaffolds in a user-selected database. The number assigned to each bar on the histogram is the scaffold_id. The user can identify which number is assigned to each chemotype by downloading the file using “Download the scaffolds and ID”. A file containing all the different scaffolds in the databases can be downloaded using “Download unique scaffolds”. The image can be downloaded as a 900 × 800 tiff.

Fingerprint Diversity. The concept of structural similarity is used to predict molecular properties and biological activity.22 On this tab the user can choose between three molecular fingerprints from the rcdk package to calculate all pairwise similarities: ECFP with a diameter of four or six (chosen by the user), PubChem (881 bits), and MACCS keys (166 bits). The current version of PUMA uses the Tanimoto index to quantify the similarity values. A file with the intradatabase similarity statistics can be downloaded using the “Download the similarity statistics” button. The user can download an image of the CDF as a 900 × 800 tiff.

D-Tools. PUMA is integrated with two free online services that together form part of D-Tools (available at www.difacquim.com). The user can access the Consensus Diversity Plots (CDPs)11 to compare the global diversity of databases and Activity Landscape Plotter23 to analyze SAR.
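As an illustration of the drug-likeness filters described above, the Python sketch below counts Lipinski violations and applies Veber’s criteria to the six properties that PUMA computes. The thresholds are the commonly cited ones (MW ≤ 500, logP ≤ 5, at most 5 donors and 10 acceptors; at most 10 rotatable bonds and a polar surface area of at most 140 Å²); that PUMA applies exactly these cutoffs is an assumption, and the class, function names, and property values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mw: float    # molecular weight (MW)
    hbd: int     # hydrogen bond donors (nHBDon)
    hba: int     # hydrogen bond acceptors (nHBAcc)
    tpsa: float  # topological polar surface area (TopoPSA), in square angstroms
    rotb: int    # number of rotatable bonds (nRotB)
    logp: float  # octanol-water partition coefficient (ALogP)


def lipinski_violations(p):
    """Number of Lipinski rule-of-five criteria that the compound breaks."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])


def passes_veber(p):
    """Veber's criteria: few rotatable bonds and a small polar surface area."""
    return p.rotb <= 10 and p.tpsa <= 140


def drug_like(p):
    """True when the compound fulfils all Lipinski and Veber criteria."""
    return lipinski_violations(p) == 0 and passes_veber(p)


def at_most_one_lipinski_violation(p):
    return lipinski_violations(p) <= 1


# Made-up property values, for illustration only.
small = Properties(mw=180.2, hbd=1, hba=4, tpsa=63.6, rotb=3, logp=1.2)
large = Properties(mw=720.9, hbd=3, hba=12, tpsa=150.0, rotb=12, logp=4.0)
```

With these toy values, `small` would be kept by both filters, whereas `large` breaks two Lipinski criteria (MW and acceptors) and both Veber criteria.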

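The property-based distance matrix can be sketched in the same spirit. The example below standardizes each property, computes Euclidean distances between all pairs of compounds, and averages them within and between databases. Whether PUMA scales the properties before computing distances is not stated in the text, so the standardization step is an assumption; the function names and the two-property toy data are hypothetical (PUMA uses six properties).

```python
import math
import statistics
from itertools import combinations, product


def standardize(rows):
    """Scale each property column to mean zero and unit (sample) variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def mean_distance_matrix(properties, labels):
    """Mean Euclidean distance for every ordered pair of databases.

    Diagonal entries are intradatabase means, off-diagonal entries are
    interdatabase means. Each database needs at least two compounds.
    """
    scaled = standardize(properties)
    by_db = {}
    for vector, db in zip(scaled, labels):
        by_db.setdefault(db, []).append(vector)
    matrix = {}
    for a in sorted(by_db):
        for b in sorted(by_db):
            if a == b:
                pairs = list(combinations(by_db[a], 2))
            else:
                pairs = list(product(by_db[a], by_db[b]))
            matrix[(a, b)] = statistics.mean(math.dist(x, y) for x, y in pairs)
    return matrix


# Toy data: two properties per compound, two compounds per database.
properties = [[100.0, 1.0], [110.0, 2.0], [400.0, 8.0], [520.0, 5.0]]
labels = ["A", "A", "B", "B"]
matrix = mean_distance_matrix(properties, labels)
```

In this toy case database A is compact and database B is spread out, so the intradatabase mean of A is smaller than that of B, and both are smaller than the interdatabase mean.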

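The scaffold metrics can also be written down compactly. The sketch below builds a CSR curve from a list of scaffold identifiers (scaffolds sorted from most to least populated), integrates it with the trapezoid rule to obtain the AUC, reads off F50 as the first point on the curve that recovers at least half of the compounds, and computes SSE as the Shannon entropy of the n most populated scaffolds divided by its maximum. These are the usual textbook definitions; details such as interpolation of F50 may differ in PUMA, and all function names are hypothetical.

```python
import math
from collections import Counter


def csr_curve(scaffold_ids):
    """Points (fraction of scaffolds, fraction of compounds recovered)."""
    counts = sorted(Counter(scaffold_ids).values(), reverse=True)
    n_scaffolds, n_compounds = len(counts), len(scaffold_ids)
    points, recovered = [(0.0, 0.0)], 0
    for rank, count in enumerate(counts, start=1):
        recovered += count
        points.append((rank / n_scaffolds, recovered / n_compounds))
    return points


def auc(points):
    """Area under the CSR curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def f50(points):
    """Smallest plotted fraction of scaffolds covering >= 50% of compounds."""
    return next(x for x, y in points if y >= 0.5)


def scaled_shannon_entropy(scaffold_ids, n):
    """Shannon entropy of the n most populated scaffolds, scaled to [0, 1]."""
    top = [count for _, count in Counter(scaffold_ids).most_common(n)]
    if len(top) < 2:
        return 0.0  # a single chemotype carries no diversity
    total = sum(top)
    entropy = -sum((c / total) * math.log2(c / total) for c in top)
    return entropy / math.log2(len(top))


# One scaffold per compound: maximum scaffold diversity.
diverse = ["s1", "s2", "s3", "s4"]
# Eight of ten compounds share one scaffold: low scaffold diversity.
skewed = ["a"] * 8 + ["b", "c"]
```

For `diverse` the curve is the diagonal, so the AUC is 0.5 and the SSE is 1; for `skewed` the AUC rises toward 1 and the SSE drops, which is the direction of the trends used later to compare databases.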

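Finally, the fingerprint-based analysis reduces to pairwise Tanimoto similarities and their empirical distribution. In the sketch below a fingerprint is represented as the set of its on-bit positions, which is an assumption made for brevity (PUMA obtains ECFP, PubChem, and MACCS fingerprints from rcdk); the function names and the three toy fingerprints are hypothetical.

```python
from itertools import combinations
from statistics import median


def tanimoto(a, b):
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarities(fingerprints):
    """Similarity of every unordered pair of compounds in one database."""
    return [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]


def empirical_cdf(values):
    """Sorted (value, cumulative fraction) pairs, ready to plot as a step curve."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


# Toy fingerprints, for illustration only.
fingerprints = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5, 6}),
    frozenset({1, 2, 3, 5}),
]
similarities = pairwise_similarities(fingerprints)
median_similarity = median(similarities)
cdf = empirical_cdf(similarities)
```

A database whose CDF rises early (low median similarity) is more diverse under the chosen fingerprint than one whose CDF rises late.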
RESULTS AND DISCUSSION

We illustrate the use of PUMA by showing the analysis of five 96-well plates with diverse compounds from Pathogen Box, downloaded from www.pathogenbox.org. Pathogen Box is a public collection of 400 compounds that are part of a current worldwide effort to find active compounds for neglected diseases. We also show the application of PUMA to analyze three previously reported data sets: GRAS, compounds focused on epigenetic targets (Epigenetic_focused), and approved drugs to treat cancer (FDA-oncology).11 These databases were curated using Molecular Operating Environment. Briefly, salts were removed, the charges in the molecules were neutralized, the largest fragments were kept, and compounds containing metals and metalloids or duplicates in each data set were removed (further details of the data sets are in Table S1 and Figure S1).

Chemical Space. The output file downloaded with “Download PCA summary” (Table S2) shows that the first three PCs of the three databases retrieved 89.7% of the variance. Table S3, downloaded using “Download PCA loadings”, indicates that nHBAcc, AlogP, and nRotB have the highest contribution to the first, second, and third PCs, respectively. Figure 1 illustrates a 3D representation of the chemical space generated with PUMA. Overall, the compounds in GRAS are clustered in the same area while the FDA-oncology compounds occupy different areas, indicating low and high diversity, respectively. Compounds in Epigenetic_focused lie between GRAS and the FDA-oncology compounds, while GRAS occupies a different area of the chemical space than the FDA-oncology compounds. Figure S2 shows a 2D representation of the chemical space of Pathogen Box generated with PUMA. Of note, the 2D and 3D plots available in PUMA are interactive.

Chemical Diversity. PUMA readily revealed that, on average, all compounds in Pathogen Box have drug-like properties and low chemical diversity with Tanimoto values close to 0.9 (Figure S3A). Table S4 in the Supporting Information summarizes the statistics of the six molecular properties computed with PUMA for all the databases. The compounds in GRAS have lower MW, nRotB, nHBAcc, nHBDon, and TopoPSA than the FDA-oncology compounds, which have the highest molecular property values. All the data sets have a similar AlogP distribution. Figure 2A shows a density plot of nHBAcc; all the properties have a similar distribution. Figure 2B depicts the distance matrix of the GRAS, FDA_oncology, and Epigenetic_focused data sets using molecular properties. The matrix illustrates that FDA_oncology is different from and more diverse than GRAS, with intralibrary distances of 3.47 and 1.73 and an interlibrary distance of 4.37 (this is in agreement with the visualization of chemical space in Figure 1). For this example, most of the compounds that violate Lipinski’s and Veber’s rules were removed from FDA-oncology using “Download drug-like subset”.

Figure 2. Molecular property analysis generated with PUMA. (A) Density plot of hydrogen bond acceptors (nHBAcc). The plot was generated with the “Properties: Statistics” tab. (B) Distance matrix calculated with Euclidean distance of six molecular properties; databases in dark red are the most diverse. The matrix was generated with the “Properties: Similarity and Distance” tab.

Scaffold Diversity. The results obtained using this tab in PUMA revealed a large scaffold diversity of Pathogen Box and indicated that most of the compounds are cyclic. The summary statistics in Table S5 suggest that there is almost one scaffold for each compound (e.g., AUCs close to 0.5, and the diverse subset downloaded using “Download unique scaffolds” includes 93% of the compounds in the database). All compounds in Plate A have a different scaffold (i.e., an SSE of 1), while the other plates have three or fewer compounds with the same scaffold. Figure 3A shows the CSR curves generated with PUMA for GRAS, FDA_oncology, and Epigenetic_focused. The results obtained with the new platform are in agreement with the previous analysis reported for these data sets;11 e.g., FDA_oncology and GRAS are the most and least diverse databases in terms of scaffolds, with AUCs of 0.55 and 0.84 and SSE30 of 0.98 and 0.622 (Table S5). This is because GRAS has mostly acyclic systems, which have the ID 10 (Figure S3B).

Fingerprints Diversity. The results obtained using this tab in PUMA indicate that Plate E in Pathogen Box is the most diverse, with MACCS keys, ECFP4, and PubChem median similarities of 0.370, 0.095, and 0.466, respectively (Table S7). Figure 3B depicts the CDF of the pairwise PubChem fingerprints/Tanimoto similarity. The CDF for Plate D shows that the intradatabase pairwise similarity values are higher for this plate (median similarity of 0.539). Similar results were observed with MACCS keys and ECFP4 (Figures S4 and S5). The fingerprint-based diversity analysis performed with PUMA was in agreement with previous observations that GRAS structures are diverse.11 This data set was the most diverse, with MACCS keys, ECFP4, and PubChem median similarities of 0.133, 0.048, and 0.16 (Table S7 and Figures S4 and S5).

Figure 3. Examples of scaffold analysis and fingerprint-based diversity using PUMA. (A) Plot generated using the “Scaffolds: Cyclic System Recovery (CSR) Curves” tab: scaffold analysis of the data sets Epigenetic_focused, FDA_oncology, and GRAS. (B) Cumulative distribution function using the “Fingerprint Diversity” tab: pairwise similarity values of the compounds in Pathogen Box using PubChem fingerprints/Tanimoto.

D-Tools. Consensus Diversity Plots (CDPs). The results indicate that Plates B and E are the most diverse in terms of properties, Plate A has the most distinct scaffolds, and Plates E and B are the most diverse considering fingerprints. Multiple diversity criteria are visualized in CDPs. Figure 4A shows a CDP with FNM on the y-axis and the median MACCS keys/Tanimoto similarity on the x-axis. The CDP shows that, among all five plates, compounds in Plate E are, overall, the most diverse while compounds in Plate D are the least diverse. All the information required to generate the CDPs can be obtained from PUMA.

Figure 4. Example of integration of PUMA in D-Tools. (A) Global diversity of Pathogen Box visualized with CDPs. Each data point represents a plate. The fraction of chemotypes over the number of molecules is plotted on the y-axis, and the median of the pairwise MACCS keys/Tanimoto similarity is plotted on the x-axis. Data points are colored by the mean Euclidean distance of six physicochemical properties. Plates in red are the most diverse, while plates in green are the least diverse. (B) SAS map generated with Activity Landscape Plotter. The pairwise similarity (x-axis) was calculated with PubChem fingerprints/Tanimoto; the activity difference (y-axis) was generated using the antimalarial activity reported in Pathogen Box against the strain Dd2.

Activity Landscape Plotter. Pathogen Box contains screening activity data. To exemplify the integration of PUMA and activity landscape analysis in D-Tools, herein we used Activity Landscape Plotter to explore the SAR of 123 compounds tested for antimalarial activity against three strains of Plasmodium falciparum: 3D7, Dd2, and W2. As expected from such a structurally diverse database, no significant activity cliffs were found. However, we identified the compound with ID MMV667494 as the most active compound toward the strain 3D7 (IC50 of 0.007 μM). Figure 4B illustrates a structure−activity similarity map generated with PubChem fingerprints using the activity reported against the strain Dd2.
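The data behind a structure−activity similarity (SAS) map can be sketched as follows: every pair of compounds contributes one point whose x value is the structural similarity and whose y value is the activity difference. The sketch assumes that activity is compared on the pIC50 scale and that fingerprints are sets of on-bit positions; the cliff thresholds, compound IDs, fingerprints, and IC50 values are all hypothetical and are not taken from Pathogen Box or from Activity Landscape Plotter.

```python
import math
from itertools import combinations


def pic50_from_micromolar(ic50_um):
    """Convert an IC50 in micromolar to pIC50 (negative log10 of molar IC50)."""
    return 6.0 - math.log10(ic50_um)


def tanimoto(a, b):
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sas_points(compounds):
    """One (id_a, id_b, similarity, activity difference) tuple per pair.

    `compounds` maps a compound ID to (fingerprint, IC50 in micromolar).
    """
    points = []
    for (id_a, (fp_a, ic_a)), (id_b, (fp_b, ic_b)) in combinations(compounds.items(), 2):
        similarity = tanimoto(fp_a, fp_b)
        difference = abs(pic50_from_micromolar(ic_a) - pic50_from_micromolar(ic_b))
        points.append((id_a, id_b, similarity, difference))
    return points


def activity_cliffs(points, min_similarity=0.7, min_difference=2.0):
    """Pairs that are structurally similar yet differ strongly in activity.

    Both thresholds are illustrative defaults, not values used by the tool.
    """
    return [(a, b) for a, b, sim, diff in points
            if sim >= min_similarity and diff >= min_difference]


# Made-up compounds, for illustration only.
compounds = {
    "cmpd_A": (frozenset({1, 2, 3, 4, 5, 6, 7, 8}), 0.01),
    "cmpd_B": (frozenset({1, 2, 3, 4, 5, 6, 7, 9}), 10.0),
    "cmpd_C": (frozenset({20, 21, 22}), 1.0),
}
points = sas_points(compounds)
cliffs = activity_cliffs(points)
```

In this toy set only the pair cmpd_A/cmpd_B falls in the activity-cliff region (similar structures, three log units apart); in a structurally diverse collection few pairs reach the similarity threshold, which is consistent with the absence of significant cliffs reported above.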



CONCLUSIONS AND FUTURE DEVELOPMENTS

PUMA is an online service that integrates, in a single platform, several methods and analyses to characterize user-supplied chemical databases in terms of molecular properties of pharmaceutical interest, scaffold content and diversity, fingerprint-based diversity, and visual representation of the chemical space. Through D-Tools, PUMA is linked to other free online services that analyze the global diversity of databases and perform activity landscape modeling. The analysis of Pathogen Box, FDA-oncology, Epigenetic_focused, and GRAS illustrated in this manuscript represents a real example of how PUMA can rapidly translate information encoded in SMILES into information on chemical space and diversity. A next logical step is to use PUMA to analyze the diversity of other compound data sets. Future developments in PUMA include adding methods other than PCA for visualizing the chemical space and adding analysis of molecular complexity.25 We offer other research groups the possibility to integrate their methods into D-Tools in order to enrich online chemoinformatic analyses. The non-Shiny scripts used by PUMA are available upon request.




ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00253.

Tables S1−S9 and Figures S1−S5 as mentioned in the text (PDF)
User’s guide (PDF)
Examples of input and output data files (ZIP)

AUTHOR INFORMATION

Corresponding Authors

*Phone: +5255-5622-3899 ext. 44458. E-mail: mgm_14392@hotmail.com (M.G.-M.).
*E-mail: [email protected] (J.L.M.-F.).

ORCID

Mariana González-Medina: 0000-0001-7365-939X
José L. Medina-Franco: 0000-0003-4940-1107

Funding

This work was supported by the Universidad Nacional Autónoma de México (UNAM) [grant PAPIME PE200116] and the Programa de Apoyo a la Investigación y el Posgrado (PAIP) [grant 5000−9163], Facultad de Química, UNAM. We thank Lilia González-Medina for technical support and Oscar Palomino-Hernández for helpful discussions.

Notes

The authors declare no competing financial interest.

REFERENCES

(1) Medina-Franco, J. L. Interrogating Novel Areas of Chemical Space for Drug Discovery using Chemoinformatics. Drug Dev. Res. 2012, 73 (7), 430−438.
(2) Sheridan, R. P.; Kearsley, S. K. Why do we Need so Many Chemical Similarity Search Methods? Drug Discovery Today 2002, 7 (17), 903−911.
(3) Villoutreix, B. O.; Lagorce, D.; Labbé, C. M.; Sperandio, O.; Miteva, M. A. One Hundred Thousand Mouse Clicks Down the Road: Selected Online Resources Supporting Drug Discovery Collected Over a Decade. Drug Discovery Today 2013, 18 (21−22), 1081−1089.
(4) Tetko, I. V.; Maran, U.; Tropsha, A. Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development. Mol. Inf. 2017, 36 (3), 1600082.
(5) Liao, C.; Sitzmann, M.; Pugliese, A.; Nicklaus, M. C. Software and Resources for Computational Medicinal Chemistry. Future Med. Chem. 2011, 3 (8), 1057−1085.
(6) Backman, T. W. H.; Cao, Y.; Girke, T. ChemMine Tools: An Online Service for Analyzing and Clustering Small Molecules. Nucleic Acids Res. 2011, 39, W486.
(7) Athanasiadis, E.; Cournia, Z.; Spyrou, G. ChemBioServer: A Web-Based Pipeline for Filtering, Clustering and Visualization of Chemical Compounds Used in Drug Discovery. Bioinformatics 2012, 28 (22), 3002−3003.
(8) Vila, T.; Lopez-Ribot, J. L. Screening the Pathogen Box for Identification of Candida albicans Biofilm Inhibitors. Antimicrob. Agents Chemother. 2017, 61 (1), e02006−16.
(9) Preston, S.; Jiao, Y.; Jabbar, A.; McGee, S. L.; Laleu, B.; Willis, P.; Wells, T. N. C.; Gasser, R. B. Screening of The ‘Pathogen Box’ Identifies an Approved Pesticide with Major Anthelmintic Activity Against the Barber’s Pole Worm. Int. J. Parasitol.: Drugs Drug Resist. 2016, 6 (3), 329−334.
(10) Mayer, F. L.; Kronstad, J. W. Discovery of a Novel Antifungal Agent in the Pathogen Box. mSphere 2017, 2 (2), e00120−17.
(11) González-Medina, M.; Prieto-Martínez, F. D.; Owen, J. R.; Medina-Franco, J. L. Consensus Diversity Plots: A Global Diversity Analysis of Chemical Libraries. J. Cheminf. 2016, 8 (1), 63.
(12) Shinyapps.io by RStudio. http://www.shinyapps.io/ (accessed June 20, 2017).
(13) Guha, R. Chemical Informatics Functionality in R. J. Stat. Softw. 2007, 18, 1−16.
(14) Inc, P. T. Collaborative data science. https://plot.ly/ (accessed June 20, 2017).
(15) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 2001, 46 (1−3), 3−26.
(16) Veber, D. F.; Johnson, S. R.; Cheng, H. Y.; Smith, B. R.; Ward, K. W.; Kopple, K. D. Molecular Properties that Influence the Oral Bioavailability of Drug Candidates. J. Med. Chem. 2002, 45 (12), 2615−2623.
(17) Keller, T. H.; Pichota, A.; Yin, Z. A Practical View of ‘Druggability’. Curr. Opin. Chem. Biol. 2006, 10 (4), 357−361.
(18) Wenderski, T. A.; Stratton, C. F.; Bauer, R. A.; Kopp, F.; Tan, D. S. Principal Component Analysis as a Tool for Library Design: A Case Study Investigating Natural Products, Brand-Name Drugs, Natural Product-Like Libraries, and Drug-Like Libraries. Methods Mol. Biol. 2015, 1263, 225−242.
(19) Prieto-Martinez, F. D.; Gortari, E. F.-d.; Mendez-Lucio, O.; Medina-Franco, J. L. A Chemical Space Odyssey of Inhibitors of Histone Deacetylases and Bromodomains. RSC Adv. 2016, 6 (61), 56225−56239.
(20) Lipkus, A. H.; Yuan, Q.; Lucas, K. A.; Funk, S. A.; Bartelt, W. F.; Schenck, R. J.; Trippe, A. J. Structural Diversity of Organic Chemistry. A Scaffold Analysis of the CAS Registry. J. Org. Chem. 2008, 73 (12), 4443−4451.
(21) Medina-Franco, J. L.; Martínez-Mayorga, K.; Bender, A.; Scior, T. Scaffold Diversity Analysis of Compound Data Sets Using an Entropy-Based Measure. QSAR Comb. Sci. 2009, 28 (11−12), 1551−1560.
(22) Medina-Franco, J. L.; Maggiora, G. M. Molecular Similarity Analysis. In Chemoinformatics for Drug Discovery; John Wiley & Sons, Inc: 2013; pp 343−399.
(23) Gonzalez-Medina, M.; Mendez-Lucio, O.; Medina-Franco, J. L. Activity Landscape Plotter: A Web-Based Application for the Analysis of Structure-Activity Relationships. J. Chem. Inf. Model. 2017, 57 (3), 397−402.
(24) González-Medina, M.; Owen, J.; El-Elimat, T.; Pearce, C.; Oberlies, N.; Figueroa, M.; Medina-Franco, J. Scaffold Diversity of Fungal Metabolites. Front. Pharmacol. 2017, 8, 180.
(25) Méndez-Lucio, O.; Medina-Franco, J. L. The Many Roles of Molecular Complexity In Drug Discovery. Drug Discovery Today 2017, 22 (1), 120−126.