
Article Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

pubs.acs.org/jcim

Structural Analysis and Identification of Colloidal Aggregators in Drug Discovery

Zi-Yi Yang,† Zhi-Jiang Yang,† Jie Dong,‡ Liang-Liang Wang,§ Liu-Xia Zhang,† Jun-Jie Ding,§ Xiao-Qin Ding,§ Ai-Ping Lu,∥ Ting-Jun Hou,*,⊥ and Dong-Sheng Cao*,†,∥

† Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
‡ Central South University of Forestry and Technology, Changsha 410004, People’s Republic of China
§ Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
∥ Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region, People’s Republic of China
⊥ College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China




Supporting Information

ABSTRACT: Aggregation poses a great challenge in drug discovery. Current computational approaches that filter out aggregating molecules based on their similarity to known aggregators, such as Aggregator Advisor, have low prediction accuracy, so reliable in silico models to detect aggregators are highly desirable. In this study, we built a data set of 12 119 aggregators and 24 172 drugs or drug candidates and then developed a group of classification models by combining two ensemble learning approaches with five types of molecular representations. The best model yielded an accuracy of 0.950 and an area under the curve (AUC) of 0.987 for the training set, and an accuracy of 0.937 and an AUC of 0.976 for the test set. The best model also gave reliable predictions for an external validation set of 5681 aggregators: 80% of these molecules were predicted to be aggregators with a prediction probability higher than 0.9. More importantly, we explored the relationship between colloidal aggregation and molecular features and derived a set of simple rules to detect aggregators. Molecular features such as log D, the number of hydroxyl groups, the number of aromatic carbons attached to a hydrogen atom, and the number of sulfur atoms in aromatic heterocycles help distinguish aggregators from nonaggregators. A comparison with numerous existing druglikeness and aggregation filtering rules and models used in virtual screening verified the high reliability of the model and rules proposed in this study. We also used the model to screen several curated chemical databases, and almost 20% of the molecules in these databases were predicted as aggregators, highlighting the potentially high risk of aggregation in screening. Finally, we developed the ChemAGG online Web server (http://admet.scbdd.com/ChemAGG/index), which offers a freely available tool to detect aggregators.



INTRODUCTION

Drug discovery and development is a time-consuming and expensive process. In 2018, the U.S. Food and Drug Administration (FDA) approved 59 new drugs, breaking its record of 53 drugs approved in 1996.1 Despite the growing number of approved drugs, the research and development (R&D) investment per drug reaches as much as $2.8 billion, with a high failure rate and declining market returns.2 High-throughput screening (HTS), which can rapidly test hundreds of thousands to millions of compounds, has become a popular approach to identify initial hits for further medicinal chemistry optimization. However, researchers have found that only a minority of the identified hits are useful, while many hits owe their activity to undesirable mechanisms.3,4 Most false positives are caused by nonspecific interactions between biological targets and the colloidal aggregates formed by these false hits.5,6 Ferreira et al. found that aggregators account for 88% of the false positives, whereas only 3.1% of the false positives identified by quantitative HTS (qHTS) are caused by fluorescence interference.7 Babaoglu et al. also reported that in the qHTS of β-lactamase (a model enzyme commonly used to detect nonspecific protein adsorption and enzyme inhibition), 95% of the positives were aggregators and fewer than 4% were true hits in the final validation.8 Clearly, it is important to filter out aggregators early in drug discovery.

© XXXX American Chemical Society
Received: July 3, 2019 Published: August 20, 2019

DOI: 10.1021/acs.jcim.9b00541 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Colloidal aggregates, ranging from 60 to 300 nm, are formed by small organic molecules in aqueous solution. They can nonspecifically inhibit enzymes by destabilizing or partially denaturing these proteins.9,10 Aggregating molecules have a critical aggregation concentration (CAC), analogous to the critical micelle concentration (cmc) of surfactants.11 Like other nanoemulsions, the aggregates also share properties such as a particulate nature, concentration-dependent and reversible formation, and adsorption of macromolecules. These properties explain the common features of aggregation-based HTS false positives: flat structure−activity relationships and steep Hill coefficients in concentration−response curves.12 They also limit the bioavailability and druggability of aggregators in drug discovery. Several studies have indicated that aggregates with a radius over 250 nm have poor in vivo absorption.13,14 Owen et al. investigated the diffusion of the colloid-forming dye Evans blue into cells and found that the molecules cannot cross the membrane in the aggregated state but pass through easily in the free state.15 Additionally, colloidal aggregation affects drug activity.16 Although several aggregating molecules, such as felodipine and benzyl benzoate, have become clinically approved drugs, their poor water solubility and bioavailability limit their use, so they require more optimization than nonaggregating drugs. In most cases, therefore, it is more efficient to filter out aggregating compounds as early as possible. To detect and mitigate aggregation, several experimental techniques have been developed based on the particulate nature of colloidal aggregates, such as light scattering,17 electron microscopy,18 fluorescence assays,19 and nonionic detergents. The use of nonionic detergents, such as Triton X-100 and Tween-80, is the most common approach in screening campaigns: the detergent prevents most aggregation, which right-shifts the concentration−response curves of aggregation-based inhibitors.12,16 However, these methods require specific experimental conditions to check whether a compound aggregates under HTS conditions.20 They also have their own disadvantages; for example, some detergent-resistant aggregators can still interfere with assays that include nonionic detergents.8

Several efforts have been made to develop computational models to distinguish aggregators from nonaggregators. For example, based on a library of more than 12 600 aggregating molecules, Irwin et al. developed a computational tool called Aggregator Advisor that predicts aggregation from the calculated log P of the query molecule and its structural similarity to known aggregators.21 Because it lacks a training step, this model suffers from high false-positive and false-negative rates. Nevertheless, it is still commonly used by other Web servers, such as Hit Dexter 2.0, to detect aggregators.22 Machine learning approaches have been used to develop models based on relatively small data sets of aggregators and nonaggregators. Doman et al. developed a recursive partitioning (RP) model based on 47 aggregators and 64 nonaggregators; it gave high prediction accuracy for the training set (93.7%) but did not show acceptable predictive capability for the 75 orally available drugs in the negative control set.23 Based on experimental data for 1030 druglike molecules, Feng and co-workers built two models using the naive Bayesian (NB) and RP methods, but both had a misclassification rate of 26% on the random set.24 Rao et al. developed a support vector machine (SVM) model based on a data set of 1319 aggregators and 128 325 nonaggregators, and the average 5-fold cross-validation accuracy for the aggregators was relatively low (77.8%).25 Moreover, several frequently used rules and models designed to identify potentially problematic fragments have never been validated for their ability to detect aggregators. For example, Blake et al. developed a set of substructures to filter out reactive molecules.26 Based on an alternative pooling strategy, Hann and colleagues developed a set of filters to remove reactive molecules.27 Huth et al. and Metz et al. used an experimental method called ALARM NMR to detect false positives in biochemical screens and built a set of chemical rules based on 3500 compounds to filter out thiol-reactive compounds.28,29 The ALARM NMR rules are expected to detect aggregators to some extent, since the nuclear magnetic resonance (NMR) method can also help triage aggregation-based binding. Baell et al. devised a set of 480 substructures that encode pan-assay interference compounds (PAINS) identified from HTS data.30 However, the PAINS substructures may perform poorly at detecting aggregators, because the HTS campaigns from which they were derived were run in the presence of detergent and casein, conditions expected to suppress much aggregation-based activity. Lagorce and co-workers provided a Web server called FAF-Drugs4 to calculate molecular properties;31 this server can also predict aggregators using structural alerts derived from 312 known aggregators. Several rules32−36 and druglikeness models37 have been developed and are widely used to screen molecules in early drug discovery, but their ability to detect aggregators also needs further assessment.

The primary aim of this study is to develop machine learning models that distinguish aggregators from nonaggregators with high confidence. To achieve this goal, we combined five types of molecular descriptors, and combinations of them, with two ensemble learning methods to build reliable classification models. More importantly, we aimed to identify the molecular features most relevant to aggregation and summarize them as a set of rules. We then compared our models with other existing filtering rules and models used in drug screening. Finally, a Web server for identifying aggregators was developed and is freely available to the community.
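To make the Aggregator Advisor logic described above concrete, here is a minimal Python sketch of a lookup-only check that pairs a log P test with nearest-neighbor Tanimoto similarity to a list of known aggregators. It is not the Advisor's code. Fingerprints are assumed to be sets of on-bit indices, and the function names and the similarity cutoff (`SIM_CUTOFF = 0.85`) are hypothetical; the log P cutoff of 3 mirrors the criterion that the article attributes to ref 21. Because nothing is fitted, the sketch also shows why such a tool has no training step from which to learn.

```python
from typing import Iterable, Set

# Hypothetical cutoffs for illustration; the article does not give the actual
# Aggregator Advisor settings. log P > 3 echoes the criterion cited in the text,
# while the similarity cutoff is an arbitrary placeholder.
LOGP_CUTOFF = 3.0
SIM_CUTOFF = 0.85


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def advisor_style_flag(query_fp: Set[int], clogp: float,
                       known_aggregator_fps: Iterable[Set[int]]) -> dict:
    """Lookup-only check: no model is trained, only a log P test plus nearest-neighbor similarity."""
    max_sim = max((tanimoto(query_fp, fp) for fp in known_aggregator_fps), default=0.0)
    return {
        "max_similarity": max_sim,
        "high_logp": clogp > LOGP_CUTOFF,
        "similar_to_known": max_sim >= SIM_CUTOFF,
    }


if __name__ == "__main__":
    known = [{1, 2, 3, 4, 5}, {10, 11, 12}]  # toy fingerprints, not real data
    print(advisor_style_flag({1, 2, 3, 4, 6}, clogp=4.2, known_aggregator_fps=known))
```

A query that is lipophilic but structurally novel (high log P, low similarity) is exactly the case where such a rule has nothing to fall back on, which is one source of the false negatives mentioned above.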



MATERIALS AND METHODS

Data Set Compilation. The first step in structure−activity relationship (SAR) modeling is to compile a reliable data set. For the positive set, we collected the aggregating molecules from the Aggregator Advisor library after deleting several compounds of unknown source. For the negative set, we compiled drugs and drug candidates from DrugBank (Approved, Experimental and Investigational Drugs),38 ChEMBL24 (Drugs of frequent indications and Small molecules from max phase 0-4),39 TTD Drugs (All drugs),40 and Guide to Pharmacology (Complete Ligands).41 To obtain a high-quality negative set, we manually checked all of the molecules and deleted those that had already been reported as aggregators. To further ensure the quality and reliability of the data set, all data were subjected to a multistep preparation process: (1) compounds with a molecular weight below 80 Da or above 800 Da were discarded; (2) because linear-chain compounds are less likely to aggregate, all linear-chain drugs and drug candidates were deleted; (3) all compounds were standardized (“washed”, pH 7.0) using the “wash” function of MOE (Molecular Operating Environment software, version 2018, Chemical Computing Group, Montreal, QC, Canada) to disconnect group metals in simple salts, remove minor components, deprotonate strong acids, protonate strong bases, and add explicit hydrogens; and (4) a duplicate filter was used to delete duplicated molecules, followed by quality checks. Finally, we obtained 12 119 unique compounds in the positive set and 24 172 unique compounds in the negative set. To verify the reliability and predictive ability of the models, all compounds were divided with MOE according to their chemical scaffolds into a training set (20 666 drugs and 10 650 aggregators, 31 316 compounds in total) and a test set (3506 drugs and 1469 aggregators, 4975 compounds in total). To further assess the generalization ability of our model, we collected 5681 nonduplicated aggregators from the ZINC database as an external validation set for evaluating how well the models predict aggregators.

Molecular Representation. Five different molecular representations were used to explore the impact of molecular descriptors on the performance of the prediction models: (1) 166 MACCS structural fragments with frequency information on the occurrence of specific substructures;42 (2) 210 Chemical Advanced Template Search (CATS) descriptors, a set of pharmacophore descriptors that assign atoms to six types, namely hydrogen-bond donor (D), hydrogen-bond acceptor (A), hydrophobic (H), positive charge (P), negative charge (N), and aromatic (R), and count atom pairs separated by topological distances of 1 to 10 bonds (21 type pairs × 10 distances); (3) 206 MOE2d descriptors, including 20 physical properties, 14 Hueckel theory descriptors, 18 subdivided surface areas, 42 atom counts and bond counts, 16 Kier and Hall connectivity and Kappa shape indices, 33 adjacency and distance matrix descriptors, 13 pharmacophore feature descriptors, and 50 partial charge descriptors;43 (4) extended-connectivity fingerprints with a bond diameter of 4 (ECFP4), 1024-bit circular fingerprints developed specifically for SAR modeling;44 and (5) 110 Ghose−Crippen (GC) structural fragments used to calculate log P by group contribution.45 The descriptors were calculated using the open-source Web-based platforms ChemDes46 and ChemSAR,47 the freely available Python packages ChemoPy48 and PyDPI,49 and MOE (version 2018).

Model Construction. Two ensemble learning algorithms were employed to construct the classification models: random forest (RF) and extreme gradient boosting (XGBoost). An ensemble learning algorithm builds a set of base learners from the training set and predicts new samples by majority voting (for classification) or averaging (for regression) over the predictions of the individual learners. RF, proposed by Breiman in 2001, is one of the most popular ensemble algorithms in QSAR/SAR modeling; it is an ensemble of unpruned classification trees grown on bootstrap samples of the training data with random feature selection during tree induction.50−52 RF generally improves on the accuracy of a single decision tree and is less prone to overfitting. XGBoost is an efficient and scalable implementation of the gradient boosting framework whose design takes cache access patterns, data compression, and sharding into account.53−55 Like other boosting methods, XGBoost builds the model in a sequential, stagewise fashion, and it generalizes them by allowing the optimization of an arbitrary differentiable loss function.56 It has been used in many winning solutions of machine learning competitions in recent years. In our study, the Konstanz Information Miner (KNIME), a platform integrating data processing, data analysis, and data exploration, was used to construct the data analysis pipeline.57 The main hyperparameters of the two algorithms were optimized by grid search with 5-fold cross-validation. For RF, the number of decision trees (n_tree, from 500 to 2000, interval = 100) and the maximum fraction of features considered per split (max_features, from 10 to 100, interval = 5) were optimized. For XGBoost, the learning rate (Eta, from 0.01 to 0.3, interval = 0.02), the maximum depth of a tree (maximum depth, from 3 to 10, interval = 1), and the number of models trained in the boosting ensemble (boosting rounds, from 500 to 3000, interval = 100) were optimized.

Performance Evaluation. To ensure that the derived model generalizes well, 5-fold cross-validation and an external validation set were used for validation. For the 5-fold cross-validation, the whole training set was split into five roughly equal-sized parts; a model was built on four parts, and its prediction accuracy on the remaining part was calculated. This process was repeated five times so that every part was used once as the validation set. The best model determined by cross-validation was then further validated on the external validation set. Commonly used statistical parameters were used to evaluate the quality of the classification models, including true positives (TP), false negatives (FN), true negatives (TN), false positives (FP), the overall prediction accuracy (ACC = (TP + TN)/(TP + TN + FP + FN)), the prediction accuracy for the positive set (sensitivity, SE = TP/(TP + FN)), and the prediction accuracy for the negative set (specificity, SP = TN/(TN + FP)). In addition, the receiver operating characteristic (ROC) curve was plotted, and the area under the ROC curve (AUC) was used to assess each model. In the application phase, we also used the composite evaluation index F, which weighs precision (Precision = TP/(TP + FP)) against recall (Recall = TP/(TP + FN)), to select the classification threshold that ensures good performance of our model.
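The evaluation protocol above can be sketched in plain Python: deal the training indices into five folds, predict each fold with a model trained on the other four, and score the predictions with SE, SP, ACC, precision, recall, and an F measure. This is a minimal sketch under stated assumptions, not the study's KNIME workflow: label 1 marks an aggregator, `fit_predict` is a hypothetical stand-in for the RF or XGBoost models, predictions are pooled across folds rather than averaged per fold, and F is taken to be the standard F_beta = (1 + beta^2)·P·R/(beta^2·P + R), since the article does not give its exact form.

```python
import random
from typing import Callable, List, Sequence


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal-sized folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           beta: float = 1.0) -> dict:
    """Confusion counts plus SE, SP, ACC, precision, recall and F_beta (1 = aggregator)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    se = tp / (tp + fn) if tp + fn else 0.0          # sensitivity, identical to recall
    sp = tn / (tn + fp) if tn + fp else 0.0          # specificity
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    b2 = beta ** 2
    denom = b2 * precision + se
    f = (1 + b2) * precision * se / denom if denom else 0.0
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp, "SE": se, "SP": sp,
            "ACC": acc, "Precision": precision, "Recall": se, "F": f}


def cross_validate(X: Sequence, y: Sequence[int],
                   fit_predict: Callable[[list, list, list], List[int]],
                   k: int = 5, seed: int = 0) -> dict:
    """Predict each fold with a model trained on the other k-1 folds, then score.

    `fit_predict(train_X, train_y, test_X)` is a hypothetical callback that trains
    a classifier and returns 0/1 predictions for `test_X`.
    """
    pooled_true: List[int] = []
    pooled_pred: List[int] = []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        pooled_true.extend(y[i] for i in fold)
        pooled_pred.extend(preds)
    return classification_metrics(pooled_true, pooled_pred)


def apply_threshold(probabilities: Sequence[float], cutoff: float) -> List[int]:
    """Convert predicted aggregator probabilities into 0/1 labels at `cutoff`."""
    return [1 if p >= cutoff else 0 for p in probabilities]
```

For threshold selection in the application phase, one option is to evaluate `classification_metrics(y, apply_threshold(probs, c))["F"]` over a range of cutoffs `c` and keep the best one; raising `beta` above 1 shifts the choice toward recall, which matters when missing an aggregator is costlier than a false alarm.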



RESULTS AND DISCUSSION

Scaffold Analysis of the Data Set. An SAR model derived from diverse structures will generally cover a large chemical space and, consequently, have a wide applicability domain. To analyze the diversity of our data set, we used the RDKit package to decompose all molecules into scaffolds based on their two-dimensional structures. We analyzed two classes of scaffolds: Murcko scaffolds and carbon skeletons. The Murcko scaffolds were extracted by removing all R-groups from a molecule while retaining the linkers between ring systems, and the carbon skeletons were generated from the Murcko scaffolds by converting all heteroatoms to carbon atoms and all bonds to single bonds.58,59 Overall, we obtained 12 294 different Murcko scaffolds and 5939 different carbon skeletons; for the aggregators alone, 7760 different Murcko scaffolds and 3863 different carbon skeletons were generated. Murcko scaffolds shared by no more than five molecules covered nearly 60% of the drugs and 85% of the aggregators in our data set. The carbon skeletons were more tightly clustered: skeletons shared by no more than five molecules covered nearly 30% of the drugs and 43% of the aggregators. The scaffold analysis indicated a high level of structural diversity in the data set, so a model developed from it may have good prediction coverage for structurally diverse compounds.

Table 1. Top 10 Murcko Scaffolds and Carbon Skeletons That Appeared in the Drug Set and the Aggregator Set

To further explore the relationship between scaffolds and aggregation, we extracted the 10 most frequent Murcko scaffolds and carbon skeletons from the drug and aggregator sets, respectively. As shown in Table 1, several frequently occurring structures are shared by the two sets; for example, naphthalene, a five-membered ring, and benzene were the most frequent Murcko scaffolds in both sets. More importantly, there are also notable differences between the scaffolds of the two sets. Among the top 10 Murcko scaffolds, 90% of the aggregator scaffolds contained more than one heteroatom (nitrogen, oxygen, and/or sulfur), whereas only 50% of the drug scaffolds contained heteroatoms, and none of the top drug Murcko scaffolds contained a sulfur atom. Among the top 10 carbon skeletons, 70% of the aggregator skeletons were bicyclic or tricyclic, with rings linked by three to five atoms, whereas 70% of the drug skeletons consisted of fused or directly linked rings, which were smaller and simpler. Overall, the frequently occurring scaffolds of aggregators are larger and more complex and contain more heteroatoms than those of drugs. These structural differences between the two sets should therefore help detect aggregators in early drug discovery.
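The scaffold statistics above rest on two operations: reducing each molecule to its ring-and-linker framework, and counting how many molecules share each scaffold. The sketch below is a dependency-free illustration, not RDKit's implementation (which the study used). Molecules are assumed to be adjacency maps, the framework is obtained by repeatedly pruning atoms with at most one neighbor, and scaffold identity is assumed to come from precomputed keys such as canonical SMILES. Because the pruning ignores elements and bond orders, the surviving graph corresponds to the topology behind the carbon skeleton rather than the full Murcko scaffold. `ring_graph` is a hypothetical helper for building toy input.

```python
from collections import Counter
from typing import Dict, List, Set

# Molecular graph as an adjacency map: atom index -> indices of bonded atoms.
Graph = Dict[int, Set[int]]


def framework_atoms(adj: Graph) -> Set[int]:
    """Atoms left after repeatedly stripping atoms with at most one neighbor.

    What survives is the ring systems plus the linkers between them. Connectivity
    only: element types, bond orders and exocyclic double bonds are ignored.
    An acyclic molecule is pruned away entirely (empty framework).
    """
    g = {atom: set(nbrs) for atom, nbrs in adj.items()}
    while True:
        terminal = [atom for atom, nbrs in g.items() if len(nbrs) <= 1]
        if not terminal:
            return set(g)
        for atom in terminal:
            for nbr in g.pop(atom):
                g[nbr].discard(atom)


def small_scaffold_coverage(scaffold_keys: List[str], max_members: int = 5) -> float:
    """Fraction of molecules whose scaffold is shared by at most `max_members` molecules."""
    if not scaffold_keys:
        return 0.0
    counts = Counter(scaffold_keys)
    covered = sum(1 for key in scaffold_keys if counts[key] <= max_members)
    return covered / len(scaffold_keys)


def ring_graph(start: int, size: int) -> Graph:
    """Hypothetical helper for toy input: a simple ring of `size` atoms numbered from `start`."""
    atoms = list(range(start, start + size))
    return {a: {atoms[(i - 1) % size], atoms[(i + 1) % size]} for i, a in enumerate(atoms)}


if __name__ == "__main__":
    # Toy molecule: two six-membered rings joined by a one-atom linker, plus a methyl group.
    mol = {**ring_graph(0, 6), **ring_graph(7, 6)}
    mol[6] = {0, 7}
    mol[0].add(6)
    mol[7].add(6)
    mol[13] = {3}
    mol[3].add(13)
    print(sorted(framework_atoms(mol)))  # atoms 0 through 12: both rings and the linker
    # Scaffold "A" has 6 members, so only the molecules with "B" or "C" count as covered.
    print(small_scaffold_coverage(["A"] * 6 + ["B"] * 2 + ["C"]))  # 3 of 9 molecules
```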


Table 2. Performance of the Models Derived from Different Combinations of Machine Learning Algorithms and Descriptor Sets

                        5-fold cross-validation       test set
Algorithm  Descriptors  SE     SP     ACC    AUC      SE     SP     ACC    AUC
RF         CATS         0.799  0.930  0.886  0.945    0.790  0.964  0.912  0.957
RF         ECFP4        0.812  0.975  0.920  0.970    0.783  0.976  0.919  0.970
RF         MACCS        0.845  0.957  0.919  0.968    0.822  0.973  0.928  0.972
RF         MOE2d        0.815  0.951  0.905  0.960    0.793  0.971  0.918  0.964
RF         GC           0.819  0.949  0.905  0.959    0.792  0.969  0.917  0.960
RF         all          0.856  0.960  0.925  0.976    0.832  0.965  0.926  0.966
XGBoost    CATS         0.842  0.937  0.905  0.957    0.819  0.965  0.922  0.963
XGBoost    ECFP4        0.880  0.959  0.932  0.974    0.858  0.970  0.937  0.974
XGBoost    MACCS        0.886  0.954  0.931  0.973    0.861  0.971  0.938  0.974
XGBoost    MOE2d        0.881  0.954  0.930  0.974    0.846  0.970  0.934  0.983
XGBoost    GC           0.862  0.943  0.915  0.964    0.823  0.966  0.924  0.965
XGBoost    all          0.916  0.966  0.949  0.987    0.848  0.974  0.937  0.976
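A quick consistency check on Table 2: since TP = SE·N+ and TN = SP·N−, the accuracy is a class-size-weighted mean, ACC = (SE·N+ + SP·N−)/(N+ + N−). With about twice as many drugs as aggregators in each set, ACC therefore sits closer to SP than to SE. The sketch below recomputes ACC for the XGBoost model on all descriptors from the set sizes given in Data Set Compilation. Agreement is only expected to about three decimals, because the tabulated SE and SP are rounded and the cross-validation figures may be fold averages rather than pooled counts.

```python
def accuracy_from_se_sp(se: float, sp: float, n_pos: int, n_neg: int) -> float:
    """ACC = (TP + TN) / N with TP = SE * n_pos and TN = SP * n_neg."""
    return (se * n_pos + sp * n_neg) / (n_pos + n_neg)


# Set sizes from Data Set Compilation (aggregators = positives, drugs = negatives).
TRAIN_POS, TRAIN_NEG = 10_650, 20_666
TEST_POS, TEST_NEG = 1_469, 3_506

# XGBoost + all descriptors (Table 2): CV SE 0.916, SP 0.966; test SE 0.848, SP 0.974.
cv_acc = accuracy_from_se_sp(0.916, 0.966, TRAIN_POS, TRAIN_NEG)
test_acc = accuracy_from_se_sp(0.848, 0.974, TEST_POS, TEST_NEG)
print(f"{cv_acc:.3f} {test_acc:.3f}")  # 0.949 0.937, matching the tabulated ACC values
```

The same weighting explains why SE, the harder metric here, moves ACC relatively little: a drop of 0.068 in SE between cross-validation and the test set is largely offset by the higher SP and the larger share of drugs.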

Figure 1. ROC curves of the 5-fold cross-validation to the training set (A) and the predictions to the test set (B) obtained with the models derived from different combinations of machine learning algorithms and descriptor sets.
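ROC curves such as those in Figure 1 are traced by sweeping the probability threshold from high to low and recording the false-positive rate (1 − SP) and the true-positive rate (SE) at each step; the AUC is the area beneath that curve. A minimal sketch follows, assuming label 1 marks an aggregator and the scores are predicted aggregator probabilities. Tied scores are moved across the threshold together, so the trapezoidal AUC equals the probability that a randomly chosen aggregator outscores a randomly chosen nonaggregator, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold past each distinct score."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both aggregators and nonaggregators")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # All samples tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]           # toy labels, not data from the study
    probs = [0.9, 0.8, 0.7, 0.1]    # toy predicted aggregator probabilities
    print(auc_trapezoid(roc_points(labels, probs)))
```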

selection process five times to extract the most frequent features emerging in the top 20 features for the models based on the CATS, MACCS, MOE2d, and GC descriptors and in the top 50 features for the model based on the ECFP4 descriptors. Finally, 18 CATS descriptors, 18 MACCS fragments, 17 MOE2d descriptors, 19 GC fragments, and 45 ECFP4 fingerprints were chosen as the optimal features for model building. Model Performance from Five Different Molecular Representations. Ten classification models were then developed based on the optimal feature sets chosen by feature selection for five different types of descriptors using the RF and XGBoost algorithms, and two classification models were established based on the combinations of the five optimal feature sets using the RF and XGBoost algorithms. The classification statistics of the 5-fold cross-validation to the training set and the predictions on the test set are summarized in Table 2. The ROC curves for the 5-fold cross-validation and the predictions on the test set are shown in Figure 1. It is apparent to observe that most models performed well for both the training and test sets, with an average ACC up to 0.9 and

Feature Selection Based on RFE-RF. To remove irrelevant variables and consequently improve the reliability of our models, the feature selection was conducted by using the recursive feature elimination based on RF (RFE-RF) approach. First, the RF models were trained based on the complete set of descriptors. Then, a ranking criterion score was computed for each descriptor and the descriptor with the lowest score was removed. Finally, the RF model was retrained by using the remaining set of descriptors. The above process was run iteratively until the stopping criterion was satisfied. As shown in Figure S1, the accuracy of the models first increased rapidly, but after a point, it became stable and almost unchanged. Thus, the models based on the CATS, MACCS, MOE2d, and GC descriptors needed approximately 20 features to achieve stable prediction (accuracy above 0.90), and that based on the ECFP4 fingerprints needed at least 50 features to achieve an accuracy of 0.90. This result may be explained by the fact that the ECFP4 fingerprints are a set of nonpredefined circular fingerprints that can represent an essentially infinite number of different molecular features, and the information for a single fingerprint is limited. Then we repeated the feature E

DOI: 10.1021/acs.jcim.9b00541 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 2. Top eight features and their values of importance in the models derived from (A) CATS, (B) MACCS, (C) MOE2d, (D) GC, and (E) ECFP4.

to aromatic or hydrophobic atom pairs. The features related to long bond distances (>7 bonds) of atom pairs covered almost two-thirds of the eight features, highlighting their importance in distinguishing aggregators from nonaggregators. For example, the CATS-RR10 values for almost 77% of drugs were below 0.1, and those for only a few drugs spread from 0.1 to 1.4. However, the CATS-RR10 values for approximately 33% of aggregators were below 0.1, and those for the majority of aggregators distributed in a range from 0.2 to 0.5. The observation is also consistent with the analysis of the frequently occurring scaffolds that aggregators have relatively more complicated and larger scaffolds. Hence, we believe that compounds with more long distance aromatic or hydrophobic atom pairs may tend to aggregate in aqueous solution. MACCS. For the MACCS descriptors, higher percentages of ring atoms (MACCS (165), MACCS (163)) and a lower percentage of hydroxyl groups (MACCS (139)) were found in aggregated compounds. Moreover, aggregators also covered more compounds with higher numbers of sulfur atoms (MACCS (−81)) and nitrogen atoms (MACCS (156), MACCS (161)) than drugs. The frequencies of these important features in the two sets are quite different. For example, MACCS (−81) counts the sulfur atoms attached to no less than three other atoms, such as sulfur-containing aromatic heterocycles. In our study, 80% of drugs had no sulfur atom, while more than 55% of aggregated molecules had more than one sulfur atom. Therefore, it is suggested that controlling the sulfur atoms and nitrogen atoms in rings might be a useful way to avoid molecular aggregation. MOE2d. As shown in Figure 2C, most important features selected from the MOE2d descriptors were related to hydrophobicity, such as h_logD, log S, and h_logS. Several important features represented the total charge or strain energy needed for protonation, such as h_log_pbo, h_pavgQ, and h_pstrain. 
The other two descriptors, GCUT_SMR_0 and balabanJ, were related to molecular refractivity and molecular topological structure. Considering the importance and the easy-to-understand characteristics of molecular hydrophobicity, we further analyzed h_logS, h_logD, and h_logP. As shown in Figure 3, the distributions of drugs and aggregators

an average AUC up to 0.95. Though the performance on the test set was commonly slightly worse than the 5-fold crossvalidation, there were no significant differences between the predictions to the two sets. These results indicated that our models built with the two ensemble learning algorithms can distinguish aggregators and drugs with high confidence. For the models built on a single set of molecular descriptors, the model based on the ECFP4 fingerprints and the XGBoost algorithm performed the best, with ACC = 0.932 and AUC = 0.974 for the 5-fold cross-validation, and ACC = 0.937 and AUC = 0.974 for the test set. As shown in Table 2, the model based on the combination of all descriptors and the XGBoost algorithm yielded the best performance (Eta = 0.2, Boosting rounds = 1500, Maximum depth = 6), with ACC of 0.949 and AUC of 0.987 for 5-fold cross-validation, and ACC of 0.937 and AUC of 0.976 for the test set. It appeared that the combination of molecular descriptors could capture the relationship between the chemical structures of molecules and the end point more efficiently than a single set of molecular descriptors. What surprised us was that the predictions of the models based on the GC fragments were not poor. In fact, the performance of the XGBoost model based on the GC fragments for the test set (ACC = 0.924 and AUC = 0.965) was even better than that based on the CATS descriptors for the test set (ACC = 0.922 and AUC = 0.963). Hence, we believed that the substructures related to log P were important to distinguish aggregators and drugs. Analysis of Selected Important Features That Contribute to Discrimination. The ability of a compound to aggregate in aqueous solution should be related to its physicochemical properties and chemical patterns, and therefore we explored the features related to compound aggregation. 
To achieve this goal, we calculated the importance of the features used in the five XGBoost models built based on the individual optimal descriptor sets. Figure 2 shows the top eight important molecular features in the five different models. It should be noted that the ECFP4 features were not analyzed in this part. CATS. CATS are a set of pharmacophore descriptors. As shown in Figure 2A, the most important features were related F

DOI: 10.1021/acs.jcim.9b00541 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 3. Distributions of (A) h_logS, (B) h_logD, and (C) h_logP for aggregators and drugs.

Table 3. Performance of the Models Derived from Different Combinations Descriptor Sets and Two Machine Learning Algorithms 5-fold cross-validation RF

XGBoost

MACCS MACCS MACCS all MACCS MACCS MACCS all

+ GC + GC + CATS + GC + CATS + MOE2d + GC + GC + CATS + GC + CATS + MOE2d

test set

SE

SP

ACC

AUC

SE

SP

ACC

AUC

0.833 0.849 0.855 0.857 0.886 0.906 0.908 0.916

0.947 0.962 0.957 0.959 0.951 0.968 0.962 0.967

0.909 0.928 0.923 0.925 0.928 0.949 0.944 0.950

0.965 0.987 0.977 0.976 0.977 0.977 0.985 0.987

0.836 0.835 0.830 0.832 0.826 0.836 0.841 0.848

0.955 0.958 0.963 0.965 0.970 0.971 0.975 0.974

0.920 0.922 0.924 0.926 0.927 0.931 0.935 0.937

0.965 0.966 0.966 0.966 0.968 0.971 0.975 0.976

Performance of Models Based on Feature Combination. As shown in Table 2, the models built based on the combinations of different types of molecular descriptors had better performance. In this section, we constructed several robust classification models based on different combinations of descriptors. In consideration of the definition, the descriptors in different types may contain the same information. For instance, MACCS (139) and [OH, OH2] count the number of hydroxyl groups in a molecule. To avoid meaningless information redundancy, the correlation between any two features was calculated and the feature that has high correlation (r > 0.95) with another was removed. However, we retained the descriptors that provided good interpretations as much as possible. The performance of the eight combinatorial models is listed in Table 3. As discussed above, the performance of the models based on the combination of different descriptor sets was better than those of the models based on a single descriptor set (Table 2). It seems that more information could be captured by the inclusion of more descriptor types, and then the performance of the models became better. For another general trend, using the same descriptors, the models built with XGBoost were better than those built with RF. Overall, the best classifier based on XGBoost and all five types of molecular descriptors (Eta = 0.2, Boosting rounds = 2000, Maximum depth = 6) was able to distinguish aggregators from drugs with ACC of 0.950 and AUC of 0.987 in the 5-fold cross-validation and ACC of 0.937 and AUC of 0.976 for the test set. Additionally, we have also calculated the consensus model from five descriptor sets. The result showed that the performance of the consensus model (SE = 0.899, SP = 0.963, ACC = 0.942) was a little worse than those of models from the combination descriptors.

for these three descriptors were clearly different (p < 0.001). It is generally accepted that the aggregation and nonspecific binding behaviors of aggregators are related to hydrophobicity. In some contexts, a log P higher than 3 has been used as a criterion to identify likely aggregators,21 which is also supported by our analysis (Figure 3C). In addition, the pH of the environment may change the ionization state of a molecule and thus its lipophilicity. log D, the distribution coefficient at pH 7, accounts for this effect and is therefore a more practical feature. More than 40% of drugs had log D values below 2, compared with only 5% of aggregators. log D also differed markedly in the range between 4.5 and 7.0: the percentage of drugs there was up to 60%, approximately 3 times that of aggregators (Figure 3B). GC. For the GC fragments, the SMARTS patterns [#6], [#7], and [#8] are defined as catch-all counts of carbon, nitrogen, and oxygen atoms, respectively, that do not match any more specific type. Although these fragments were important for model building, they are hard to interpret clearly. Several other important fragments provide structural information similar to that of the important MACCS descriptors: aggregators possess more aromatic carbons ([cH]) and fewer OH groups ([OH, OH2]) than drugs. Almost 99% of aggregators have zero or one hydroxyl group, whereas 45% of drugs have more than one hydroxyl group and 10% of drugs have more than three. We suspect that a molecule with more hydroxyl groups can more easily form hydrogen bonds with the target and/or solvent, so it may aggregate less readily in aqueous solution because it would first have to overcome these strong interactions.
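To make the log P/log D distinction concrete, the textbook pH-partition (Henderson-Hasselbalch) relation for a monoprotic acid or base can be written as a short function. This is an illustrative sketch under the usual simplifying assumption that only the neutral species partitions into octanol; it is not necessarily how log D was computed in this study, and the log P and pKa values in the example are made up.

```python
import math


def log_d(log_p, pka=None, ph=7.0, kind="neutral"):
    """Estimate log D at a given pH from log P (assumes only the neutral form partitions).

    kind = "acid":    log D = log P - log10(1 + 10**(pH - pKa))
    kind = "base":    log D = log P - log10(1 + 10**(pKa - pH))
    kind = "neutral": no ionization, so log D = log P
    """
    if kind == "neutral":
        return log_p
    if kind == "acid":
        ionized_to_neutral = 10 ** (ph - pka)   # [A-]/[HA]
    elif kind == "base":
        ionized_to_neutral = 10 ** (pka - ph)   # [BH+]/[B]
    else:
        raise ValueError("kind must be 'neutral', 'acid' or 'base'")
    return log_p - math.log10(1 + ionized_to_neutral)


# A hypothetical carboxylic acid with log P = 4.0 and pKa = 4.5 is mostly ionized at pH 7,
# so its log D is about 2.5 units lower than its log P.
print(round(log_d(4.0, pka=4.5, kind="acid"), 2))  # → 1.5
```

This is why two compounds with the same log P can fall on opposite sides of a log D threshold: ionization at pH 7 can lower the effective lipophilicity by several units.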

DOI: 10.1021/acs.jcim.9b00541 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 4. First four levels of the decision tree model.

Almost all the descriptors shown in Figure 4 also had relatively high importance (Figure 2), suggesting that these features are essential both for compiling rules and for constructing classification models. In addition, among the substructures and physicochemical properties, the important features that effectively distinguish aggregators from nonaggregators, such as log D, the number of hydroxyl groups, and CATS-RR10, are all more or less related to molecular hydrophobicity. Given the high importance and the distributions of CATS-RR10, we suspect that the number of aromatic or hydrophobic atom pairs separated by long bond distances is essential to colloidal aggregation. As the compound concentration increases, hydrophobic groups may be increasingly attracted to each other and thus assemble into colloidal aggregates. This hypothesis is also consistent with X-ray scattering studies of aggregators.61 Since the molecular structure of aggregates is poorly understood, this finding may guide further study. On the basis of the decision tree and the above analysis, we selected five features and performed an exhaustive search over possible combinations of thresholds for these features, choosing the set of thresholds that maximized accuracy while balancing sensitivity and specificity. Our results indicate that aggregators tend, on average, to obey a set of rules in which log D is higher than 6, the number of aromatic carbons attached to a hydrogen atom is higher than 14, the number of hydroxyl groups is higher than 3, and the number of sulfur atoms attached to more than three atoms is higher than 2. In addition, CATS-RR10 > 0.03 may be a useful criterion for identifying aggregators. This set of rules is summarized in Table 4. The rules performed well, with ACC = 0.723 for the test set and ACC = 0.710 for the external validation set (Table 5).
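An exhaustive threshold search of this kind can be sketched with `itertools.product` over one grid of candidate thresholds per feature. This is a minimal sketch under stated assumptions: the text does not spell out how the individual rules are combined or how the sensitivity/specificity balance is scored, so here a molecule is flagged if it meets any rule, candidates are ranked by accuracy, and ties are broken toward the smallest |SE - SP|. The data in the example are synthetic.

```python
import itertools

import numpy as np


def exhaustive_threshold_search(X, y, grids, directions):
    """Evaluate every combination of per-feature thresholds and return the best one.

    X          : (n_samples, n_features) descriptor matrix
    y          : 0/1 labels (1 = aggregator)
    grids      : one sequence of candidate thresholds per feature
    directions : ">" or "<" per feature (direction of each rule)
    Assumption: a molecule is called an aggregator if it satisfies ANY rule.
    Returns (thresholds, accuracy, sensitivity, specificity) of the best combination,
    ranked by accuracy with ties broken by the smallest |SE - SP|.
    """
    y = np.asarray(y).astype(bool)
    best_key, best = None, None
    for thresholds in itertools.product(*grids):
        flagged = np.zeros(len(y), dtype=bool)
        for j, (t, d) in enumerate(zip(thresholds, directions)):
            flagged |= (X[:, j] > t) if d == ">" else (X[:, j] < t)
        se = np.sum(flagged & y) / np.sum(y)        # aggregators correctly flagged
        sp = np.sum(~flagged & ~y) / np.sum(~y)     # nonaggregators correctly passed
        acc = np.mean(flagged == y)
        key = (acc, -abs(se - sp))
        if best_key is None or key > best_key:
            best_key, best = key, (thresholds, float(acc), float(se), float(sp))
    return best


# Synthetic example: the label depends only on feature 0 exceeding 5.
X = np.column_stack([np.arange(10.0), np.zeros(10)])
y = (X[:, 0] > 5).astype(int)
print(exhaustive_threshold_search(X, y, grids=[[3, 5, 7], [1, 2]], directions=[">", ">"]))
# → ((5, 1), 1.0, 1.0, 1.0)
```

Because every combination is evaluated, the run time grows with the product of the grid sizes, so fairly coarse grids are advisable when five features are searched jointly.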
Although these rules cannot reach the high accuracy of the models, they capture the discriminating value of important

To further assess the stability of the proposed model, the whole data set was randomly divided into training and test sets at a ratio of 3:2, and XGBoost models were built on the training set and evaluated on the test set. This process was repeated 1000 times. Averaged over the 1000 XGBoost models, the test-set results were SE = 0.920 ± 0.002, SP = 0.970 ± 0.003, ACC = 0.954 ± 0.002, and AUC = 0.987 ± 0.001, showing that the performance of the XGBoost models was quite stable. To further validate the predictive ability of our model, we collected aggregators from the aggregator subset of the ZINC database.60 After washing and removing duplicates, 5681 molecules remained for validation. About 92% of these aggregators were detected, and the prediction probabilities for 88% and 80% of the molecules were higher than 0.7 and 0.9, respectively. To further examine this high prediction accuracy, we evaluated the similarity between the ZINC aggregators and the aggregators used for training by calculating the Tanimoto coefficient based on ECFP4 fingerprints, and counted the training compounds most similar to each query molecule. With a threshold of 0.7, almost 85% of the ZINC aggregators were similar to the training compounds. This result suggests that aggregators share general characteristics, which makes it possible to filter them out with screening models. Overall, these results highlight the capability of our model to detect aggregators in different sets of molecules. Development of Aggregation Rules.
To better understand the contributions of important properties to aggregation, we selected several informative features identified by feature selection and built a decision tree model with the scikit-learn machine learning package (max_depth = 9, min_samples_split = 200). Figure 4 shows the first four levels of the decision tree model.
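A decision tree with the hyperparameters quoted above can be set up in scikit-learn as follows, and `sklearn.tree.export_text` can print only the top levels, analogous to the four levels shown in Figure 4. The two descriptor columns and their labels are synthetic stand-ins rather than the paper's data, and the feature names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (not the paper's data set): two hypothetical descriptors.
rng = np.random.default_rng(42)
n = 2000
log_d = rng.uniform(-2.0, 9.0, n)                 # hypothetical log D values
n_oh = rng.integers(0, 6, n)                      # hypothetical hydroxyl-group counts
y = ((log_d > 4.0) & (n_oh <= 1)).astype(int)     # toy labelling rule, 1 = "aggregator"
X = np.column_stack([log_d, n_oh])

# Hyperparameters as reported in the text; random_state fixed for reproducibility.
tree = DecisionTreeClassifier(max_depth=9, min_samples_split=200, random_state=0)
tree.fit(X, y)

# Print only the first four levels of the fitted tree.
print(export_text(tree, feature_names=["logD", "nOH"], max_depth=4))
```

`min_samples_split = 200` stops the tree from splitting small nodes, which keeps the printed branches coarse enough to be read as candidate rules.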


0.475 for the test set, and sensitivities of 0.662 and 0.704 for the external validation set. Among the other methods, the FAF-Drugs4 (strict) filter and ALARM NMR achieved the highest test-set sensitivities (0.862 and 0.857). However, their specificities were much lower (0.091 and 0.396), indicating that they are too strict to pass druglike molecules; with the FAF-Drugs4 (strict) filter in particular, almost all drugs and drug candidates were filtered out. The other rules and models showed the opposite pattern: with higher specificity and lower sensitivity, they were better at recognizing drugs than aggregators. Since these models and rules were designed to filter out molecules with unfavorable physicochemical properties, low intestinal absorption, or other undesirable properties, they are better suited to differentiating drugs from nondrugs than aggregators from nonaggregators. Several models and rules, such as FAF-Drugs4, performed poorly on both measures (both below 0.5), which illustrates the difficulty of distinguishing aggregators from drugs. The low bioavailability caused by the aggregation of molecules has not attracted enough attention: most druglikeness rules, such as those of Lipinski and Ghose, could not detect aggregators (sensitivity < 0.4). Our rules, which detect more than 70% of aggregators in both the test set and the external validation set while correctly passing 71.5% of drugs (specificity = 0.715), could therefore be a good supplement to existing druglikeness rules. Another point worth noting is the importance of choosing an appropriate filtering method.
Take the FAF-Drugs4 filter and the FAF-Drugs4 (strict) filter as an example: although both filters come from the same Web site, the different options lead to very different outcomes in application. Compared with the sensitivity of 0.483 for the FAF-Drugs4 filter, the stricter filter almost doubled this score (sensitivity = 0.862), while the result was

Table 4. Summary of Aggregation Rules

Basic rules
  log D is higher than 6.
  The number of aromatic carbons attached to a hydrogen atom is higher than 14.
  The number of hydroxyl groups is higher than 3.
  The number of sulfur atoms attached to more than three atoms is higher than 2.

Additional rules
  CATS-RR10 > 0.03

features, which can provide simple and valuable suggestions to chemists for the synthesis and optimization of potential lead compounds. Comparison with Other Related Filtering Rules and Models. To further demonstrate the necessity and advantages of our classification model for detecting aggregators, we compared our models and rules with existing filters and rules: Aggregator Advisor, the Blake filter, the Glaxo filter, the ALARM NMR filter, the PAINS substructures on the Web site SmartsFilter, the Lilly-MedChem-rules, FAF-Drugs4, and six rules and models (Lipinski's rules, Ghose's rules, Oprea's rules, Veber's rules, Varma's rules, and a druglikeness model) implemented on the Web site ADMETlab (http://admet.scbdd.com/). We used our previous test data (3506 drugs and 1469 aggregators, 4975 molecules in total) and the ZINC external validation set (5681 aggregators) as benchmark data. The results given by the different filters, rules, and models are summarized in Table 5. Our model performed best overall, with the highest specificity and accuracy on the test set and the highest sensitivity on the external validation set, and our rules also performed better than most of the other models. Aggregator Advisor detected aggregators with a sensitivity of 0.521 for the test set and 0.662 for the external validation set, while its strict version detected more (0.803 and 0.704). However, the overall performance of Aggregator Advisor was unsatisfactory compared with our models, with ACCs of 0.825 and

Table 5. Performance Comparison between Our Model and Other Rules or Models

                                Test set                                       External validation set
Method                             TP    FN    TN    FP     SE     SP    ACC       TP    FN     SE
Blake                             590   879  2060  1446  0.402  0.588  0.534     1949  3732  0.343
Glaxo                             121  1348  3323   283  0.082  0.948  0.695      261  5420  0.046
ALARM NMR                        1259   210  1387  2119  0.857  0.396  0.534     4681  1000  0.824
PAINS                             127  1342  3326   180  0.086  0.949  0.696      528  5153  0.093
Lilly-MedChem-rules               555   914  2313  1193  0.378  0.660  0.576     2331  3350  0.410
FAF-Drugs4 [a]                    710   759   740  2766  0.483  0.211  0.291     2833  2848  0.499
FAF-Drugs4 (strict) [b]          1266   203   318  3188  0.862  0.091  0.318     5153   528  0.907
Aggregator Advisor [c]            766   703  3337   169  0.521  0.952  0.825     3763  1918  0.662
Aggregator Advisor (strict) [d]  1180   289  1185  2321  0.803  0.338  0.475     4001  1720  0.704
Lipinski                          314  1155  2842   664  0.214  0.811  0.634     1470  4211  0.259
Ghose                             203  1266  2793   713  0.138  0.797  0.602     1516  4165  0.267
Oprea                             230  1239  2123  1383  0.157  0.606  0.528      268  5413  0.047
Veber                              20  1449  3024   482  0.014  0.863  0.612      177  5504  0.031
Varma                             543   926  2367  1139  0.370  0.675  0.585     2336  3345  0.411
druglikeness model                239  1230  1277  2229  0.163  0.364  0.305      739  4942  0.130
our model                        1246   223  3417    89  0.848  0.975  0.937     5249   432  0.924
our rules                        1091   378  2507   999  0.743  0.715  0.723     4034  1647  0.710

[a] The FAF-Drugs4 filter includes the PAINS filter, the Lilly-MedChem-rules, undesirable substructure moieties, and Retrieve covalent inhibitors.
[b] The FAF-Drugs4 (strict) filter builds on the FAF-Drugs4 filter and additionally applies PPIHitProfiler and the in-house [∗] and published physicochemical filters.
[c] The Aggregator Advisor filter classifies a molecule as an aggregator when its similarity to a known aggregator is greater than 0.85 and its calculated log P is above 3.
[d] The Aggregator Advisor (strict) filter classifies a molecule as an aggregator when its calculated log P is above …
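The SE, SP, and ACC columns of Table 5 follow from the confusion counts: SE = TP/(TP + FN), SP = TN/(TN + FP), and ACC = (TP + TN)/(TP + FN + TN + FP). The small helper below is an illustration rather than the authors' code; applied to the test-set counts of "our model" and "our rules", it reproduces the tabulated values.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (rounded to 3 decimals) from confusion counts."""
    se = tp / (tp + fn)                      # fraction of aggregators flagged
    sp = tn / (tn + fp)                      # fraction of drugs passed
    acc = (tp + tn) / (tp + fn + tn + fp)    # fraction of all molecules classified correctly
    return round(se, 3), round(sp, 3), round(acc, 3)


print(screening_metrics(1246, 223, 3417, 89))   # "our model", test set → (0.848, 0.975, 0.937)
print(screening_metrics(1091, 378, 2507, 999))  # "our rules", test set → (0.743, 0.715, 0.723)
```

The external validation set contains only aggregators, so only SE = TP/(TP + FN) is defined there; calling the helper with tn = fp = 0 would raise a ZeroDivisionError.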


problems unless they are removed early. Given this situation, our model is a much-needed tool in pharmaceutical development. Web Server for Identifying and Filtering Aggregated Molecules. To share our results with chemists and pharmacologists, we constructed a Web server called ChemAGG for filtering out aggregators from potential hit molecules. The Web server is freely accessible at http://admet.scbdd.com/ChemAGG/index. It was developed in Python and supports multiple simultaneous users. Considering the Web site environment and the prediction performance described above, the prediction model based on the ECFP4 fingerprint and the XGBoost algorithm was adopted for the Web site. As shown in Figure 6, ChemAGG has several useful features: (1) It accepts three types of input. The server accepts an *.sdf file (MDL structure-data format) as the input file; alternatively, users can enter SMILES or draw a molecule in the JME editor.62 (2) It supports batch computation. Predicting colloidal aggregation for a single molecule is of little use to researchers dealing with large data sets, especially in VS; ChemAGG supports batch computation by uploading molecular files. (3) It provides useful tools, such as searching and ranking, in the results table. The table shows the SMILES and structures of the query molecules, together with the predicted category of each molecule ("1" for Aggregator and "0" for Nonaggregator) and a prediction probability from 0 (impossible to aggregate) to 1 (most likely to aggregate), which gives a better estimate of how reliable the prediction is for a particular compound. (4) Results can be saved in a general-purpose format: the final table can be downloaded as a *.csv file. This Web server provides a friendly and convenient interface to start a calculation and a nicely formatted page to access the results.
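Because the results table can be downloaded as a *.csv file, further filtering is easy to script. The sketch below uses Python's standard `csv` module; the column names (`Molecule`, `Label`, `Probability`) and the example rows are hypothetical placeholders, since the exact headers of the ChemAGG export are not described here, and the 0.95 cut-off is simply the screening threshold used for virtual screening in this study.

```python
import csv
import io


def likely_aggregators(csv_text, threshold=0.95):
    """Return the molecule IDs whose predicted aggregation probability is >= threshold.

    The column names used here are hypothetical placeholders for the exported headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Molecule"] for row in reader if float(row["Probability"]) >= threshold]


# Invented rows for illustration only (not real ChemAGG predictions).
exported = "Molecule,Label,Probability\nmol_A,1,0.98\nmol_B,0,0.12\nmol_C,1,0.91\n"
print(likely_aggregators(exported))  # → ['mol_A']
```

Applying a stricter cut-off than the one behind the 0/1 label (here, mol_C is labeled 1 but falls below 0.95) trades recall for precision.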

opposite for specificity (0.211 vs 0.091). Given this comparison with other frequently used methods, our model is better suited to filtering out aggregators in drug discovery, and it can also serve as a supplement to existing methods for identifying potentially problematic compounds. Model Application in Virtual Screening. To better understand our models and to explore the proportion of aggregators in different screening databases, we applied the model to the Asinex (567 106 compounds), ChemicalBlock (198 643 compounds), and Specs (212 863 compounds) databases. Our model is a negative-design filter that aims to weed out molecules with unsatisfactory physicochemical features in the early stages of drug discovery, so the accuracy of its predictions is vital. To this end, we used the F measure, which combines precision and recall, to evaluate the model as the classification threshold was varied. On this basis, we chose 0.95 as the classification threshold to discriminate between aggregators and nonaggregators, giving a precision of 0.97 and a recall of 0.7 in 5-fold cross-validation. As shown in Figure 5, the distributions of the prediction probabilities for the three databases are presented as



CONCLUSION

Aggregation is one of the main causes of false positives in HTS. Because collections of known aggregators have been small, efforts to predict colloidal aggregation in silico have had limited success. To address this problem, we collected 12 119 aggregators and 24 172 drugs to build prediction models. The best-performing model had an ACC of 0.950 and an AUC of 0.987 in 5-fold cross-validation and an ACC of 0.937 and an AUC of 0.976 on the test set, and it also performed well on the external validation set. Compared with other frequently used models and rules, our model performed best, with an accuracy that none of the others achieved. More importantly, we summarized a set of rules that could be a good supplement to existing druglikeness rules; these rules are especially useful for chemists synthesizing and optimizing compounds. We also used our model to filter three large compound data sets, and more than 20% of the molecules were predicted to be aggregators, indicating a potential threat to drug development. Given the current situation, our model and rules are much-needed filtering tools for early drug discovery. To make these results widely usable, we developed the free public Web server ChemAGG (http://admet.scbdd.com/ChemAGG/index) for identifying aggregators. Achieving high efficiency and success rates has always been a research focus in drug development. Because of the high attrition in phase I and phase II trials, many studies have

Figure 5. Screening results for the Asinex, ChemicalBlock, and Specs databases. The x-axis represents the predicted probability that a compound aggregates, from 0 (least likely to aggregate) to 1 (most likely to aggregate); the left y-axis shows the relative frequency, plotted as a histogram, and the right y-axis shows the cumulative frequency, plotted as lines.
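The threshold analysis behind Figure 5 can be sketched as two small functions: precision, recall, and an F_beta score at a given cut-off, and the share of a library called aggregators at that cut-off (the quantity behind the roughly 20% and 30% figures). This is an illustrative sketch; the text does not say which beta was used, so the default beta = 1 (the ordinary F1 score) is an assumption, and the labels and probabilities in the example are invented.

```python
import numpy as np


def precision_recall_fbeta(y_true, prob, threshold, beta=1.0):
    """Precision, recall and F_beta when molecules with prob >= threshold are called aggregators."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(prob) >= threshold
    tp = np.sum(pred & y_true)
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / np.sum(pred)
    recall = tp / np.sum(y_true)
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return float(precision), float(recall), float(fbeta)


def fraction_flagged(prob, threshold=0.95):
    """Share of a screening library predicted to be aggregators at the given threshold."""
    return float(np.mean(np.asarray(prob) >= threshold))


# Invented labels and probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0]
prob = [0.99, 0.96, 0.50, 0.97, 0.10]
for t in (0.5, 0.95):
    print(t, precision_recall_fbeta(y_true, prob, t), fraction_flagged(prob, t))
```

Sweeping the threshold and watching precision, recall, and the flagged fraction together shows why a high cut-off such as 0.95 favors precision at the cost of recall.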

concave surfaces, indicating the ability of our model to separate aggregators from nonaggregators. Although most compounds (approximately 60%) were concentrated at the two extremes, some compounds (approximately 30%) had prediction probabilities between 0.2 and 0.8. This distribution indicates that the choice of threshold strongly affects the final screening result. At a threshold of 0.95, approximately 20% of the compounds in the Asinex and ChemicalBlock databases were predicted to be aggregators; for the Specs database, the percentage was greater than 30%. Although our predictions inevitably contain some false positives and false negatives, these results suggest that the proportion of aggregators in current curated chemical databases is not low. Aggregators can later become potential


Figure 6. Overview of the workflow in the ChemAGG Web site.

jiang Provincial Natural Science Foundation of China (LZ19H300001), and Hunan Provincial Natural Science Foundation of China (2019JJ51003). The studies were approved by the university’s review board.

concentrated on screening out compounds with undesirable properties in early drug discovery. Some of this work is notable and widely known, such as the Lipinski rules, whose original article has been cited more than 6100 times. With deeper study of the mechanisms and more data available, these negative-design filters have become more specialized and useful. Our study aims to go a step further in filter design. We believe that it will help scientists identify compounds with an increased likelihood of aggregation in biochemical assays. Our rules are predictive, intuitive, and simple to implement, while our models are accurate and effective in screening. Although aggregation may have positive applications in some fields, such as protein−protein interactions, nonaggregators are clearly more suitable in most pharmaceutical development projects. We hope that our study will contribute to decreasing false positives and thus increase the efficiency and success rate of drug development.





(1) Mullard, A. 2018 FDA drug approvals. Nat. Rev. Drug Discovery 2019, 18, 85−89. (2) DiMasi, J. A.; Grabowski, H. G.; Hansen, R. W. Innovation in the pharmaceutical industry: New estimates of R&D costs. J. Health Econ. 2016, 47, 20−33. (3) Rishton, G. M. Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, 2, 382−384. (4) Roche, O.; Schneider, P.; Zuegge, J.; Guba, W.; Kansy, M.; Alanine, A.; Bleicher, K.; Danel, F.; Gutknecht, E.-M.; Rogers-Evans, M.; Neidhart, W.; Stalder, H.; Dillon, M.; Sjogren, E.; Fotouhi, N.; Gillespie, P.; Goodnow, R.; Harris, W.; Jones, P.; Taniguchi, M.; Tsujii, S.; von der Saal, W.; Zimmermann, G.; Schneider, G. Development of a virtual screening method for identification of “frequent hitters” in compound libraries. J. Med. Chem. 2002, 45, 137−142. (5) McGovern, S. L.; Caselli, E.; Grigorieff, N.; Shoichet, B. K. A Common Mechanism Underlying Promiscuous Inhibitors from Virtual and High-Throughput Screening. J. Med. Chem. 2002, 45, 1712−1722. (6) Feng, B. Y.; Simeonov, A.; Jadhav, A.; Babaoglu, K.; Inglese, J.; Shoichet, B. K.; Austin, C. P. A high-throughput screen for aggregation-based inhibition in a large compound library. J. Med. Chem. 2007, 50, 2385−2390. (7) Ferreira, R. S.; Simeonov, A.; Jadhav, A.; Eidam, O.; Mott, B. T.; Keiser, M. J.; McKerrow, J. H.; Maloney, D. J.; Irwin, J. J.; Shoichet, B. K. Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors. J. Med. Chem. 2010, 53, 4891−4905. (8) Babaoglu, K.; Simeonov, A.; Irwin, J. J.; Nelson, M. E.; Feng, B.; Thomas, C. J.; Cancian, L.; Costi, M. P.; Maltby, D. A.; Jadhav, A.; Inglese, J.; Austin, C. P.; Shoichet, B. K. Comprehensive mechanistic analysis of hits from high-throughput and docking screens against beta-lactamase. J. Med. Chem. 2008, 51, 2502−2511. (9) Coan, K. E. D.; Maltby, D. A.; Burlingame, A. L.; Shoichet, B. K. Promiscuous aggregate-based inhibitors promote enzyme unfolding. J. Med. Chem. 2009, 52, 2067−2075. (10) Blevitt, J. M.; Hack, M. D.; Herman, K. L.; Jackson, P. F.; Krawczuk, P. J.; Lebsack, A. D.; Liu, A. X.; Mirzadegan, T.; Nelen, M. I.; Patrick, A. N.; et al. Structural Basis of Small-Molecule Aggregate Induced Inhibition of a Protein-Protein Interaction. J. Med. Chem. 2017, 60, 3511−3517.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.9b00541.

Feature selection results of five descriptors (PDF)
SMILES strings for all aggregators used for building models (12 119 structures) (XLSX)



REFERENCES

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. Tel.: +86-731-89824761 (D.S.C.).
*E-mail: [email protected]. Tel.: +86-571-88208412 (T.J.H.).

ORCID

Ting-Jun Hou: 0000-0001-7227-2580
Dong-Sheng Cao: 0000-0003-3604-3785

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was financially supported by the National Key Basic Research Program (2015CB910700), National Science & Technology Major Project of China “Key New Drug Creation and Manufacturing Program” (2018ZX09711002-007), Zhe


calculation and chemical library design. Nucleic Acids Res. 2015, 43, W200−W207. (32) Varma, M. V. S.; Obach, R. S.; Rotter, C.; Miller, H. R.; Chang, G.; Steyn, S. J.; El-Kattan, A.; Troutman, M. D. Physicochemical Space for Optimum Oral Bioavailability: Contribution of Human Intestinal Absorption and First-Pass Elimination. J. Med. Chem. 2010, 53, 1098−1108. (33) Veber, D. F.; Johnson, S. R.; Cheng, H.; Smith, B. R.; Ward, K. W.; Kopple, K. D. Molecular Properties That Influence the Oral Bioavailability of Drug Candidates. J. Med. Chem. 2002, 45, 2615−2623. (34) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Delivery Rev. 2001, 46, 3−25. (35) Oprea, T. I. Property distribution of drug-related chemical databases. J. Comput.-Aided Mol. Des. 2000, 14, 251−264. (36) Ghose, A. K.; Viswanadhan, V. N.; Wendoloski, J. J. A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases. J. Comb. Chem. 1999, 1, 55−68. (37) Dong, J.; Wang, N. N.; Yao, Z. J.; Zhang, L.; Cheng, Y.; Ouyang, D.; Lu, A. P.; Cao, D. S. ADMETlab: a platform for systematic ADMET evaluation based on a comprehensively collected ADMET database. J. Cheminf. 2018, 10, 29. (38) Wishart, D. S.; Feunang, Y. D.; Guo, A. C.; Lo, E. J.; Marcu, A.; Grant, J. R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z.; et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074−D1082. (39) Gaulton, A.; Hersey, A.; Nowotka, M.; Bento, A. P.; Chambers, J.; Mendez, D.; Mutowo, P.; Atkinson, F.; Bellis, L. J.; Cibrián-Uhalte, E.; et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017, 45, D945−D954. (40) Li, Y. H.; Yu, C. Y.; Li, X. X.; Zhang, P.; Tang, J.; Yang, Q.; Fu, T.; Zhang, X.; Cui, X.; Tu, G.; et al. Therapeutic target database update 2018: enriched resource for facilitating bench-to-clinic research of targeted therapeutics. Nucleic Acids Res. 2018, 46, D1121−D1127. (41) Harding, S. D.; Sharman, J. L.; Faccenda, E.; Southan, C.; Pawson, A. J.; Ireland, S.; Gray, A. J. G.; Bruce, L.; Alexander, S. P. H.; Anderton, S.; et al. The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. Nucleic Acids Res. 2018, 46, D1091−D1106. (42) Lloyd, D. G.; Buenemann, C. L.; Todorov, N. P.; Manallack, D. T.; Dean, P. M. Scaffold hopping in de novo design. Ligand generation in the absence of receptor information. J. Med. Chem. 2004, 47, 493−496. (43) Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Chemically Advanced Template Search (CATS) for Scaffold-Hopping and Prospective Target Prediction for ‘Orphan’ Molecules. Mol. Inf. 2013, 32, 133−138. (44) Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742−754. (45) Wildman, S. A.; Crippen, G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868−873. (46) Dong, J.; Cao, D.; Miao, H.; Liu, S.; Deng, B.; Yun, Y.; Wang, N.; Lu, A.; Zeng, W.; Chen, A. F. ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation. J. Cheminf. 2015, 7, 60. (47) Dong, J.; Yao, Z.; Zhu, M.; Wang, N.; Lu, B.; Chen, A. F.; Lu, A.; Miao, H.; Zeng, W.; Cao, D. ChemSAR: an online pipelining platform for molecular SAR modeling. J. Cheminf. 2017, 9, 27. (48) Cao, D.; Xu, Q.; Hu, Q.; Liang, Y. ChemoPy: freely available python package for computational biology and chemoinformatics. Bioinformatics 2013, 29, 1092−1094.

(11) Coan, K. E. D.; Shoichet, B. K. Stoichiometry and physical chemistry of promiscuous aggregate-based inhibitors. J. Am. Chem. Soc. 2008, 130, 9606−9612. (12) Shoichet, B. K. Interpreting steep dose-response curves in early inhibitor discovery. J. Med. Chem. 2006, 49, 7274−7277. (13) Doak, A. K.; Wille, H.; Prusiner, S. B.; Shoichet, B. K. Colloid formation by drugs in simulated intestinal fluid. J. Med. Chem. 2010, 53, 4259−4265. (14) Frenkel, Y. V.; Clark, A. D.; Wang, Y. H.; Lewi, P. J.; Janssen, P. A.; Arnold, E.; Das, K. Concentration and pH dependent aggregation of hydrophobic drug molecules and relevance to oral bioavailability. J. Med. Chem. 2005, 48, 1974−1983. (15) Owen, S. C.; Doak, A. K.; Ganesh, A. N.; Nedyalkova, L.; McLaughlin, C. K.; Shoichet, B. K.; Shoichet, M. S. Colloidal drug formulations can explain “bell-shaped” concentration-response curves. ACS Chem. Biol. 2014, 9, 777−784. (16) Owen, S. C.; Doak, A. K.; Wassam, P.; Shoichet, M. S.; Shoichet, B. K. Colloidal aggregation affects the efficacy of anticancer drugs in cell culture. ACS Chem. Biol. 2012, 7, 1429−1435. (17) Wang, J.; Matayoshi, E. Solubility at the Molecular Level: Development of a Critical Aggregation Concentration (CAC) Assay for Estimating Compound Monomer Solubility. Pharm. Res. 2012, 29, 1745−1754. (18) Lindfors, L.; Skantze, P.; Skantze, U.; Westergren, J.; Olsson, U. Amorphous drug nanosuspensions. 3. Particle dissolution and crystal growth. Langmuir 2007, 23, 9866−9874. (19) Ilevbare, G. A.; Taylor, L. S. Liquid−Liquid Phase Separation in Highly Supersaturated Aqueous Solutions of Poorly Water-Soluble Drugs: Implications for Solubility Enhancing Formulations. Cryst. Growth Des. 2013, 13, 1497−1509. (20) McGovern, S. L.; Helfand, B. T.; Feng, B.; Shoichet, B. K. A specific mechanism of nonspecific inhibition. J. Med. Chem. 2003, 46, 4265−4272. (21) Irwin, J. J.; Duan, D.; Torosyan, H.; Doak, A. K.; Ziebart, K. T.; Sterling, T.; Tumanian, G.; Shoichet, B. K. An Aggregation Advisor for Ligand Discovery. J. Med. Chem. 2015, 58, 7076−7087. (22) Stork, C.; Chen, Y.; Šícho, M.; Kirchmair, J. Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters. J. Chem. Inf. Model. 2019, 59, 1030−1043. (23) Seidler, J.; McGovern, S. L.; Doman, T. N.; Shoichet, B. K. Identification and prediction of promiscuous aggregating inhibitors among known drugs. J. Med. Chem. 2003, 46, 4477−4486. (24) Feng, B. Y.; Shelat, A.; Doman, T. N.; Guy, R. K.; Shoichet, B. K. High-throughput assays for promiscuous inhibitors. Nat. Chem. Biol. 2005, 1, 146−148. (25) Rao, H.; Li, Z.; Li, X.; Ma, X.; Ung, C.; Li, H.; Liu, X.; Chen, Y. Identification of small molecule aggregators from large compound libraries by support vector machines. J. Comput. Chem. 2009, 31, 752−763. (26) Blake, J. F. Identification and evaluation of molecular properties related to preclinical optimization and clinical fate. Med. Chem. 2005, 1, 649−655. (27) Hann, M.; Hudson, B.; Lewell, X.; Lifely, R.; Miller, L.; Ramsden, N. Strategic pooling of compounds for high-throughput screening. J. Chem. Inf. Comput. Sci. 1999, 39, 897−902. (28) Huth, J. R.; Mendoza, R.; Olejniczak, E. T.; Johnson, R. W.; Cothron, D. A.; Liu, Y.; Lerner, C. G.; Hajduk, P. J.; Chen, J. ALARM NMR: a rapid and robust experimental method to detect reactive false positives in biochemical screens. J. Am. Chem. Soc. 2005, 127, 217−224. (29) Metz, J. T.; Huth, J. R.; Hajduk, P. J. Enhancement of chemical rules for predicting compound reactivity towards protein thiol groups. J. Comput.-Aided Mol. Des. 2007, 21, 139−144. (30) Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53, 2719−2740. (31) Lagorce, D.; Sperandio, O.; Baell, J. B.; Miteva, M. A.; Villoutreix, B. O. FAF-Drugs3: a web server for compound property


(49) Cao, D.; Liang, Y.; Yan, J.; Tan, G.; Xu, Q.; Liu, S. PyDPI: Freely Available Python Package for Chemoinformatics, Bioinformatics, and Chemogenomics Studies. J. Chem. Inf. Model. 2013, 53, 3086−3096. (50) Strobl, C.; Boulesteix, A. L.; Kneib, T.; Augustin, T.; Zeileis, A. Conditional variable importance for random forests. BMC Bioinf. 2008, 9, 307. (51) Cao, D. S.; Yang, Y. N.; Zhao, J. C.; Yan, J.; Liu, S.; Hu, Q. N.; Xu, Q. S.; Liang, Y. Z. Computer-aided prediction of toxicity with substructure pattern and random forest. J. Chemom. 2012, 26, 7−15. (52) Cao, D. S.; Hu, Q. N.; Xu, Q. S.; Yang, Y. N.; Zhao, J. C.; Lu, H. M.; Zhang, L. X.; Liang, Y. Z. In silico classification of human maximum recommended daily dose based on modified random forest and substructure fingerprint. Anal. Chim. Acta 2011, 692, 50−56. (53) Lei, T.; Li, Y.; Song, Y.; Li, D.; Sun, H.; Hou, T. ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J. Cheminf. 2016, 8, 6. (54) Lei, T.; Chen, F.; Liu, H.; Sun, H.; Kang, Y.; Li, D.; Li, Y.; Hou, T. ADMET Evaluation in Drug Discovery. 17. Development of Quantitative and Qualitative Prediction Models for Chemical-Induced Respiratory Toxicity. Mol. Pharmaceutics 2017, 14, 2407−2421. (55) Lei, T.; Sun, H.; Kang, Y.; Zhu, F.; Liu, H.; Zhou, W.; Wang, Z.; Li, D.; Li, Y.; Hou, T. ADMET Evaluation in Drug Discovery. 18. Reliable Prediction of Chemical-Induced Urinary Tract Toxicity by Boosting Machine Learning Approaches. Mol. Pharmaceutics 2017, 14, 3935−3953. (56) Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189−1232. (57) Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.; Thiel, K.; Wiswedel, B. KNIME: The Konstanz Information Miner. ACM SIGKDD Explor. Newsl. 2009, 11, 26−31. (58) Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887−2893. (59) Xu, Y. J.; Johnson, M. Using molecular equivalence numbers to visually explore structural features that distinguish chemical libraries. J. Chem. Inf. Comput. Sci. 2002, 42, 912−926. (60) Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012, 52, 1757−1768. (61) Duan, D.; Torosyan, H.; Elnatan, D.; McLaughlin, C. K.; Logie, J.; Shoichet, M. S.; Agard, D. A.; Shoichet, B. K. Internal Structure and Preferential Protein Binding of Colloidal Aggregates. ACS Chem. Biol. 2017, 12, 282−290. (62) Bienfait, B.; Ertl, P. JSME: a free molecule editor in JavaScript. J. Cheminf. 2013, 5, 24.
