Discovering Novel and Diverse Iron-Chelators in Silico - Journal of


Article

Discovering Novel and Diverse Iron-Chelators in Silico
Arijit Basu, Yang Sung Sohn, Mohamed Alyan, Rachel Nechushtai, Abraham J. Domb, and Amiram Goldblum
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00450 • Publication Date (Web): 08 Nov 2016


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Discovering Novel and Diverse Iron-Chelators in Silico

Arijit Basu(a), Yang-Sung Sohn(b), Mohamed Alyan(a), Rachel Nechushtai(b), Abraham J. Domb(a), Amiram Goldblum(a)*

(a) School of Pharmacy, Institute for Drug Research, Hebrew University of Jerusalem, Jerusalem, 91120, Israel

(b) Department of Plant and Environmental Sciences, the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Givat Ram, Jerusalem, 91904, Israel

Abstract: Specific iron chelation is a validated strategy in anticancer drug discovery. However, only a few chemical classes (4-5 categories) have been reported to date. We discovered in silico 5 new structurally diverse iron chelators by screening through models based on previously known chelators. To encompass a larger chemical space and propose newer scaffolds, we used our Iterative Stochastic Elimination (ISE) algorithm for model building and subsequent virtual screening (VS). The ISE models were developed by training on a dataset of 130 reported iron chelators. The developed models are statistically significant, with an area under the receiver operating characteristic curve greater than 0.9. The models were used to screen the Enamine chemical database of ~1.8 million molecules. The top-ranked 650 molecules were reduced to 50 diverse structures, and a few others were eliminated due to the presence of reactive groups. Finally, 34 molecules were purchased and tested in vitro. Five compounds were identified with significant iron chelation activity in the Cal-G assay. An intracellular iron-chelation study revealed one compound to be equivalent in potency to the iron-chelating "gold standards" DFO and DFP. The number of discovered positives (5 out of 34) is consistent with the realistic enrichment factor of the model.

* Corresponding Author
Amiram Goldblum
Molecular Modeling and Drug Discovery
The Institute for Drug Research, School of Pharmacy
The Hebrew University of Jerusalem
Jerusalem 91120, Israel
Email: [email protected]

1 Introduction

Metal ions such as iron, copper, and zinc are needed for metabolism, growth, and proliferation. These metals are co-factors of many macromolecules that are crucial for biological functions such as oxygen transport, metabolism, and DNA synthesis.1 Balancing the cellular level of these metals is essential, because an excess can lead to toxicity, and a deficiency might cause damage and lead to many disease conditions.2 Iron is one of the most abundant transition metals in our body, and its chelation could block the growth of cancer cells. Iron participates in many essential biological processes. Conversion between the divalent and trivalent forms of iron, through interaction with cellular oxidants or reductants, is responsible for its redox activity.3, 4

Cancer cells divide rapidly and require a large pool of essential metabolites to meet the demand. Most importantly, they need a large pool of deoxyribonucleotides. These are synthesized by ribonucleotide reductase (RR) from ribonucleotides, a reduction that is the rate-limiting step in DNA synthesis. RR is an iron-containing enzyme, harboring a non-heme di-iron center that is essential for catalyzing deoxyribonucleotide synthesis.4, 5 Cancer cells overexpress ribonucleotide reductase, and in an effort to meet the increased iron demand, transferrin receptor-1 expression is also significantly increased. This overexpression leads to a high rate of iron uptake; as a result, cancer cells are more sensitive to iron chelators.6 Over the years only a few chemical scaffolds of iron chelators have been reported; the majority are thiosemicarbazones and semicarbazones,4, 7 while deferasirox was rationally designed starting from desferrithiocin.8, 9 Traditional iron chelators cause cardiotoxicity and hypoxia, due to their redox-active properties.4, 10 Cancer resistance due to prolonged use of chemically analogous iron chelators is also a concern. Therefore, developing new iron chelators with different chemical scaffolds may have several benefits.

To discover novel and diverse scaffolds of iron chelators, we used our in-house iterative stochastic elimination (ISE) algorithm to construct models that distinguish between iron chelators and other molecules, to serve in subsequent virtual screening (VS). ISE is a heuristic algorithm used to search for the best solutions in problems with large combinatorial spaces. This algorithm has been successfully applied to problems with more than 10^100 combinations.11 ISE identifies those variable values that consistently contribute to the worst results, and eliminates them. It thereby produces a smaller set of variable values, which reduces the number of combinatorial options. These eliminations allow ISE to move quickly towards identification of the optimal set of states for the problem. ISE may be applied to highly complex problems, provided the following criteria are met: the problem can be presented as a set of variables, these variables have a set of alternative values, and a scoring or evaluation function can be applied to each state of the system, i.e., to any combination of variables and their values.11, 12


ISE modeling is based on introducing previous learning data to the 'ISE engine' that generates models. In our lab, we have successfully used ISE-based classification methods for modeling drug-likeness,13 predicting which molecules may be remotely loaded into nanoliposomes,14 characterizing oral bioavailability,15 hERG toxicity16 and more. The ISE-based classification algorithm generates models that are large sets of filters, each filter consisting of ranges of molecular descriptor values for 4-5 properties. The individual filters are selected as those most able to correctly identify active compounds. The model is used for virtual screening of a large commercial library. Virtual screening typically uses catalogs of commercially available molecules, or could otherwise use virtual lists of combinatorial synthesis products. Ideally, this screening should result in the identification of active lead molecules that are structurally diverse.17 We begin model building by using previously reported activities of compounds. Because ISE uses physicochemical properties and not molecular structure details, the models can retrieve structurally diverse hits, even if the model-building data is itself composed of structurally related molecules. Our virtual screening was followed by diversity analysis for choosing the final hits. These hits were subsequently purchased and evaluated for their iron-chelation potential by in vitro experiments.

2 Experimental

2.1 Molecular Dataset

We selected 130 iron chelators from literature sources (details provided as supporting information tables S1 and S2). The compounds were chosen based on their IC50 values in iron-dependent SK-N-MC (neuroblastoma) cell lines, redox potentials (FeIII/FeII), and their iron chelating ability. The iron chelation assays varied from source to source and are in some cases qualitatively judged (where the iron chelation is reported as a single dose study) based on the discussions in those papers. We assigned the molecules 'high' or 'low' depending on their iron chelating activity. The dataset was "seeded" with decoys (100 decoys for every active molecule, assumed to be inactive) to form the learning dataset. The random picking of decoys (from the 1.8 million molecules of the Enamine dataset) was limited to those molecules that are similar to the actives in some major properties: average values (± 2 standard deviations) of Molecular Weight (MW), number of hydrogen bond donors and hydrogen bond acceptors, and calculated log P. This idea of an "applicability domain" for picking decoys as the "inactives" of the learning set is required in order to avoid the inclusion of very different molecules (i.e., much smaller molecules, or ones with a very different character). Such molecules could skew the model by making it detect differences that are not useful for screening. Picking molecules by "applicability domain" makes the distinction between actives and decoys more difficult but is also more practical for the subsequent VS. We call this screen-1, and the model is 'actives vs decoys'. Similarly, we also partitioned the dataset of actives into 90 highly active and 30 least active molecules. We call this screen-2, and the model is 'highs vs lows', to be used in the virtual screening for additional focusing on molecular "leads". 10 compounds were moderately active and were thus removed from the dataset.

2.2 ISE model development

We used a standard ISE protocol (supporting information Figure S1) to build the classification models that distinguish between molecules with the desired property and molecules that lack it (i.e., 'actives vs. randoms' and 'highs vs. lows'). Problem solving proceeds as follows:

1) Calculate the values of the physicochemical descriptors of interest for all of the molecules in the two databases, using appropriate software. In the present study, MOE18 was used for calculating the values of ~180 descriptors for the entire learning set.
2) Eliminate descriptors that have very low standard deviations and those with intercorrelation (dependence) of r^2 > 0.8. When two descriptors are highly correlated, the descriptor having the higher sum of correlations with all others is discarded.
3) For optimizing their ranges, divide each set of descriptor values into one hundred discrete values (n = 100).
4) Examine all possible ranges of values (n*(n-1)/2 = 4950) in order to find the best one for separating the two activity classes. Scoring is done by the Matthews Correlation Coefficient (MCC, equation 1), which takes into consideration true and false positives (TP, FP) and negatives (TN, FN).
5) Divide the learning set (actives and inactives) into 5 equal subsets, picked randomly, for a 5-fold model construction: in each of 5 modeling steps, 4/5 is used for training and 1/5 is used as the test set. Following these 5 iterations, the entire learning set will have been examined and scored as "test set".
6) Pick a few (usually 4 or 5) descriptor ranges to form a "filter".
7) Test whether the descriptor values of a molecule in the training set are within the ranges of all descriptors in that filter. An active molecule is a true positive if all of its values are within the ranges, and a false negative if even one of its descriptors is outside the descriptor ranges of that filter. An "inactive" molecule is a true negative if at least one value is outside the filter's ranges, and a false positive if all of its descriptor values fit the ranges of that filter.
8) Examine all training set molecules based on (7).
9) Use a scoring function, the Matthews Correlation Coefficient (MCC), to measure the ability of this particular filter to distinguish between positives and negatives in the training set. The MCC values are "normalized" between -1 and +1, and MCC is not sensitive to the learning set size. The more positive the MCC, the better the filter.

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad \text{(Eq. 1)}$$

Eq. 1: True positive (TP), false positive (FP), true negative (TN), false negative (FN).


10) Repeat the above process (6-9) very many times to create many thousands of filters, enough for each randomly selected variable value to appear a large number of times, so that it is possible to evaluate its "contribution" to the best or worst MCC values (between 1-10% at each end).
11) Use a virtual histogram of MCC frequencies to examine the role of all the (discrete) values of each variable. The values that contribute consistently to the worst MCC values and do not contribute to the best MCC values are rejected, and a new iteration of sampling takes place with far fewer variable values. No variables are rejected, but some variables may remain with no "ranges" because all of their values were discarded.
12) Values in the lower or upper ranges of a variable may thus be identified as contributing to results with the worst scores and subsequently eliminated.

The ISE engine repeats the sampling and elimination steps until a predetermined number of combinations (~10^6-10^7) is reached from the much larger set of initial combinations, 10^10-10^30 for sets of 5 descriptors. That smaller number of combinations may be evaluated in full. After calculating all the remaining combinations exhaustively, we have a partial set of filters. More filters are produced by the 5-fold repetition of the above actions applied to the different training and test sets. Once the 5 models have been produced, their filters are combined, sorted and clustered so that very closely related filters (in which TP/FP differs by less than 2% between two filters with similar MCC), from the same or from different runs, are eliminated. The final model is composed of filters from the 5-fold runs, and includes those with MCC values from the best one down to the "best minus 20%" of the top MCC value. This may result in hundreds of filters or even more.
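Steps 4 and 6-9 can be illustrated with a minimal sketch. It is not the authors' ISE engine: the descriptor names (`logP`, `MW`), the toy molecules and all function names are hypothetical, and only the range count and the MCC scoring of a single filter are shown, not the stochastic elimination itself.

```python
import math
from itertools import combinations


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; 0.0 by convention if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def candidate_ranges(n_values=100):
    """Step 4: every (low, high) pair of distinct discrete values, n*(n-1)/2 in total."""
    return list(combinations(range(n_values), 2))


def passes(filter_ranges, descriptors):
    """Step 7: a molecule passes only if all of its descriptor values are in range."""
    return all(lo <= descriptors[name] <= hi
               for name, (lo, hi) in filter_ranges.items())


def score_filter(filter_ranges, molecules):
    """Steps 8-9: classify every (descriptors, is_active) pair, then score by MCC."""
    tp = fp = tn = fn = 0
    for descriptors, is_active in molecules:
        hit = passes(filter_ranges, descriptors)
        if is_active and hit:
            tp += 1
        elif is_active:
            fn += 1
        elif hit:
            fp += 1
        else:
            tn += 1
    return mcc(tp, fp, tn, fn)


# Hypothetical two-descriptor filter and toy training set (illustration only).
toy_filter = {"logP": (2.0, 5.0), "MW": (250.0, 350.0)}
toy_molecules = [
    ({"logP": 3.5, "MW": 300.0}, True),   # active, passes the filter: TP
    ({"logP": 4.0, "MW": 320.0}, True),   # active, passes the filter: TP
    ({"logP": 1.0, "MW": 300.0}, False),  # inactive, logP out of range: TN
    ({"logP": 3.0, "MW": 500.0}, False),  # inactive, MW out of range: TN
    ({"logP": 3.0, "MW": 300.0}, False),  # inactive, passes the filter: FP
]

print(len(candidate_ranges()))                             # 4950
print(round(score_filter(toy_filter, toy_molecules), 3))   # 0.667
```

With TP = 2, FP = 1, TN = 2 and FN = 0, the MCC is 4/6, which shows how a single false positive already pulls a small filter well below the MCC > 0.95 level reported for the final models.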
We produced one (combined) model for the 'actives vs randoms' consisting of 750 filters (MCC > 0.95) and another for the 'highs vs lows' consisting of 680 filters (MCC > 0.95).

Statistical validations

The developed models (the set of best filters in each fold) were used for evaluating the test set molecules. The test set molecules were screened through these models, and the results were then evaluated based on their statistical metrics. In this study, model quality is evaluated on the training set by the average MCC, and on the sorted test set by the area under the ROC curve and the enrichment factor (EF) of the top 1%.
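The two test-set metrics can be sketched as follows. The text does not spell out how AUC and EF were computed, so this sketch assumes the common definitions: AUC as the probability that a randomly chosen active outscores a randomly chosen inactive, and EF as the hit rate in the top fraction divided by the hit rate in the whole set. Function names are illustrative.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate among the top-scoring fraction divided by the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(len(ranked) * fraction)))
    hits_top = sum(1 for _, y in ranked[:n_top] if y)
    return (hits_top / n_top) / (sum(1 for y in labels if y) / len(labels))


# Toy example: 2 actives among 200 molecules, both ranked at the very top.
scores = [200 - i for i in range(200)]
labels = [1, 1] + [0] * 198
print(roc_auc(scores, labels))            # 1.0
print(enrichment_factor(scores, labels))  # 100.0
```

Under these definitions, a learning set with one active per 100 decoys caps the EF of the top 1% at roughly 100, which puts the reported values of about 40 to 59 in context.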


In addition, all test set molecules were given indexes based on equation 2, 1/5 of them in each of the 5 rounds of model building. So finally, all the learning set molecules were scored and viewed as a scatter plot to show their distribution.

$$\mathrm{MBI} = \frac{1}{n} \sum_{i=1}^{n} \left( \delta_{\mathrm{active}} \frac{P_{TP}}{P_{FP}} - \delta_{\mathrm{inactive}} \frac{P_{TN}}{P_{FN}} \right) \qquad \text{(Eq. 2)}$$

In Equation (2), n is the number of filters; δactive = 1 if the molecule passed filter i as a positive, otherwise δactive = 0; δinactive = 1 if the molecule passed filter i as a negative, otherwise δinactive = 0. PTP/PFP is the proportion of true to false positives in a particular filter and may be called an efficiency factor, whereas PTN/PFN, the proportion of true negatives to false negatives, is an inefficiency factor. Any molecule can obtain an MBI that is positive, which suggests that it may be a candidate for the particular activity, or negative, which suggests that it is not expected to be an active candidate.

2.3 Virtual screening

1.8 million molecules from the Enamine database were screened through the generated ISE models in two stages. First, the screening was performed through screen-1 'actives vs decoys' and then through screen-2 'highs vs lows'. Molecular descriptors were generated for the screened molecules in the same way as for the learning set. For each molecule screened, a Molecular Bioactivity Index (MBI) is computed by adding the score of each filter that it passes "correctly" and subtracting the score of each filter that it fails to pass. Passing "correctly" requires that a molecule's properties comply in full with all the property ranges in a filter. The MBI for each molecule is finally normalized by the total number of filters (Eq. 2). However, the decision on which molecules to pick as candidates depends more on the results of the scatter plot: we assume that the greatest chance of finding active molecules is at index values where TP/FP is maximal, provided that enough of the virtually screened molecules have those index values. We keep molecules with an MBI of more than 0.8. A fingerprint-based Tanimoto matrix was then created for all these molecules, and the molecules having the lowest nearest-neighbor Tanimoto values were selected. The chosen molecules were compared with the actives of the learning set by calculating the Tanimoto distance between all members of the two sets. Only molecules that differ from each other and also from the actives were chosen. Further, molecules containing reactive groups were filtered out, and the top 34 most diverse molecules were purchased and subjected to the iron chelation assay.
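The scoring and selection logic of this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: fingerprints are represented as Python sets of on-bit indices, the per-filter ratios are passed in as plain lists, and all function names are hypothetical.

```python
def mbi(delta_active, delta_inactive, p_tp_fp, p_tn_fn):
    """Eq. 2: sum the per-filter contributions and divide by the number of filters.

    delta_active[i] / delta_inactive[i] are the 0-or-1 indicators for filter i;
    p_tp_fp[i] and p_tn_fn[i] are that filter's P_TP/P_FP and P_TN/P_FN ratios.
    """
    n = len(p_tp_fp)
    total = sum(da * eff - di * ineff
                for da, di, eff, ineff in zip(delta_active, delta_inactive,
                                              p_tp_fp, p_tn_fn))
    return total / n


def tanimoto(bits_a, bits_b):
    """Tanimoto index c / (a + b - c) for two fingerprints given as sets of on bits."""
    c = len(bits_a & bits_b)
    denom = len(bits_a) + len(bits_b) - c
    return c / denom if denom else 0.0


def nearest_neighbor_similarity(name, fingerprints):
    """Highest Tanimoto similarity between one molecule and any other in the set."""
    return max((tanimoto(fingerprints[name], bits)
                for other, bits in fingerprints.items() if other != name),
               default=0.0)


def pick_diverse(fingerprints, k):
    """Keep the k molecules with the lowest nearest-neighbor Tanimoto similarity."""
    ranked = sorted(fingerprints,
                    key=lambda name: nearest_neighbor_similarity(name, fingerprints))
    return ranked[:k]


# Toy data: three filters and three candidate molecules (illustration only).
print(mbi([1, 1, 0], [0, 0, 1], [2.0, 4.0, 3.0], [1.0, 1.0, 3.0]))  # 1.0
toy_fps = {"a": {1, 2, 3}, "b": {1, 2, 3, 4}, "c": {10, 11}}
print(tanimoto(toy_fps["a"], toy_fps["b"]))  # 0.75
print(pick_diverse(toy_fps, 1))              # ['c']
```

In the toy set, molecules `a` and `b` are each other's near neighbors (Tanimoto 0.75), so the isolated molecule `c` is ranked as the most diverse. A real workflow would apply the MBI > 0.8 cutoff first and compute fingerprints with a cheminformatics toolkit.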


Diversity analysis

Tanimoto-based structural fingerprints were generated for all the actives. Binary linear fingerprints were generated from the structures, using Daylight invariant atom types and default bond orders.19 After generating the linear fingerprints, their Tanimoto index is calculated as c/(a + b − c), where 'a' is the number of bits set in structure A, 'b' is the number of bits set in structure B, and 'c' is the number of bits common to both A and B. We extracted pairwise Tanimoto distances based on the nearest neighbor method, for diversity comparison between two datasets. The nearest-neighbor average Tanimoto similarity values were used to compare two libraries.20-22 These nearest neighbor similarities are a convenient way to assess the diversity of a collection of compounds. We calculated the nearest neighbors in the reference library for each compound, using Tanimoto similarity of the binary linear fingerprints.

2.4 Iron chelation assay

The CALG-Fe(III) (1:1) complex was prepared with ferrous ammonium sulfate, which was allowed to oxidize in air to ferric iron, as a 5 mM stock solution in HBS. In a 96-well plate, 100 µL of CALG-Fe (600 nM) solution and 100 µL of compound solution were combined, giving a final assay volume of 200 µL and a final concentration of 10 µM for each compound. The compounds were solubilized in 1% DMSO and diluted with distilled water to the required concentration. The plates were incubated at 37 °C for 2 h before reading their fluorescence (excitation/emission wavelength 485 nm/520 nm) using a Tecan plate reader. All experiments were carried out in triplicate, and deferoxamine (DFO) was used as the standard.
Intracellular iron-chelation studies

To trace iron transport in live MDA-MB-231 cells, the CALG-AM fluorescent assay was used.23 Non-fluorescent CALG-AM is converted to green-fluorescent calcein once it diffuses into live cells, through hydrolysis of its acetoxymethyl ester by intracellular esterases. Cells grown in 96-well plates were incubated with CALG-AM (0.25-0.5 µM) for 10 minutes at 37 °C in DMEM-Hepes (pH 7.3). Cells were washed with DMEM-Hepes and replenished with DMEM-Hepes containing 0.5 mM probenecid (to minimize leaking of the probe). The fluorescence of cells laden with CALG was read in a plate reader for 10 min and used as the baseline. FAS (10 µM) or FeHQ (5 µM) was added to the cells and the fluorescence was read again for 20 min. Thereafter, various iron chelators at different concentrations were added to the cells and the fluorescence was read for 20 min. Deferiprone (DFP) and deferoxamine (DFO) were used as positive controls. Fluorescence intensity was measured using a fluorescence plate reader (Tecan-Safire; Neotec, Mannedorf, Austria) at excitation/emission wavelength 485 nm/520 nm, with readings taken every minute. All experimental systems were run in triplicate; the fluorescence intensity values were averaged, and each reading at any given time was normalized to the local initial fluorescence level.
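The averaging and normalization of the fluorescence readings can be sketched as below. The text does not say whether each well was normalized before or after averaging; this sketch assumes averaging the triplicate first and then dividing by the first averaged reading. The function name and the numbers are illustrative.

```python
def normalize_traces(replicates):
    """Average replicate traces point by point, then divide by the first averaged reading."""
    n_points = len(replicates[0])
    if any(len(trace) != n_points for trace in replicates):
        raise ValueError("all replicate traces must have the same length")
    mean_trace = [sum(trace[t] for trace in replicates) / len(replicates)
                  for t in range(n_points)]
    baseline = mean_trace[0]
    return [value / baseline for value in mean_trace]


# Toy triplicate: three wells read at three time points (arbitrary units).
toy_wells = [[100.0, 80.0, 60.0], [110.0, 90.0, 70.0], [90.0, 70.0, 50.0]]
print(normalize_traces(toy_wells))  # [1.0, 0.8, 0.6]
```

A normalized value below 1 corresponds to quenching of the probe by iron, and a rise back towards 1 after adding a chelator indicates that iron was removed from the probe.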

3 Results

3.1 Preliminary analysis of the training set

We constructed the model building set by collecting the reported iron chelators from literature sources.10, 24-35 The dataset consisted of 130 reported anticancer iron chelators (Figure 1). Compounds having significant iron-chelation activity were assigned as actives. The molecules were selected based on IC50 values in cancer cell lines (SK-N-MC), which were assayed in the presence of iron in the medium. A few molecules were selected based on their Fe(III)/Fe(II) redox potentials. Some were selected based on 'single dose' iron chelation studies. All the selected compounds are bidentate or tridentate iron chelators. The decoys were chosen within the applicability domain of the actives to reduce artificially good enrichments.36 VS is performed using decoys that resemble the known actives in their intrinsic physicochemical properties (applicability domain): MW, number of H-bond acceptors, number of H-bond donors, and log P.
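The decoy-picking rule (random picks restricted to mean ± 2 standard deviations of the actives in four properties, 100 decoys per active) can be sketched as follows. The dictionary keys, the function names, and the use of the sample standard deviation are assumptions made for illustration; the actual property values were computed with MOE.

```python
import random
import statistics

PROPERTIES = ("mw", "hbd", "hba", "logp")  # hypothetical keys for the four properties


def applicability_domain(actives, n_sd=2.0):
    """Per-property window of mean +/- n_sd sample standard deviations of the actives."""
    domain = {}
    for prop in PROPERTIES:
        values = [mol[prop] for mol in actives]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        domain[prop] = (mean - n_sd * sd, mean + n_sd * sd)
    return domain


def in_domain(mol, domain):
    """True if every property of the molecule falls inside its window."""
    return all(lo <= mol[prop] <= hi for prop, (lo, hi) in domain.items())


def pick_decoys(actives, library, per_active=100, seed=0):
    """Randomly pick per_active decoys for every active from the in-domain library molecules."""
    domain = applicability_domain(actives)
    eligible = [mol for mol in library if in_domain(mol, domain)]
    n_wanted = per_active * len(actives)
    if len(eligible) < n_wanted:
        raise ValueError("not enough in-domain molecules to pick the decoys")
    return random.Random(seed).sample(eligible, n_wanted)
```

Called with the 130 actives and the full screening library, `pick_decoys` would return 13000 decoys. Restricting the picks to the window keeps the model from separating the classes on trivial differences such as molecular size.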


Figure 1. Representative examples of the model building data set revealing the types of compounds that were used to develop ISE based classification models. All the 130 actives are presented in Supporting Information tables S1 and S2.

The applicability domain of the model building set compounds is: number of H-bond acceptors = 2 to 4, number of H-bond donors = 2 to 3, calculated log P = 3.79 ± 1.44, and molecular weight = 299.82 ± 51.52. We used decoys (inactives/randoms) from the Enamine chemical library.13 For every active molecule, we chose 100 decoys. All 13000 decoys were "mixed" with the 130 active molecules to form the model building dataset of 13130 molecules. For the ISE analysis, we divided the dataset into training (80%) and test (20%) sets. We also performed five-fold randomization, so that every molecule is 4 times in the training set and once in the test set, as described in detail in Section 2.2. As mentioned above, we generated a model for 'actives vs randoms' and another model with no decoys, for 'highs vs lows' differentiation, based on highly active (90 molecules) vs low active (30 molecules) compounds.

3.2 Quality of the developed models

As detailed above (section 2.2), an ISE model is a collection of filters. The filters are composed of ranges of different molecular descriptors. We randomized the test and training sets five times, so that every molecule is once in the test set and four times in the training set. Our ISE-based algorithm iteratively generates a set of molecular descriptors and ranks them according to their ability to identify true positives/negatives. Each filter is scored by MCC, and the algorithm repeats this process, optimizing the set of descriptors and their ranges in every step, until it reaches significant MCC values. We used the top 20% of all the filters for further assessment and for screening the test set compounds. The final quality of the models was assessed by their ability to predict the test set molecules.
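The five-fold scheme described here can be sketched with a plain random partition. It assumes unstratified random folds, as the text says the subsets are "picked randomly"; the function name and the seed are illustrative.

```python
import random


def five_fold_splits(n_items, k=5, seed=0):
    """Yield (train, test) index lists; each index is in the test set exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of (near-)equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# The learning set of 13130 molecules gives five test sets of 2626 molecules each.
splits = list(five_fold_splits(13130))
print([len(test) for _, test in splits])  # [2626, 2626, 2626, 2626, 2626]
```

Because the folds partition the shuffled indices, each molecule appears in four training sets and one test set, which is what allows every molecule of the learning set to receive a test-set score.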

Figure 2. Scatter plot revealing the quality of the ISE models (actives vs decoys) that were used for predicting the test set molecules. A significant partition is observed between the positives and decoys, suggesting the model may be further used for screening. The actives vs decoys model was used as the first screen for VS of the entire Enamine database. The realistic true positive : false positive ratio at MBI > 0.8 is 0.18, due to a 10-fold underrepresentation of the decoys (see discussion). We thus expect to obtain one active molecule in every 18 purchased.

We used various metrics to assess the quality of the generated models: average MCC, area under the ROC curve, and enrichment factor (Table 1). We observed significant partitioning of the true positives from the true negatives (Figure 2). The developed model is robust, and may be used for further prediction and screening of new molecules.


Table 1. Validation statistics generated by ISE based classification models. Actives vs randoms (decoys) used as a first screen, highs vs lows model used as second screening.

ISE models          Fold     AUC      MCC      EF
Active vs random    Fold1    0.983    0.977    40.55
Active vs random    Fold2    0.948    0.96     41.91
Active vs random    Fold3    0.991    0.95     58.71
Active vs random    Fold4    0.971    0.964    48.22
Active vs random    Fold5    0.943    0.96     48.67
High vs Low                  0.894    0.960    48.87

Top descriptors
Active vs random (> 200 occurrences out of 1000 filters): SMR_VSA1, PEOE_VSA+3, a_acc, PEOE_RPC+, SlogP_VSA2, lip_acc
High vs Low (> 150 occurrences out of 680 filters): SMR_VSA2, SlogP_VSA2, SMR_VSA1, TPSA, SlogP_VSA5

Matthews correlation coefficient (MCC), enrichment factor at 1% (EF), area under the receiver operating characteristic (ROC) curve (AUC). SMR: molecular refractivity, atomic contribution model. VSA: van der Waals surface area. PEOE: partial equalization of orbital electronegativity. RPC: relative positive partial charge. a_acc: number of hydrogen bond acceptor atoms (not counting acidic atoms). SlogP: log of the octanol/water partition coefficient.

The ISE model is composed of ~1000 filters, and each filter is composed of five descriptor ranges. Therefore, a complete mechanistic interpretation of all the identified descriptors may not be possible. We retrieved the frequently occurring descriptors (>20%), which we present in supporting information figure S1. We observed that the developed models are dominated by charge and polarizability descriptors. SMR descriptors, which are molecular refractivity descriptors (over a range of subdivided surface area, VSA), occur in both models (actives vs randoms and highs vs lows). We believe this observation is expected, since polarizability should favor metal binding. The training set is also dominated by amines (secondary or tertiary), imines and carbazones, which coordinate iron as neutral atoms through their nitrogen lone pair, but are treated as protonated in the descriptor calculation (PEOE descriptors). In addition, the partition coefficient on a subdivided surface area also occurs in both models. The models are statistically robust (by average MCCs, AUC, and enrichment factors), with significant separation of actives from decoys and of highs from lows. Moreover, the models are uniform and consistent; the identified consolidated filters were then used for VS.

3.3 Virtual screening

The top 20% of the filters with significant MCC values were identified from each fold and combined to form a consolidated set of filters; we call it screen-1. Further, the filters from the highs vs lows ISE model were also integrated, and used as screen-2. The resulting two-stage screens were used for VS of the Enamine chemical library37 containing ~1.8 million molecules. The hits with a bioactivity index (MBI) > 0.8 were chosen (~650 molecules). These molecules underwent Tanimoto fingerprint based diversity analysis. Molecules containing reactive groups were removed. Finally, the top 34 diverse molecules were purchased, and further evaluated as iron chelators (supporting information figures S2 and S3). Average pairwise nearest-neighbor distances were calculated and plotted (supporting information Figures S1 and S2). This plot depicts the diversity of the molecules compared to the rest of the library. We observed that the training set is structurally similar, with an average Tanimoto > 0.8, whereas the initial set of hits identified after screening has a significantly lower average similarity (T < 0.2). The purchased 34 molecules are significantly diverse (Nearest Neighbor Average Tanimoto, NAT ~ 0.1). The ISE algorithm was able to identify hits that differ significantly from the actives of the learning set, and those hits are also more diverse than the original actives (see supporting information Figure S3). The molecules sent for experimental testing are also diverse with respect to each other. Table 2 shows the structures of the 34 compounds purchased and evaluated for iron chelation.

Table 2. Top thirty-four compounds identified as hits from ISE based VS. These compounds were purchased and evaluated for their iron-chelation potential.

[Table 2 body: chemical structure drawings of compounds 1-34.]
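The fingerprint-based diversity analysis described in section 3.3 can be illustrated with a minimal sketch. It assumes that fingerprints are represented as sets of "on" bit positions, and it takes NAT to be the mean, over a set of query molecules, of the highest Tanimoto similarity to any reference molecule. This definition, the function names, and the toy fingerprints are assumptions made for illustration; they are not the authors' implementation or data.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def average_pairwise_tanimoto(fps) -> float:
    """Mean Tanimoto similarity over all unordered pairs (needs >= 2 fingerprints)."""
    pairs = list(combinations(fps, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def nearest_neighbor_average_tanimoto(queries, reference) -> float:
    """Assumed NAT: for each query, take its highest similarity to any
    reference fingerprint, then average over the queries (both non-empty)."""
    best = [max(tanimoto(q, r) for r in reference) for q in queries]
    return sum(best) / len(best)


# Hypothetical toy fingerprints: a tight training set and two dissimilar hits.
training = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
]
hits = [frozenset({10, 11, 12}), frozenset({1, 20, 21})]

print("training set, average pairwise T:", round(average_pairwise_tanimoto(training), 3))
print("hits vs training, NAT:", round(nearest_neighbor_average_tanimoto(hits, training), 3))
```

In this toy case the training fingerprints share most of their bits and the hits share almost none with them, which mirrors the qualitative pattern reported above: a high average Tanimoto within the training set and a low NAT for the selected hits.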

[Figure 3, panel a): bar chart; y-axis "% iron chelation" (0-100); x-axis "Compounds" (1-22, 24-27, 29, 30, 32-34, and DFO). Panel b).]

Figure 3. a) Relative % of iron chelation of the purchased hits as assayed in vitro in solution. Compounds 23, 28, and 31 were insoluble or precipitated in the medium. The results are averages of three experiments. The assay was performed in solution using the Cal-G-Fe complex. The ability to extract bound iron from this complex was determined and plotted as a relative percentage with respect to the Cal-G-Fe complex. b) Intracellular iron uptake assay for compounds 9 and 33 in MDA-MB-231 (breast cancer) cells laden with Cal-G. Both compounds were taken up by the cells. All assays were performed in triplicate. Compound 33 is comparable with deferoxamine (DFO), and compound 9 is comparable with deferiprone (DFP) (50 µM) but less active than DFO (50 µM). Relative protection: the ability of chelators preloaded into cells to reduce quenching of cell-associated Cal-G was taken as a measure of chelator uptake into cells. Compounds 9 and 33 showed uptake into cells and a protective effect against extracellular iron loading.

3.4 In vitro screening

The results of the Cal-G assay indicated five compounds with significant iron chelation activity. The identified hits were compounds 9, 20, 21, 22, and 33. Except for compound 20, all the hits are likely to be tridentate iron chelators. During the iron chelation experiments, all five compounds were consumed at twice the molar equivalent of iron, suggesting that two moles of compound are needed to chelate one mole of iron. Compounds 9 and 33 were the most potent, with chelation of ~80% and 95%, respectively (Figure 3), at 10 µM. We chose compounds 9 and 33 for further evaluation of intracellular iron chelation in MDA-MB-231 (breast cancer) cells laden with Cal-G and quenched with externally added iron. We find that compound 33 is able to chelate intracellular iron at ~50 µM within 30 min, similar to DFP. Compound 9 chelated iron, but its response was slower than that of 33. However, both compounds 9 and 33 showed protection equivalent to DFP after 3 h of pre-incubation. Compound 33 also showed protection equivalent to DFO. From these studies, compound 33 emerged as the lead molecule that is rapidly taken up by the cancer cells and shows significant protection (Figure 3).

4 Discussion

To date, the useful iron chelators have been limited to just a few scaffolds. Discovering new chemical scaffolds of iron chelators could have substantial advantages with respect to their potential as drug candidates, in particular in cases of cellular resistance to iron chelation. Such discovery increases the chances of drug discovery, as branching off (through subsequent chemical modifications) from different scaffolds encompasses a larger chemical space. Most iron chelators were discovered serendipitously or intuitively. Here, we used rational drug design (specifically, ISE-based modeling for screening) to discover different iron chelating scaffolds. We used virtual screening (VS), which is generally the first step in drug discovery research. VS has shown here the ability to find new, structurally diverse, active compounds. These identified compounds should serve as a basis for further improvement.

ISE modeling is based on known active molecules, represented by their physicochemical properties. ISE attempts to optimize the properties that differentiate one class of molecules from another. Discovery of new diverse scaffolds enhances the chance of getting the right molecules in subsequent optimization steps. Previously reported iron chelators may be classified into 4-5 categories. Reported compounds of the thiosemicarbazone/semicarbazone class outnumber those of the other classes. The model building dataset is therefore partially biased. Thus, learning from known chelators makes it more challenging to come up with compounds having different scaffolds. Since the ISE process learns from precedents using only physicochemical descriptors with no bias towards chemical structures, the final models have a greater chance to identify molecules with other structures that share the main properties. Therefore, VS may end up with structures that are diverse compared to the model building dataset.

The ISE model was built using standard methods and yields statistically significant models. We used a two-stage screening, first filtering through the 'actives vs randoms' model and subsequently filtering through the 'highs vs lows' model. The actives vs randoms model uses decoys within the applicability domain of the actives. The actives vs randoms separation and the resulting statistics may be deceptively inflated if the randoms are not within the applicability domain.36 We used decoys that resemble the actives in intrinsic physicochemical properties (applicability domain) such as MW, number of H-bond acceptors, number of H-bond donors, log P, etc.

The coordination information is indirectly represented by the values of certain descriptors. The training set is dominated by amines (secondary or tertiary), imines, and carbazones. These groups chelate through their lone pairs in their neutral state, but have descriptors that reflect their ability to attract protons (another type of "coordination", PEOE descriptors). Negative partial charge descriptors were also encountered (PEOE-, see supporting information Figure S4), accounting for C=O and C=S in hydrazones, semicarbazones, and thiosemicarbazones. The developed models are dominated by charge and polarizability (MR) descriptors (Table 1), which is expected for metal chelators. Polarizability, which is conducive to metal binding, occurs in both models (actives vs randoms and highs vs lows). Each of the discovered iron chelators (molecules 9, 20, 21, 22 and 33) contains functional groups that can be used for iron chelation. All five molecules have at least bidentate ability due to the proper functional groups, which are mostly nitrogen-containing aromatics with lone pairs in the plane of the rings.
However, we also detect amides, which have partial negative charges on the oxygen, as well as sulfur, phenol, and imine groups. These molecules differ in the functional groups as well as in the number of rotatable bonds, which could enable "folding" in order to function as tridentate chelators, as indicated by the experimental results.

After ISE-based VS, every molecule gets a score, its "molecular bioactivity index" (MBI) for this specific activity, a value between 0 and 1. This score reflects the molecule's ability to pass a model's filters. We applied a cutoff of MBI > 0.8 for keeping the top molecules. The resulting molecules were then subjected to diversity analysis using a Tanimoto fingerprint-based method. Figure S3 in supporting information presents the increase in diversity from the training set molecules to the compounds chosen for purchase. Only molecules that are diverse from each other and from the training set were purchased. The goal is to obtain candidates that are novel, diverse and hopefully bioactive iron chelators. Highly potent hits are desirable, but this is not the only condition. The primary goal is to identify as many diverse starting points as possible.17 The purchased compounds are indeed highly diverse with respect to each other and to the training set molecules.

The compounds were then assayed in vitro for iron chelating ability. In the Cal-G assay, five compounds were identified as active molecules. None of those hits have similar scaffolds. Our identified leads, compounds 9, 20, 21, 22 and 33, represent five diverse scaffolds from which we may branch off during subsequent optimization steps. We further studied two compounds (9 and 33) for cellular uptake and intracellular iron chelation. Compound 33, the most active molecule, has anthranilic acid hydrazine as the core scaffold, with a phenolic hydroxyl group. The average Tanimoto of compound 33 compared to the whole model building set is ~0.1. Compound 9, another lead molecule, with an average Tanimoto of ~0.08 from the training set, is significantly diverse from the model building set; structurally, it contains a secondary amine with imidazole and pyrazole side chains. Compounds 9 and 33 both showed uptake into cells and a protective effect against extracellular iron loading. The other lead molecules 20, 21, and 22 have Tanimoto values of 0.04, 0.093, and 0.095, respectively. All the hits (Figure 3) are novel and have not been reported in the scientific literature or patents (as revealed by a structure search in SciFinder®). We also performed a similarity search on these structures; this search retrieved no hits down to an 80% similarity criterion. We determined the molar equivalents of the test molecules per iron atom as part of the calcein assay as described in section 3.4 (in vitro screening). Four out of the five best chelators (9, 21, 22, 33) coordinate iron in 2:1 molar equivalents, which may suggest either bidentate or tridentate binding (various geometries are possible).
One of the actives (2′-benzoylpyridine thiosemicarbazone) was crystallized with iron, showing a tridentate octahedral geometry.27

The decision for picking molecules from commercial databases is based on the numbers of true positives (TP) vs. false positives (FP) at different cutoffs of the bioactivity index (MBI). Those are shown in Table S3 of the supporting information. The higher the MBI cutoff, the greater the value of TP/FP, but the smaller the total number of molecules; therefore, there must be a balance between a higher MBI cutoff and a realistic number of molecules. The "discovery expectation" may be estimated from the value of TP/(TP+FP) at each MBI cutoff, which is the expected fraction of TP among all molecules above that cutoff. For example, at a cutoff of MBI = 0.4 in Table S3, the "expectation" should be to find 79 actives out of a total of 145 (79 TP + 66 FP), or about 1 out of every 2 molecules. So sending even 10 molecules could be sufficient to discover 5 actives. Those numbers are based on the model, which has 130 actives (from which the 79 TPs are found above MBI = 0.4) and 13,000 decoys (from which the 66 FPs are found above that cutoff). Those numbers, however, highly overestimate the discovery rate, because of the low number of decoys in the learning set. The proportion of actives to decoys should have been closer to the expected rate of discovery in high throughput screening (HTS), which is ~1 hit among 1000 tested molecules as a "rule of thumb".38 But such dilution in the virtual modeling would mean having 130 actives vs. 130,000 decoys, which requires much longer computation times. It is therefore useful to "adjust" the number of decoys after model construction; this affects the model's FPs, which in our case must be multiplied by 10. If we use a higher MBI cutoff, the proportion of TP to FP increases, and above MBI = 0.8 we find 76 TPs out of 130 and 41 FPs out of 13,000, which should be counted as 410/130,000. Therefore the expected discovery of actives is 76/(76+410) ≈ 0.16, or about 5 actives out of ~35 molecules, which matches what was discovered (5 actives among the 34 purchased compounds). Enrichments for practical purposes are therefore much lower than those computed from the learning sets, due to the relatively small numbers of inactives compared to the much larger "chemical space" in chemical libraries. The reason for those smaller numbers of decoys is to avoid very long computations. The above discussion is valid for classification modeling in which actives are to be discovered among randoms ("decoys"), with "dilution" at the HTS level of 1:1000. In the case of the "highs vs. lows" model, with 90 "highs" vs. 30 "lows", there is no need to "adjust" the numbers.

These compounds may be regarded as promising hits or candidate lead molecules. A new scaffold provides points from which we can branch off for further structural optimization. We identified five diverse and moderately active molecules. This study leads to new iron chelating scaffolds. Now we have five different chemical scaffolds to initiate lead optimization.
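The "discovery expectation" arithmetic discussed above can be reproduced with a short sketch. The TP, FP, active, and decoy counts are the ones quoted in the text; the function name and the `decoy_scale` parameter are hypothetical names introduced here for illustration, and the sketch is not the authors' code.

```python
def discovery_expectation(tp: int, fp: int, decoy_scale: float = 1.0) -> float:
    """Expected fraction of true actives among molecules above an MBI cutoff.

    decoy_scale multiplies the false-positive count to mimic a library that is
    more dilute in actives than the learning set (assumed linear scaling).
    """
    adjusted_fp = fp * decoy_scale
    return tp / (tp + adjusted_fp)


# Counts quoted in the text (from Table S3 of the supporting information).
model_decoys = 13_000    # decoys used to build the model
target_decoys = 130_000  # decoys needed for ~1:1000 dilution of 130 actives
scale = target_decoys / model_decoys  # factor of 10 applied to the FPs

raw_at_04 = discovery_expectation(tp=79, fp=66)                     # 79 / 145
raw_at_08 = discovery_expectation(tp=76, fp=41)                     # 76 / 117
adjusted_at_08 = discovery_expectation(tp=76, fp=41, decoy_scale=scale)  # 76 / 486

purchased = 34
expected_actives = adjusted_at_08 * purchased

print(f"MBI > 0.4, unadjusted: {raw_at_04:.2f}")
print(f"MBI > 0.8, unadjusted: {raw_at_08:.2f}")
print(f"MBI > 0.8, adjusted:   {adjusted_at_08:.2f}")
print(f"expected actives among {purchased} purchased: {expected_actives:.1f}")
```

With the adjusted false-positive count, the expectation at MBI > 0.8 is 76/486, or roughly 0.16, which corresponds to about five actives among the 34 purchased compounds. The unadjusted value at the same cutoff (76/117) would suggest a far higher hit rate, which is the overestimate the text warns about.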

5 Conclusions

Iron chelators have therapeutic potential in iron overload disease, cancer, neuronal disorders, microbial infections, and many more conditions.39 Iron chelation is now a validated strategy in many such disorders, but only a few chemical scaffolds have been reported and explored so far. The medicinal chemistry literature is overloaded with reports on similar synthetic derivatives built on 4-5 chemical scaffolds (mainly hydrazones, semicarbazones, thiosemicarbazones, etc.). We report here a rational design strategy to develop iron chelators that are structurally novel and diverse. We conclude that:

1) Our Iterative Stochastic Elimination algorithm is capable of constructing models that can be used for finding novel scaffolds for iron chelation.
2) Classification models by ISE produce a set of multiple filters that allow scoring millions of molecules on the basis of passing (or not passing) filters, adding (or subtracting) the filters' quality and normalizing, so that all screened molecules may be ranked.
3) Enrichment of the models must be adjusted to suggest a realistic enrichment, which reflects the chance to discover iron chelators in a very large chemical database, because the decoys represent only a small sample of such a database.
4) The degree of diversity of the learning dataset does not determine the diversity of the molecules discovered in screening. In the present case, the final molecules have a greater diversity than those used to produce the model.
5) It is possible to discover leads by screening through ISE models.

Supporting Information

Table S1. Compounds used as training set molecules.
Table S2. SMILES notation of the training set compounds.
Figure S1. Left: the protocol followed to identify the iron chelators. Right: ISE workflow.
Figure S2. Histogram of the frequency of main descriptors in the ISE models.
Figure S3. Heat maps for the Tanimoto matrix presenting the diversity of the training set vs. final hits.
Figure S4. Nearest-Neighbors' Average Tanimoto (NAT).
Table S3. Ratio of TP/FP at different molecular bioactivity index (MBI) cutoffs.
Supporting information is available free of charge via the Internet at http://pubs.acs.org.

Acknowledgments

This research was supported by The Israel Science Foundation (grant No 999/15). AB thanks the Planning and Budgeting Committee of the Council for Higher Education at the Israel Ministry of Education for a postdoctoral fellowship.
List of Abbreviations

a_acc, number of hydrogen bond acceptor atoms (not counting acidic atoms); CALG, calcein green; DFO, deferoxamine; DFP, deferiprone; DMEM, Dulbecco's Modified Eagle's Medium; EF, enrichment factor; FAS, ferrous ammonium sulfate; FeHQ, Fe-hydroxyquinoline complexes; FN, false negatives; FP, false positives; hERG, human Ether-à-go-go-Related Gene; HTS, high throughput screening; ISE, Iterative Stochastic Elimination; MBI, Molecular Bioactivity Index; MCC, Matthews Correlation Coefficient; NAT, nearest neighbor average Tanimoto; PEOE, partial equalization of orbital electronegativity; ROC, receiver operating characteristic; RPC, relative positive partial charge; RR, ribonucleotide reductase; SlogP, log of the octanol/water partition coefficient; SMR, molecular refractivity, atomic contribution model; TN, true negatives; TP, true positives; VS, virtual screening; VSA, van der Waals surface area.

References

1. Cai, L.; Li, X.-K.; Song, Y.; Cherian, M. G., Essentiality, toxicology and chelation therapy of zinc and copper. Curr. Med. Chem. 2005, 12, 2753-2763.
2. Durham, T. R.; Snow, E. T., Metal ions and carcinogenesis. In Cancer: cell structures, carcinogens and genomic instability, Springer: Birkhäuser Basel, 2006; pp 97-130.
3. Yu, Y.; Kalinowski, D. S.; Kovacevic, Z.; Siafakas, A. R.; Jansson, P. J.; Stefani, C.; Lovejoy, D. B.; Sharpe, P. C.; Bernhardt, P. V.; Richardson, D. R., Thiosemicarbazones from the old to new: iron chelators that are more than just ribonucleotide reductase inhibitors. J. Med. Chem. 2009, 52, 5271-5294.
4. Yu, Y.; Gutierrez, E.; Kovacevic, Z.; Saletta, F.; Obeidy, P.; Suryo Rahmanto, Y.; Richardson, D. R., Iron Chelators for the Treatment of Cancer. Curr. Med. Chem. 2012, 19, 2689-2702.
5. Merlot, A. M.; Kalinowski, D. S.; Richardson, D. R., Novel Chelators for Cancer Treatment: Where are We Now? Antioxid. Redox Signal. 2013, 18, 973-1006.
6. Shao, J.; Zhou, B.; Chu, B.; Yen, Y., Ribonucleotide reductase inhibitors and future drug design. Curr. Cancer Drug Targets 2006, 6, 409-431.
7. Liu, Z. D.; Hider, R. C., Design of iron chelators with therapeutic application. Coord. Chem. Rev. 2002, 232, 151-171.
8. Alberti, D., Deferasirox (ICL670): from bench to bedside. Hematologica Rep 2005, 8, 7-10.
9. Nick, H.; Acklin, P.; Lattmann, R.; Buehlmayer, P.; Hauffe, S.; Schupp, J.; Alberti, D., Development of tridentate iron chelators: from desferrithiocin to ICL670. Curr. Med. Chem. 2003, 10, 1065-1076.
10. Kovacevic, Z.; Kalinowski, D. S.; Lovejoy, D. B.; Yu, Y.; Suryo Rahmanto, Y.; Sharpe, P. C.; Bernhardt, P. V.; Richardson, D. R., The medicinal chemistry of novel iron chelators for the treatment of cancer. Curr. Top. Med. Chem. 2011, 11, 483-499.
11. Glick, M.; Rayan, A.; Goldblum, A., A stochastic algorithm for global optimization and for best populations: A test case of side chains in proteins. Proc. Natl. Acad. Sci. USA 2002, 99, 703-708.
12. Stern, N.; Goldblum, A., Iterative Stochastic Elimination for Solving Complex Combinatorial Problems in Drug Discovery. Isr. J. Chem. 2014, 54, 1338-1357.
13. Ursu, O.; Rayan, A.; Goldblum, A.; Oprea, T. I., Understanding Drug-likeness. WIREs Comput. Mol. Sci. 2011, 1, 760-781.
14. Cern, A.; Golbraikh, A.; Sedykh, A.; Tropsha, A.; Barenholz, Y.; Goldblum, A., Quantitative structure-property relationship modeling of remote liposome loading of drugs. J. Controlled Release 2012, 160, 147-157.
15. Rayan, A.; Marcus, D.; Goldblum, A., Predicting oral druglikeness by iterative stochastic elimination. J. Chem. Inf. Model. 2010, 50, 437-445.
16. Rayan, A.; Falah, M.; Raiyn, J.; Da'adoosh, B.; Kadan, S.; Zaid, H.; Goldblum, A., Indexing molecules for their hERG liability. Eur. J. Med. Chem. 2013, 65, 304-314.

17. Scior, T.; Bender, A.; Tresadern, G.; Medina-Franco, J. L.; Martínez-Mayorga, K.; Langer, T.; Cuanalo-Contreras, K.; Agrafiotis, D. K., Recognizing pitfalls in virtual screening: a critical review. J. Chem. Inf. Model. 2012, 52, 867-881.
18. Molecular Operating Environment (MOE), 2010.10; Chemical Computing Group Inc.: 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2010.
19. Nikolova, N.; Jaworska, J., Approaches to measure chemical similarity–a review. QSAR Comb. Sci. 2003, 22, 1006-1026.
20. An, Y.; Sherman, W.; Dixon, S. L., Hole filling and library optimization: Application to commercially available fragment libraries. Bioorg. Med. Chem. 2012, 20, 5379-5387.
21. An, Y.; Sherman, W.; Dixon, S. L., Kernel-Based Partial Least Squares: Application to Fingerprint-Based QSAR with Model Visualization. J. Chem. Inf. Model. 2013, 53, 2312-2321.
22. López-Vallejo, F.; Nefzi, A.; Bender, A.; Owen, J. R.; Nabney, I. T.; Houghten, R. A.; Medina-Franco, J. L., Increased Diversity of Libraries from Libraries: Chemoinformatic Analysis of Bis-Diazacyclic Libraries. Chem. Biol. Drug Des. 2011, 77, 328-342.
23. Chen, M.-p.; Cabantchik, Z. I.; Chan, S.; Chan, G. C.-f.; Cheung, Y.-f., Iron Overload and Apoptosis of HL-1 Cardiomyocytes: Effects of Calcium Channel Blockade. PLoS ONE 2014, 9, e112915.
24. Richardson, D. R.; Sharpe, P. C.; Lovejoy, D. B.; Senaratne, D.; Kalinowski, D. S.; Islam, M.; Bernhardt, P. V., Dipyridyl Thiosemicarbazone Chelators with Potent and Selective Antitumor Activity Form Iron Complexes with Redox Activity. J. Med. Chem. 2006, 49, 6510-6521.
25. Kalinowski, D. S.; Yu, Y.; Sharpe, P. C.; Islam, M.; Liao, Y.-T.; Lovejoy, D. B.; Kumar, N.; Bernhardt, P. V.; Richardson, D. R., Design, Synthesis, and Characterization of Novel Iron Chelators: Structure−Activity Relationships of the 2-Benzoylpyridine Thiosemicarbazone Series and Their 3-Nitrobenzoyl Analogues as Potent Antitumor Agents. J. Med. Chem. 2007, 50, 3716-3729.
26. Richardson, D. R.; Kalinowski, D. S.; Richardson, V.; Sharpe, P. C.; Lovejoy, D. B.; Islam, M.; Bernhardt, P. V., 2-Acetylpyridine Thiosemicarbazones are Potent Iron Chelators and Antiproliferative Agents: Redox Activity, Iron Complexation and Characterization of their Antitumor Activity. J. Med. Chem. 2009, 52, 1459-1470.
27. Stefani, C.; Punnia-Moorthy, G.; Lovejoy, D. B.; Jansson, P. J.; Kalinowski, D. S.; Sharpe, P. C.; Bernhardt, P. V.; Richardson, D. R., Halogenated 2′-Benzoylpyridine Thiosemicarbazone (XBpT) Chelators with Potent and Selective Anti-Neoplastic Activity: Relationship to Intracellular Redox Activity. J. Med. Chem. 2011, 54, 6936-6948.
28. Lovejoy, D. B.; Sharp, D. M.; Seebacher, N.; Obeidy, P.; Prichard, T.; Stefani, C.; Basha, M. T.; Sharpe, P. C.; Jansson, P. J.; Kalinowski, D. S.; Bernhardt, P. V.; Richardson, D. R., Novel Second-Generation Di-2-Pyridylketone Thiosemicarbazones Show Synergism with Standard Chemotherapeutics and Demonstrate Potent Activity against Lung Cancer Xenografts after Oral and Intravenous Administration in Vivo. J. Med. Chem. 2012, 55, 7230-7244.
29. Stefani, C.; Jansson, P. J.; Gutierrez, E.; Bernhardt, P. V.; Richardson, D. R.; Kalinowski, D. S., Alkyl Substituted 2′-Benzoylpyridine Thiosemicarbazone Chelators with Potent and Selective Anti-Neoplastic Activity: Novel Ligands that Limit Methemoglobin Formation. J. Med. Chem. 2012, 56, 357-370.
30. Lui, G. Y.; Obeidy, P.; Ford, S. J.; Tselepis, C.; Sharp, D. M.; Jansson, P. J.; Kalinowski, D. S.; Kovacevic, Z.; Lovejoy, D. B.; Richardson, D. R., The iron chelator, deferasirox, as a novel strategy for cancer treatment: oral activity against human lung tumor xenografts and molecular mechanism of action. Mol. Pharmacol. 2013, 83, 179-190.

31. Lukmantara, A. Y.; Kalinowski, D. S.; Kumar, N.; Richardson, D. R., Synthesis and biological evaluation of substituted 2-benzoylpyridine thiosemicarbazones: Novel structure–activity relationships underpinning their anti-proliferative and chelation efficacy. Bioorg. Med. Chem. Lett. 2013, 23, 967-974.
32. Lovejoy, D.; Richardson, D., Iron chelators as anti-neoplastic agents: current developments and promise of the PIH class of chelators. Curr. Med. Chem. 2003, 10, 1035-1049.
33. Kontoghiorghes, G. J., Iron mobilization from transferrin and non-transferrin-bound iron by deferiprone. Implications in the treatment of thalassemia, anemia of chronic disease, cancer and other conditions. Hemoglobin 2006, 30, 183-200.
34. Richardson, D. R.; Tran, E. H.; Ponka, P., The potential of iron chelators of the pyridoxal isonicotinoyl hydrazone class as effective antiproliferative agents. Blood 1995, 86, 4295-4306.
35. Serda, M.; Kalinowski, D. S.; Mrozek-Wilczkiewicz, A.; Musiol, R.; Szurko, A.; Ratuszna, A.; Pantarat, N.; Kovacevic, Z.; Merlot, A. M.; Richardson, D. R.; Polanski, J., Synthesis and characterization of quinoline-based thiosemicarbazones and correlation of cellular iron-binding efficacy to anti-tumor efficacy. Bioorg. Med. Chem. Lett. 2012, 22, 5527-5531.
36. Verdonk, M. L.; Berdini, V.; Hartshorn, M. J.; Mooij, W. T.; Murray, C. W.; Taylor, R. D.; Watson, P., Virtual screening using protein-ligand docking: avoiding artificial enrichment. J. Chem. Inf. Comput. Sci. 2004, 44, 793-806.
37. EnamineStore. http://www.enamine.net/ (Aug 28, 2016).
38. Posner, B. A.; Xi, H.; Mills, J. E., Enhanced HTS hit selection via a local hit rate analysis. J. Chem. Inf. Model. 2009, 49, 2202-2210.
39. Sheth, S., Iron Chelation: An update. Curr. Opin. Hematol. 2014, 21, 179-185.

Graphical abstract
