Compound Prioritization in Single-Concentration Screening Data

Aug 26, 2016
Article pubs.acs.org/jcim

Compound Prioritization in Single-Concentration Screening Data Using Ligand Efficiency Indexes
Gonzalo Colmenarejo*
Department of Computational Chemistry, Centro de Investigación Básica, GSK, Parque Tecnológico de Madrid, Tres Cantos 28760, Spain
Supporting Information

ABSTRACT: Triaging compounds at the single-concentration phase of high-throughput screening (HTS) requires multiobjective optimization to achieve the best selection of hits, in terms of both potency and physicochemical properties, for a given number of compounds to test. Ligand efficiency indexes are well established as guides for hit prioritization in the dose−response phase of HTS but have been studied less in the single-concentration phase. The present work investigates the use of ligand efficiency indexes to prioritize compounds in the single-concentration phase. Formulas for deriving these indexes from single-concentration screening data are provided. The statistical association between the single-concentration and dose−response ligand efficiency indexes is evaluated on a large historical data set that includes multiple screens covering different target classes and screening technologies. The results show Pearson's correlation coefficients r above 0.9 for compounds with dose−response curves, and areas under the curve (AUC) of receiver operating characteristic (ROC) curves above 0.85 once compounds with no dose−response curves are included. Permutation tests show that this good statistical association reflects contributions from both the physicochemical parameter(s) in the ligand efficiency indexes and the biological activity. The "cost/benefit" of using different thresholds of single-concentration ligand efficiency indexes to rescue different numbers of efficient compounds is systematically investigated, and cost/benefit curves are provided. Approximate thresholds are proposed for the different ligand efficiency parameters that rescue large percentages of efficient compounds while attempting to reduce the number of compounds to test in dose−response mode.
Finally, a practical example of implementing these indexes together with compound clustering is described. This approach rescues more efficient compounds, at a much lower cost, than a typical response-driven selection.



Journal of Chemical Information and Modeling
DOI: 10.1021/acs.jcim.6b00299 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX
Received: May 26, 2016

INTRODUCTION
High-throughput screening (HTS) is one of the main sources of new chemical starting points for drug discovery.1 In its most typical implementation, an initial step tests the whole compound collection in single-concentration mode to obtain a controls-normalized response (inhibition or activation, depending on the screen) for every compound. A second step then evaluates the hits at multiple concentrations (dose−response mode) to determine their pIC50 (= −log10(IC50)). Here IC50, expressed in molar units in this work, is the compound concentration that produces 50% of the maximum normalized response. Once the pIC50 values of the hits have been determined, the compounds are ranked according to one or more properties in order to decide which series to devote chemical optimization resources to. In the past, potency alone (pIC50) was used for this purpose, and the compounds that entered chemical optimization programs were those displaying the maximum biological effect. However, this simplistic view led to problems. During chemical optimization, physicochemical properties, particularly molecular mass and lipophilicity, tend to become inflated.2 As a result, compounds entering clinical development tended to have physicochemical properties outside the desirable range and were therefore subject to high attrition rates. This attrition was driven by poor absorption and permeability, as well as by high promiscuity and hence toxicity. One of the main approaches to avoiding these difficulties has been the development of multiple "ligand efficiency" indexes that counterbalance potency against physicochemical parameters in a single expression.3 The first of these indexes to be proposed, and the most widely used, is ligand efficiency (LE), originally defined as the binding free energy per heavy atom of the molecule.4,5 In our HTS context it can be expressed as

$$\mathrm{LE} = \frac{1.37\,\mathrm{pIC}_{50}}{\mathrm{HA}} \qquad (1)$$

where HA is the number of heavy (non-hydrogen) atoms in the molecule. The factor 1.37 (approximately 2.303RT in kcal/mol at room temperature) converts pIC50 to a free-energy scale. This equation counterbalances a measure of affinity (pIC50) against the number of heavy atoms. The concept was soon extended to other physicochemical properties, such as lipophilicity, as in the lipophilic ligand efficiency index (LLE):

$$\mathrm{LLE} = \mathrm{pIC}_{50} - \mathrm{cLogP} \qquad (2)$$

where cLogP is the in silico calculated logarithm of the n-octanol−water partition coefficient.6,7 A third ligand efficiency index considered in this work is LLEAT, a lipophilic LE designed to have the same target value and dynamic range as LE:8

$$\mathrm{LLE_{AT}} = 0.111 + \frac{1.37\,(\mathrm{pIC}_{50} - \mathrm{cLogP})}{\mathrm{HA}} \qquad (3)$$

LLEAT has the useful property of balancing potency against both molecular size (number of heavy atoms) and lipophilicity simultaneously. The use of ligand efficiency indexes to rank dose−response screening data is well established,2 but it remains to be seen whether these indexes can also be applied to single-concentration data. If they can, the hit selection process in this phase would benefit greatly. It would be better aligned with the prioritization performed in the dose−response phase and would therefore identify ligand efficient hits more effectively. It would also allow more rational use of resources in the single-concentration screen, especially when the hit rate is high and/or reagents are scarce and in silico triage methods have to be adopted. The present work investigates the use of ligand efficiency indexes to prioritize single-concentration screening data. Formulas for calculating the indexes from single-concentration responses are provided, and historical data are used to evaluate the statistical association between the dose−response and single-concentration ligand efficiency indexes. In addition, the "cost/benefit" of using different single-concentration ligand efficiency thresholds is systematically analyzed by generating cost/benefit curves. Approximate cutoffs are proposed that rescue a large fraction of efficient compounds while attempting to reduce the number of compounds to test in dose−response. Finally, a practical implementation example that includes clustering of the hits is presented and shown to be clearly superior to a typical response-driven selection of compounds.

Previous Related Work. An earlier attempt to prioritize single-concentration screening data with a ligand efficiency index is the so-called "percentage efficiency index", PEI = %inhibition/MW. Here %inhibition is the percentage inhibition at a specific compound concentration (the response, in our context) and MW is the molecular weight.9 This index counterbalances potency (as percent inhibition) against molecular mass (as molecular weight) and has a reference value of 1.5. However, no systematic statistical analysis of the association between PEI and the output of the dose−response phase was provided. In addition, PEI is functionally different from the dose−response ligand efficiencies, including the "binding efficiency index", BEI = pIC50/MW, proposed by the same authors.9 As a result, single-concentration prioritization based on PEI is less well aligned with the established ligand efficiency indexes used in the dose−response phase. It is advisable to use the same ligand efficiency parameter for single-concentration and dose−response screening data, provided the two are statistically associated; testing this association is one of the key points addressed in this work. Moreover, to our knowledge, no parameter has been proposed to prioritize single-concentration screening data by counterbalancing potency against other properties such as lipophilicity.

MATERIALS AND METHODS
All statistical analyses were performed in R.10 The pROC package11 was used for the ROC analyses. Results were visualized in R and TIBCO Spotfire.12 Data Sets. A total of 10 different screens were retrieved from the GSK databases for this work. They cover a wide variety of target classes (including two phenotypic screens) and technologies (see Table 1).

Table 1. HTS Analyzed in This Work with Their Target Class and Screening Technology(a)

HTS   target class           technology             data set
1     other enzyme           FLINT                  p2d, c2d
2     transporter            FP                     p2d, c2d
3     GPCR                   β-arrestin reporter    p2d, c2d
4     whole cell viability   luminescence           p2d
5     other enzyme           absorbance kinetic     p2d, c2d
6     other enzyme           FLINT kinetic          p2d
7     kinase                 luminescence           p2d, c2d
8     nuclear receptor       luciferase reporter    p2d, c2d
9     kinase                 TRFRET                 p2d, c2d
10    kinase                 FLINT                  p2d, c2d

(a) The "data set" column indicates whether the screen is represented in both the p2d and c2d data sets or only in the p2d data set (see below).

Two types of screening process were followed in these screens (Figure 1). In the first strategy (Figure 1a), used in 8 of the 10 screens, an initial "primary" phase screened the whole collection at single concentration. This was followed by a "confirmation" phase for the statistical hits, defined as compounds with responses 3 SD above the mean of the response distribution. The confirmation phase was also performed at single concentration, and its main objective was to remove false positives from the hit list. In this phase, confirmed hits were identified relative to the control population, because most of the compounds tested were expected to have high responses. Finally, the confirmed single-concentration hits were submitted to a dose−response experiment, which served as the final confirmatory experiment and provided an estimate of each hit's pIC50. For these screens, two types of data set were created: one correlating the primary response with the dose−response results ("p2d" in Table 1), and another correlating the confirmation response with the dose−response results ("c2d" in Table 1). In the second type of screening process, used in 2 of the 10 screens (Figure 1b), a subset of the screening collection was tested in a primary screen as before, but the statistical hits were sent directly to a dose−response experiment. For these screens, only data sets correlating the primary response with the dose−response results ("p2d" in Table 1) were generated. In all cases, the data sets contain the following information for each compound: the single-concentration response; a categorical variable indicating whether the compound confirmed its single-concentration activity ("1" if yes, "0" if not); and the LE, LLE, and LLEAT for compounds with a fitted dose−response curve. clogP and the number of heavy atoms were also included. These data sets are provided as Supporting Information. All single-concentration experiments were performed at a compound concentration of 10 μM. All dose−response experiments were performed at a starting concentration of 100 μM, followed by 10 1:3 dilutions. Under these conditions, the lowest measurable pIC50 was 4. In some cases, the primary or confirmation screen produced more compounds than could be handled, and in silico chemical filtering or an additional secondary/orthogonal screen was then applied to reduce the list to a manageable size. In those cases, the same filtering was also applied to the nonconfirmed hits, so as not to unbalance the confirmation rates across response ranges, and the nonconfirmed compounds filtered out in this way were removed from the data sets. Data with responses below 5% or above 95% were also removed, because they give pIC50 and ligand efficiency estimates with high variance. The very few compounds in those response ranges should be treated separately (see next section). The resulting p2d data set (10 screens) contains 116 227 rows, and the c2d data set (8 screens) contains 28 397 rows. Of these, 25 651 compounds in the p2d data set and 22 508 in the c2d data set have measured pIC50 values.

Figure 1. Screening processes followed in the HTS used in this work, and the corresponding data sets used in the analysis. (a) Primary screen at single concentration, followed by a confirmation screen (single concentration) and a dose−response screen. (b) A fraction of the screening collection is tested in the primary screen, followed directly by a dose−response screen.

Calculation of scLE, scLLE, and scLLEAT from Single-Concentration Responses. The definitions of LE, LLE, and LLEAT include the pIC50 (see eqs 1−3), which is normally obtained by fitting the four-parameter logistic equation to a series of dose−response data points:

$$\mathrm{Resp}(C) = A_{\max} + \frac{A_{\min} - A_{\max}}{1 + \left(\dfrac{C}{\mathrm{IC}_{50}}\right)^{H}} \qquad (4)$$

Here Amin and Amax are the minimum and maximum asymptotes of the dose−response curve, respectively, C is the compound concentration, IC50 is the inflection point (the concentration at which 50% of the effect is observed), and H is the Hill slope. All screens used in this work were normalized with a low and a high control on each plate, so responses range from 0 to 100%, and we can set Amin = 0 and Amax = 100. In addition, all data points were obtained at a compound concentration of 10 μM, so here C = 1 × 10⁻⁵ M. Assuming a Hill slope H = 1, eq 4 reduces to Resp = 100C/(C + IC50). Solving for IC50 gives IC50 = C(100/Resp − 1), and taking the negative decimal logarithm yields a pIC50 value for each single-concentration response, scpIC50:

$$\mathrm{scpIC}_{50}(\mathrm{Resp}) = -\log_{10}\!\left[\left(\frac{100}{\mathrm{Resp}} - 1\right) 10^{-5}\right] \qquad (5)$$

The domain of this function is 0 < Resp < 100, and the function changes very rapidly near these limits, so only responses between 5 and 95% were considered in the analysis. Statistical hits with responses below 5% or above 95% are proportionally very few, so they could be progressed directly without significantly changing the cost of compound confirmation. Alternatively, they could be prioritized by other criteria, such as chemical diversity, substructure filters, and physicochemical properties. Ligand efficiency criteria are unreliable for these hits because their pIC50 values are highly uncertain. Substituting the scpIC50 obtained from a single-concentration inhibition data point with eq 5 into the equations for LE, LLE, and LLEAT gives single-concentration ligand efficiencies, denoted in what follows as scLE, scLLE, and scLLEAT.

Clustering of Compounds. The compounds were clustered with Butina's sphere exclusion method13 as implemented in RDKit,14 using a similarity threshold of 0.75. Clustering was performed separately for each screen.
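The sphere-exclusion step can be illustrated in a few lines. The sketch below is a dependency-free re-implementation of Butina's algorithm, not RDKit's code. It assumes, hypothetically, that each compound's fingerprint is available as a Python set of "on" bit indices and that similarity is Tanimoto; this section does not state the fingerprint type used. A similarity threshold of 0.75 corresponds to a Tanimoto distance cutoff of 0.25. The sketch also includes a helper, `top_n_per_cluster`, for the per-cluster selection used in the practical example in the Results: rank each cluster by scLE, drop compounds with scLE < 0.25, and keep the top n. All function names here are hypothetical.

```python
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def butina_cluster(fps: Sequence[Set[int]], sim_threshold: float = 0.75) -> List[List[int]]:
    """Butina sphere-exclusion clustering (hypothetical helper, not RDKit's API).

    Returns clusters as lists of compound indices; the first index is the centroid.
    """
    n = len(fps)
    # Neighbour list of each compound: all compounds (itself included) within the threshold.
    neighbours = [
        [j for j in range(n) if tanimoto(fps[i], fps[j]) >= sim_threshold]
        for i in range(n)
    ]
    # Visit candidate centroids by decreasing neighbour count (ties: lower index first).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned = [False] * n
    clusters: List[List[int]] = []
    for i in order:
        if assigned[i]:
            continue
        # The centroid claims all of its still-unassigned neighbours.
        members = [i] + [j for j in neighbours[i] if j != i and not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters


def top_n_per_cluster(clusters: Sequence[Sequence[int]], score: Sequence[float],
                      n: int, min_score: float = 0.25) -> List[int]:
    """Pick the n highest-scoring compounds of each cluster, ignoring scores below min_score."""
    picked: List[int] = []
    for members in clusters:
        eligible = [j for j in members if score[j] >= min_score]
        ranked = sorted(eligible, key=lambda j: score[j], reverse=True)
        picked.extend(ranked[:n])
    return picked


# Hypothetical toy fingerprints: two chemotypes.
fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2, 3}, {10, 11, 12}, {10, 11, 12, 13}]
clusters = butina_cluster(fps, 0.75)
print(clusters, top_n_per_cluster(clusters, [0.31, 0.27, 0.20, 0.40, 0.26], n=1))
```

In this toy case the first compound has the most neighbours, so it becomes the first centroid and absorbs compounds 1 and 2. Compounds 3 and 4 form a second cluster. Because centroids are chosen by neighbour count, densely populated chemotypes are processed first. Each compound belongs to exactly one cluster, which is what makes "top n per cluster" a well-defined selection rule.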



RESULTS
This section presents the following analyses: the correlation of pIC50 vs scpIC50; the correlation of scLE, scLLE, and scLLEAT vs LE, LLE, and LLEAT for hits giving dose−response curves; the association of scLE, scLLE, and scLLEAT vs LE, LLE, and LLEAT for all hits; a cost/benefit analysis of triaging with single-concentration ligand efficiency indexes; and a practical example implementation of the indexes with clustering. Correlation between pIC50 Values Obtained from Dose−Response Data and Those Obtained from Single-Concentration Data (scpIC50, eq 5). Table 2 lists, for each screen in both the p2d and c2d data sets, the Pearson's correlation coefficient r and the RMSE (root mean squared error) between the scpIC50 values obtained with eq 5 and the pIC50 values obtained by fitting dose−response curves. The degree of association varies across screens, but all correlations are statistically significant (p < 2 × 10⁻¹⁶). This is expected, because otherwise the screens would have no capacity to identify hits that confirm in dose−response curves.
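Eq 5 and the substitution into eqs 1−3 described in Materials and Methods can be sketched as follows. The sketch relies on the same assumptions as the text: a 10 μM test concentration, Amin = 0, Amax = 100, Hill slope 1, and only responses in the 5−95% window are used. The function names are hypothetical, and returning `None` outside the window is a choice made for this sketch, mirroring the removal of those data points.

```python
import math
from typing import Dict, Optional

TEST_CONC_M = 1e-5  # 10 uM, the single-concentration test concentration used in all screens


def sc_pic50(resp: float, conc_m: float = TEST_CONC_M) -> Optional[float]:
    """Eq 5: pIC50 implied by one normalized response (%), assuming Amin = 0,
    Amax = 100 and Hill slope 1. Returns None outside the 5-95% window."""
    if not 5.0 <= resp <= 95.0:
        return None
    return -math.log10((100.0 / resp - 1.0) * conc_m)


def le(pic50: float, ha: int) -> float:
    """Eq 1: ligand efficiency."""
    return 1.37 * pic50 / ha


def lle(pic50: float, clogp: float) -> float:
    """Eq 2: lipophilic ligand efficiency."""
    return pic50 - clogp


def lle_at(pic50: float, clogp: float, ha: int) -> float:
    """Eq 3: LLEAT, scaled like LE."""
    return 0.111 + 1.37 * (pic50 - clogp) / ha


def sc_indices(resp: float, clogp: float, ha: int) -> Optional[Dict[str, float]]:
    """scLE, scLLE and scLLEAT: eqs 1-3 with scpIC50 substituted for pIC50."""
    p = sc_pic50(resp)
    if p is None:
        return None
    return {"scpIC50": p, "scLE": le(p, ha), "scLLE": lle(p, clogp), "scLLEAT": lle_at(p, clogp, ha)}


# Hypothetical compound: 50% response, clogP = 2.0, 25 heavy atoms.
print(sc_indices(50.0, clogp=2.0, ha=25))
```

A 50% response at 10 μM means the IC50 equals the test concentration, so scpIC50 = 5. With 25 heavy atoms and clogP = 2, this gives scLE = 0.274, scLLE = 3.0, and scLLEAT ≈ 0.275. At the window edges, responses of 5% and 95% map to scpIC50 ≈ 3.72 and ≈ 6.28, respectively, so each single-concentration screen covers only about 2.5 log units of potency.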



Table 2. Pearson's Correlation Coefficient r and RMSE between Single-Concentration (scpIC50, eq 5) and Dose−Response Versions of pIC50 for Screens in the p2d or c2d Data Sets

screen   r (p2d)   RMSE (p2d)   r (c2d)   RMSE (c2d)
1        0.656     0.419        0.473     0.483
2        0.627     0.314        0.620     0.306
3        0.301     0.417        0.351     0.406
4        0.544     0.43         n/a       n/a
5        0.229     0.505        0.265     0.501
6        0.650     0.415        n/a       n/a
7        0.692     0.32         0.731     0.306
8        0.527     0.298        0.509     0.291
9        0.778     0.407        0.813     0.368
10       0.503     0.371        0.487     0.382

n/a: screen not represented in the c2d data set (see Table 1).
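The two statistics in Table 2 measure different things. r captures how well scpIC50 tracks pIC50 linearly, whereas RMSE (in pIC50 log units) also penalizes any systematic offset between the two. A minimal sketch of both follows; the function names and toy values are hypothetical, and the paper's own analyses were run in R.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs (here pIC50 units)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty samples of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


# Hypothetical toy data: scpIC50 systematically 0.5 log units below the dose-response pIC50.
sc = [4.5, 5.0, 5.5, 6.0]
dr = [5.0, 5.5, 6.0, 6.5]
print(pearson_r(sc, dr), rmse(sc, dr))
```

In the toy data the correlation is perfect (r = 1.0), yet the RMSE is 0.5 log units because of the constant offset. This is why Table 2 reports both statistics rather than r alone.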

Correlation between LE, LLE, and LLEAT and Their Single-Concentration Counterparts (scLE, scLLE, and scLLEAT, Respectively) for Hits with Dose−Response Curves. For statistical hits that gave a dose−response curve, scLE, scLLE, and scLLEAT can be calculated from the corresponding single-concentration response, as described above. The Pearson's correlation coefficients between these values and the dose−response ligand efficiencies are very high. For instance, for LE vs scLE in the p2d data set, the average Pearson's correlation coefficient is 0.922 (standard deviation, SD = 0.026). Table 3 lists the mean and SD of the correlation coefficients for all pairs of ligand efficiency parameters in both the p2d and c2d data sets, and Figure 2 shows the corresponding LE correlation plots for all screens in the p2d data sets. Ligand efficiency indexes are functions of both biological activity (pIC50) and chemical descriptors: the number of heavy atoms for LE, clogP for LLE, and both for LLEAT (see eqs 1−3). Because each dose−response/single-concentration pair shares the same chemical descriptors, these descriptors are expected to contribute to the observed correlation. Additional tests were therefore applied to determine whether the correlation in biological activity (dose−response vs single-concentration pIC50) also contributes significantly to the ligand efficiency correlation. To this end, a permutation test was run: the responses were randomized 1000 times, scLE, scLLE, and scLLEAT were recomputed, and the Pearson's correlation coefficient was recalculated and compared with the original one. In all 1000 repetitions, for every screen, ligand efficiency index, and data set, the correlation coefficient obtained with randomized responses was smaller than that obtained with the original responses. This indicates that the single-concentration biological activity has a statistically significant (p < 0.001) effect on the observed ligand efficiency correlations.

Figure 2. Scatter plots of scLE vs LE for the 10 different screens in the p2d data sets.

Table 3. Average and SD of Pearson's Correlation Coefficient r between LE, LLE, and LLEAT and Their Single-Concentration Counterparts (scLE, scLLE, and scLLEAT) over All the Screens in the p2d or c2d Data Sets(a)

correlation          mean r (p2d)   SD r (p2d)   mean r (c2d)   SD r (c2d)
LE vs scLE           0.922          0.026        0.918          0.031
LLE vs scLLE         0.941          0.017        0.938          0.022
LLEAT vs scLLEAT     0.955          0.012        0.954          0.016

(a) Only compounds giving dose−response curves.

Association between LE, LLE, and LLEAT and Their Single-Concentration Counterparts (scLE, scLLE, and scLLEAT, Respectively) for All Statistical Hits. The single-concentration ligand efficiency indexes correlate very well with their dose−response counterparts for confirmed statistical hits (those with a dose−response curve). During screening, however, a fraction of the compounds fail to give a fittable dose−response curve, so their effect on the statistical association must also be assessed. Once these compounds are included, Pearson's correlation coefficients can no longer be calculated for all compounds, because pIC50 (and therefore LE, LLE, and LLEAT) is not available for the nonconfirmed compounds. Classification techniques, such as receiver operating characteristic (ROC) curves, are needed instead. A ROC curve evaluates how well a continuous parameter separates two classes by plotting the true positive rate against the false positive rate as the threshold on that parameter is varied. A perfect classifier yields the point (0, 1) in ROC space (a false positive rate of 0 and a true positive rate of 1), whereas a random classification follows the diagonal. To compare ROC curves systematically, each is customarily summarized by a single number, the area under the curve (AUC): a perfect classifier has AUC = 1, and a random classification gives AUC = 0.5.
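The sketch below illustrates how the ROC analysis and its permutation check could be implemented. It makes several assumptions. The AUC is computed in its probabilistic form, as the chance that a randomly chosen efficient compound scores higher than a randomly chosen non-efficient one (ties count one-half); this is not the pROC package used in the paper. Only scLE is shown. Class labels are assumed to be 1 for compounds with dose−response LE > 0.3 and 0 otherwise, including nonconfirmed compounds. The permutation shuffles responses while each compound keeps its own heavy-atom count, and the add-one p-value estimator is a common convention chosen here, not one stated in the text. Sweeping the ROC threshold also yields the two quantities behind a cost/benefit curve: the fraction of efficient compounds rescued (the true positive rate) and the fraction of all hits selected for testing. All names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random positive outscores random negative); ties count 1/2.

    Pairwise O(n_pos * n_neg) form, written for clarity rather than speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sc_le(resp: float, ha: int) -> float:
    """scLE from eqs 1 and 5 (10 uM test concentration, Hill slope 1, 5-95% responses)."""
    sc_pic50 = -math.log10((100.0 / resp - 1.0) * 1e-5)
    return 1.37 * sc_pic50 / ha


def permutation_test_auc(resp: Sequence[float], ha: Sequence[int], labels: Sequence[int],
                         n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Shuffle responses (each compound keeps its HA and class), recompute scLE and AUC.

    Returns the observed AUC and an add-one permutation p-value.
    """
    observed = roc_auc([sc_le(r, h) for r, h in zip(resp, ha)], labels)
    rng = random.Random(seed)
    shuffled = list(resp)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        auc = roc_auc([sc_le(r, h) for r, h in zip(shuffled, ha)], labels)
        if auc >= observed:
            n_at_least += 1
    return observed, (n_at_least + 1) / (n_perm + 1)


def rescue_and_cost(scores: Sequence[float], efficient: Sequence[int],
                    threshold: float) -> Tuple[float, float]:
    """One point of a cost/benefit curve.

    rescued: fraction of efficient compounds with score >= threshold (ROC true positive rate).
    cost: fraction of all hits with score >= threshold (the share that would be tested).
    """
    selected = [s >= threshold for s in scores]
    rescued = sum(1 for sel, e in zip(selected, efficient) if sel and e) / sum(efficient)
    cost = sum(selected) / len(scores)
    return rescued, cost


# Hypothetical toy hit list: four compounds with equal heavy-atom counts.
print(permutation_test_auc([90, 80, 20, 10], [20, 20, 20, 20], [1, 1, 0, 0], n_perm=200))
print(rescue_and_cost([0.35, 0.28, 0.31, 0.10], [1, 1, 0, 0], 0.3))
```

With only four compounds the permutation p-value cannot be small, because about one shuffle in six reproduces a perfect ranking. This illustrates that the p-values reported in the text depend on the much larger hit lists of the real screens. For large lists, a rank-based AUC formula or an established ROC library is preferable to the quadratic pairwise loop used here.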


In our case, the continuous parameter is each single-concentration ligand efficiency index. The two classes are defined for each dose−response ligand efficiency index by applying thresholds based on common criteria in the literature. For LE, a compound was deemed "positive" if LE > 0.3 and "negative" if LE ≤ 0.3. The same threshold of 0.3 was used for LLEAT, because this parameter is scaled similarly to LE. For LLE, the common criterion for deeming a compound efficient in a lead optimization context is LLE > 5. With this threshold, however, some screens would have very few efficient compounds, as might be expected for lower-potency screening hits, so a threshold of 3 was used instead. Compounds that did not give a dose−response curve or did not confirm in the confirmation screen were assigned to the negative class, because their very low biological activity would result in low ligand efficiency indexes. Table 4 lists the mean and SD of the AUC of the ROC curves for each ligand efficiency parameter and data set type over the different screens.

Table 4. Average and SD of AUC of ROC Curves for Classification of LE, LLE, and LLEAT by Their Single-Concentration Counterparts (scLE, scLLE, and scLLEAT, Respectively), in Both p2d and c2d Data Sets(a)

association          mean AUC (p2d)   SD AUC (p2d)   mean AUC (c2d)   SD AUC (c2d)
LE vs scLE           0.891            0.021          0.928            0.036
LLE vs scLLE         0.879            0.031          0.950            0.029
LLEAT vs scLLEAT     0.897            0.028          0.960            0.029

(a) For all statistical hits.

These AUC values show that scLE, scLLE, and scLLEAT classify well despite the presence of a large number of nonconfirming compounds. Classifiers are commonly graded from their AUC in a way similar to academic letter grades:15

• 0.9−1.0 = A (outstanding)
• 0.8−0.9 = B (excellent/good)
• 0.7−0.8 = C (acceptable/fair)
• 0.6−0.7 = D (poor)
• 0.5−0.6 = F (no discrimination)

The predictive power in our case therefore ranges from good/excellent to outstanding. As with the correlation coefficients for confirmed compounds, it is again necessary to check whether the biological activity contributes significantly to this predictive power, in addition to the chemical descriptors. A permutation test was therefore run in which the responses were randomized 1000 times for all screens, ligand efficiency indexes, and data sets, and the recalculated AUC values were compared with the original values. In all cases, the biological activity made a statistically significant (p < 0.01) contribution to the observed association between single-concentration and dose−response ligand efficiency indexes.

Ranking and Selection of Compounds Using scLE, scLLE, and scLLEAT: Cost/Benefit Analysis. The effect of ranking by single-concentration ligand efficiency on compound prioritization and selection can be assessed in two ways: by how many ligand efficient compounds are retrieved ("benefit") and by how many of all the hits must be tested to retrieve them ("cost"). The fraction of efficient compounds "rescued", that is, selected by a given threshold, was calculated while varying the threshold of each parameter. Figure 3 (upper row) shows, for the p2d data sets, the average fraction of efficient compounds rescued, with 95% confidence intervals, after aggregating the curves over all screens. The panels show scLE (left), scLLE (center), and scLLEAT (right). A sigmoid dependence is observed in all cases, as expected from the high degree of association with the dose−response counterpart parameters. The lower row of Figure 3 shows, for each threshold of scLE (left), scLLE (center), and scLLEAT (right), the corresponding fraction of the whole hit list. This indicates how many compounds would need to be tested to obtain a given number of efficient compounds. Figure 4 shows the same plots for the c2d data sets, with broadly similar results.

These plots show that a threshold of 0.3 in scLE would rescue about 85% of the LE > 0.3 compounds, whereas a threshold of 0.25 (red vertical line in Figures 3 and 4, upper and lower left) would rescue ∼99% of them. The fractions of total hits corresponding to these thresholds are ∼25% and ∼60%, respectively. Thus, testing the top-ranked ∼25% of the hits (by scLE) would rescue approximately 85% of the efficient (LE > 0.3) compounds, and testing the top-ranked ∼60% would recover ∼99% of them. Similarly, a threshold of 3 in scLLE would retrieve ∼85% of the LLE > 3 hits, whereas a threshold of 2.5 (red vertical line in Figures 3 and 4, upper and lower center) would retrieve 97% of them; the corresponding fractions of total hits to test are about 25% and 35%. Finally, a threshold of 0.3 in scLLEAT would rescue ∼90% of the LLEAT > 0.3 compounds, whereas a threshold of 0.25 (red vertical line in Figures 3 and 4, upper and lower right) would rescue ∼99% of them. The corresponding fractions of hits to test would be ∼22% and ∼37%, respectively.

Using Figures 3 and 4 as average cost/benefit curves, the final number of compounds to test can be chosen by balancing the number of compounds that can be tested with the available resources (reagents, time) against the number of efficient compounds to be obtained.

Figure 3. Cost/benefit curves in p2d data sets. (upper row) Average fraction of efficient compounds rescued at varying thresholds of scLE (left), scLLE (center), and scLLEAT (right). Shaded gray corresponds to the 95% confidence intervals. (lower row) Average fraction of total hits at varying thresholds of scLE (left), scLLE (center), and scLLEAT (right). Shaded gray corresponds to the 95% confidence intervals.

Figure 4. Cost/benefit curves in c2d data sets. (upper row) Average fraction of efficient compounds rescued at varying thresholds of scLE (left), scLLE (center), and scLLEAT (right). Shaded gray corresponds to the 95% confidence intervals. (lower row) Average fraction of total hits at varying thresholds of scLE (left), scLLE (center), and scLLEAT (right). Shaded gray corresponds to the 95% confidence intervals.

Example of Practical Implementation with Clustering. The previous section analyzed the simplest way to use the single-concentration ligand efficiency indexes: rank the compounds by the index and retrieve those above a cutoff determined by a cost/benefit criterion. More sophisticated selections that include other factors, such as chemical diversity, are also possible. For instance, reduced and diverse lists of hits to test can be selected by clustering the hit list, ranking each cluster by a single-concentration ligand efficiency index in descending order, and selecting the n top-ranked compounds from each cluster, with n chosen to fit the number of compounds that can be tested. This section analyzes this approach as an example of an implementation with clustering. The c2d data sets were clustered screen by screen as described in Materials and Methods, and each cluster was ranked by scLE in descending order. Compounds with scLE < 0.25 were disregarded, because they potentially contain only a very small fraction of efficient (LE > 0.3) compounds (see previous section). Figure 5a (upper panel) shows the percentage of efficient compounds that would be rescued by selecting the top 1, 2, 3, ..., 10 compounds in each cluster. Figure 5a (lower panel) shows the corresponding percentage of statistical hits that would need to be tested. For comparison, Figure 5b shows the results of a typical response/diversity-driven selection, in which each cluster is ranked by response in descending order and the top n compounds are tested. Again, both the percentage of efficient compounds rescued (upper panel) and the percentage of compounds that would need to be tested (lower panel) are shown for the top 1, 2, ..., 10 compounds in each cluster. The scLE/diversity-driven selection rescues more efficient compounds at a much lower cost. For example, selecting the top three compounds in each cluster would rescue ∼90% of the efficient hits while testing ∼58% of the hits. By contrast, the top three compounds in the response/diversity-driven selection would rescue 89% of the efficient compounds while testing ∼88% of the hits. Therefore, we

DOI: 10.1021/acs.jcim.6b00299 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

can substantially improve the effectiveness of the screen and reduce its costs by using a single-concentration ligand efficiency index such as scLE in the in silico triage of compounds.

Figure 5. (a) scLE/diversity selection. (upper panel) Percentage of LE > 0.3 compounds rescued by selecting the top n compounds ranked by scLE in each cluster. (lower panel) Percentage of total compounds by rank in scLE per cluster. (b) Response/diversity selection. (upper panel) Percentage of LE > 0.3 compounds rescued by selecting the top n compounds ranked by response in each cluster. (lower panel) Percentage of total compounds by rank in scLE per cluster.

DISCUSSION

In silico triage of compounds from single-concentration HTS data requires the optimization of multiple parameters to achieve the best selection of hits, in terms of both potency and properties, given the available resources.16 The single-concentration ligand efficiency indexes studied in the present work are promising because they show good statistical association with their dose−response counterparts, which are key parameters widely used in the post-HTS prioritization of compounds entering chemical optimization programs. In this way, the single-concentration ligand efficiency indexes align the single-concentration hit triage with the dose−response hit triage and simplify the prioritization process, as they combine potency and physicochemical descriptors counterbalanced in the same expression. As described in the Introduction, a previous example of a single-concentration screening index is the PEI,9 which counterbalances potency (as percent inhibition) and molecular mass (as molecular weight). However, no systematic statistical analysis of the association between the PEI and the indexes used in the dose−response phase was provided. In addition, the PEI is functionally different from the dose−response ligand efficiencies, including the BEI = pIC50/MW,9 which weakens the alignment of single-concentration prioritization with the well-established ligand efficiency indexes used in the dose−response phase. A desirable property of the scLE studied here is that the same functional form is used in the single-concentration and dose−response phases of HTS. Moreover, no PEI-like index exists that counterbalances lipophilicity with response. Therefore, the scLLE and scLLEAT proposed here are the first attempts at an index for prioritizing single-concentration data that takes lipophilicity into account together with potency. The scLLEAT has the additional interesting property of combining potency (response), lipophilicity (cLogP), and molecular size (number of heavy atoms, HA) in the same expression, which makes it possible to optimize these three parameters simultaneously by selecting the compounds with the highest scLLEAT. For instance, Figure 6 shows the histograms of primary response, cLogP, and number of heavy atoms for compounds with scLLEAT > 0.3 (blue) or scLLEAT ≤ 0.3 (magenta) in the p2d data set. The scLLEAT > 0.3 compounds have higher biological activity (response shifted to the right) and are more polar and smaller (cLogP and number of heavy atoms shifted to the left).

Figure 6. Distributions of primary response, cLogP, and number of heavy atoms of compounds with scLLEAT > 0.3 (blue) and scLLEAT ≤ 0.3 (magenta) in the p2d data set.

As one of the referees pointed out, the validity of eq 1 (in units of kilocalories per mole per heavy atom) as a measure of the free energy per heavy atom has been criticized.17 However, eq 1 is widely accepted in drug discovery, at least as an index that counterbalances potency with the number of heavy atoms to steer hit selection and optimization toward smaller compounds.3,18 For simplicity, the Hill slope in eq 4 was set to 1 throughout the present study, and the compound concentration was always assumed to be 10 μM. These assumptions led to eq 5. However, in some screens the Hill slope may differ significantly from 1, or a compound concentration other than 10 μM may be used. In those cases, to improve the quality of the ranking and the rescue of efficient hits, it is advisable to use eq 6, which gives the scpIC50 corresponding to a given response Resp for an arbitrary compound concentration C and Hill slope H:

$$\mathrm{scpIC}_{50}(\mathrm{Resp}) = -\log_{10}\left[\sqrt[H]{\left(\frac{100}{\mathrm{Resp}} - 1\right) C^{H}}\right] \tag{6}$$
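As an illustration, eq 6 and its use in the ligand efficiency indexes can be written as a short Python sketch. Eqs 1−3 are not shown in this section, so the code assumes their common literature forms (LE = 1.37 pIC50/HA, LLE = pIC50 − cLogP, LLEAT = 0.111 + 1.37 LLE/HA). The 1.37 constant, the function names, the percent-inhibition convention for Resp, and the use of mol/L for C are assumptions, not taken from the paper.

```python
import math

# ASSUMED standard forms of eqs 1-3 (not reproduced in this section):
#   eq 1: LE    = 1.37 * pIC50 / HA           (kcal/mol per heavy atom)
#   eq 2: LLE   = pIC50 - cLogP
#   eq 3: LLEAT = 0.111 + 1.37 * LLE / HA
# The 1.37 factor (about 2.303*RT in kcal/mol near room temperature) is an assumption.
KCAL_PER_LOG_UNIT = 1.37


def sc_pic50(resp: float, conc_molar: float = 1e-5, hill: float = 1.0) -> float:
    """Single-concentration pIC50 from a percent-inhibition response (eq 6).

    Inverts the Hill equation Resp = 100 / (1 + (IC50 / C)**H), giving
    IC50 = ((100 / Resp - 1) * C**H) ** (1 / H).
    The defaults (C = 10 uM, H = 1) correspond to the simplified case of the study.
    """
    if not 0.0 < resp < 100.0:
        # A 0% or 100% response has no finite IC50 under the Hill model
        raise ValueError("response must lie strictly between 0 and 100")
    ic50 = ((100.0 / resp - 1.0) * conc_molar ** hill) ** (1.0 / hill)
    return -math.log10(ic50)


def sc_indexes(resp: float, heavy_atoms: int, clogp: float,
               conc_molar: float = 1e-5, hill: float = 1.0) -> dict:
    """scLE, scLLE and scLLEAT obtained by plugging scpIC50 into the assumed eqs 1-3."""
    p = sc_pic50(resp, conc_molar, hill)
    sc_lle = p - clogp
    return {
        "scpIC50": p,
        "scLE": KCAL_PER_LOG_UNIT * p / heavy_atoms,
        "scLLE": sc_lle,
        "scLLEAT": 0.111 + KCAL_PER_LOG_UNIT * sc_lle / heavy_atoms,
    }


# Hypothetical compound: 75% inhibition at 10 uM, 22 heavy atoms, cLogP 2.5
print(sc_indexes(75.0, 22, 2.5))
```

Passing the screen's actual concentration and Hill slope to `conc_molar` and `hill` covers the general case; keeping the defaults reproduces the H = 1, 10 μM simplification.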

The scpIC50 obtained from eq 6 can be used in eq 1, 2, or 3 to obtain the scLE, scLLE, or scLLEAT, respectively.

In the present analysis, two main data sets were used: the so-called “p2d” set, which correlates the dose−response pIC50 with the response in the primary screen, and the “c2d” set, which correlates the pIC50 with the response in the confirmation screen. No clear trend was observed when comparing, across screens, the Pearson correlation coefficients between dose−response and single-concentration pIC50 values in the two data sets (Table 2): in some screens the correlation increased from p2d to c2d and in others it decreased, but the differences were small, and a significant correlation was observed in all cases. This is not unexpected, since each screen has its own noise level, which could decrease or increase for multiple reasons when moving from the primary to the confirmation phase. Nor was a clear trend observed in the Pearson correlation coefficients between single-concentration and dose−response ligand efficiency indexes for compounds with dose−response curves (Table 3). However, when the compounds that did not yield dose−response curves were included, a significantly higher AUC was observed in the c2d data set, although all AUCs were again large (Table 4). This suggests that nonconfirming compounds are a source of noise for the statistical association in the p2d process, where the association nevertheless remains large and has a significant contribution from biological activity; in the c2d process, most false positives will already have been removed, which boosts the association.

In summary, a new approach has been proposed for triaging compounds from single-concentration screening data, using common ligand efficiency indexes normally applied in the dose−response phase of HTS. These indexes have shown good statistical association with the dose−response data in historical data, with contributions from both physicochemical properties and biological activity, as demonstrated by permutation tests. A systematic analysis of historical data has identified average cost/benefit curves for these parameters, which help balance the selection of ligand-efficient compounds against the cost of testing different numbers of hits. Moreover, these indexes can be combined with clustering to achieve a more effective in silico triage of statistical hits, generating reduced lists of diverse compounds with higher rescue of efficient hits. This is expected to contribute to a more efficient use of the HTS process.
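The ROC evaluation, cost/benefit analysis, and cluster-based selection summarized above can be sketched in a few lines of Python. This is a hypothetical illustration, not the author's implementation: a compound counts as "efficient" if it meets a dose−response criterion such as LE > 0.3 (compounds without a dose−response curve count as not efficient), "cost" is taken to be the fraction of compounds selected for dose−response testing at a given index threshold, and "benefit" the fraction of efficient compounds rescued. Cluster labels are assumed to be precomputed (for example by Butina clustering of fingerprints), and all function names and data are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Mann-Whitney form of the ROC AUC: probability that a randomly chosen
    efficient compound outscores a randomly chosen non-efficient one (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both efficient and non-efficient compounds")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cost_benefit(scores: Sequence[float], labels: Sequence[bool],
                 thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """For each threshold t, select compounds whose index exceeds t and return
    (t, cost, benefit), where cost = fraction of all compounds selected and
    benefit = fraction of efficient compounds that are among those selected."""
    n_total = len(scores)
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("no efficient compounds to rescue")
    curve = []
    for t in thresholds:
        picked = [y for s, y in zip(scores, labels) if s > t]
        curve.append((t, len(picked) / n_total, sum(picked) / n_pos))
    return curve


def top_n_per_cluster(ids: Sequence[Hashable], clusters: Sequence[Hashable],
                      scores: Sequence[float], n: int) -> List[Hashable]:
    """Diversity-aware triage: keep the n highest-scoring compounds of each cluster."""
    members: Dict[Hashable, List[Tuple[float, int]]] = defaultdict(list)
    for i, (cl, s) in enumerate(zip(clusters, scores)):
        members[cl].append((s, i))
    picked: List[Hashable] = []
    for group in members.values():
        group.sort(key=lambda pair: pair[0], reverse=True)
        picked.extend(ids[i] for _, i in group[:n])
    return picked


if __name__ == "__main__":
    # Toy, made-up data: an scLE-like index and dose-response efficiency labels
    index = [0.40, 0.35, 0.31, 0.28, 0.25, 0.20]
    efficient = [True, True, False, True, False, False]
    print("AUC:", roc_auc(index, efficient))
    for t, cost, benefit in cost_benefit(index, efficient, [0.25, 0.30]):
        print(f"threshold {t:.2f}: cost {cost:.2f}, benefit {benefit:.2f}")
    print(top_n_per_cluster(["a", "b", "c", "d"], [1, 1, 2, 2], [0.3, 0.4, 0.2, 0.5], 1))
```

Sweeping the thresholds over a fine grid and plotting benefit against cost gives a curve of the kind described in the text. The pairwise AUC loop is quadratic, which is fine for toy data; for full screens, a sorting-based rank formula or an established ROC package scales better.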



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00299.

p2d data sets. Column “confyn-p2d” contains a label for the confirmation status of the compound in the p2d process: 0 denotes compounds that did not confirm in the confirmation phase or did not give a dose−response curve; 1 denotes compounds that gave a dose−response curve (XLSX)

c2d data sets. Similarly, column “confyn-c2d” contains a label for the confirmation status of the compound in the c2d process: 0 denotes compounds that did not give a dose−response curve; 1 denotes compounds that gave a dose−response curve. The other column headers are self-explanatory (XLSX)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The author declares no competing financial interest.

ACKNOWLEDGMENTS

The GSK AD&HTS scientists worldwide are acknowledged for generating the database of screens used in this work.

REFERENCES

(1) Macarron, R.; Banks, M. N.; Bojanic, D.; Burns, D. J.; Cirovic, D. A.; Garyantes, T.; Green, D. V.; Hertzberg, R. P.; Janzen, W. P.; Paslay, J. W.; Schopfer, U.; et al. Impact of High-Throughput Screening in Biomedical Research. Nat. Rev. Drug Discovery 2011, 10, 188−195.
(2) Hann, M. M. Molecular Obesity, Potency and Other Addictions in Drug Discovery. MedChemComm 2011, 2, 349−355.
(3) Hopkins, A. L.; Keserü, G. M.; Leeson, P. D.; Rees, D. C.; Reynolds, C. H. The Role of Ligand Efficiency Metrics in Drug Discovery. Nat. Rev. Drug Discovery 2014, 13, 105−121.
(4) Hopkins, A. L.; Groom, C. R.; Alex, A. Ligand Efficiency: a Useful Metric for Lead Selection. Drug Discovery Today 2004, 9, 430−431.
(5) Kuntz, I. D.; Chen, K.; Sharp, K. A.; Kollman, P. A. The Maximal Affinity of Ligands. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 9997−10002.
(6) Leeson, P. D.; Springthorpe, B. The Influence of Drug-Like Concepts on Decision-Making in Medicinal Chemistry. Nat. Rev. Drug Discovery 2007, 6, 881−890.
(7) Leach, A. R.; Hann, M. M.; Burrows, J. N.; Griffen, E. J. Fragment Screening: an Introduction. Mol. BioSyst. 2006, 2, 429−446.
(8) Mortenson, P. N.; Murray, C. W. Assessing the Lipophilicity of Fragments and Early Hits. J. Comput.-Aided Mol. Des. 2011, 25, 663−667.
(9) Abad-Zapatero, C.; Metz, J. T. Ligand Efficiency Indices as Guideposts for Drug Discovery. Drug Discovery Today 2005, 10, 464−469.
(10) R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2014. http://www.R-project.org (accessed May 1, 2016).
(11) Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J. C.; Muller, M. pROC: an Open-Source Package for R and S+ to Analyze and Compare ROC curves. BMC Bioinf. 2011, 12, 77.
(12) TIBCO Spotfire, version 7.6; TIBCO Software Inc.: Boston, 2016.
(13) Butina, D. Unsupervised Data Base Clustering Based on Daylight’s Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. J. Chem. Inf. Comput. Sci. 1999, 39, 747−750.
(14) Landrum, G. RDKit: Open-Source Cheminformatics. Version 2013.03.1. http://rdkit.org/ (accessed March 2015).
(15) Lantz, B. Evaluating Model Performance. In Machine Learning with R; Packt Publishing: Birmingham, 2013; p 313.
(16) Dahlin, J. L.; Walters, M. A. The Essential Roles of Chemistry in High-Throughput Screening Triage. Future Med. Chem. 2014, 6, 1265−1290.



(17) Kenny, P. W.; Leitao, A.; Montanari, C. A. Ligand Efficiency Metrics Considered Harmful. J. Comput.-Aided Mol. Des. 2014, 28, 699−710.
(18) Reynolds, C. H. Ligand Efficiency Metrics: Why All the Fuss? Future Med. Chem. 2015, 7, 1363−1365.

