Integration of in Silico and in Vitro Tools for Scaffold Optimization

In addition, a variety of 2-D molecular descriptors were used to build the QSAR .... Accounting for ionization while considering lipophilicity in term...
0 downloads 0 Views 2MB Size
Article pubs.acs.org/molecularpharmaceutics

Integration of in Silico and in Vitro Tools for Scaffold Optimization during Drug Discovery: Predicting P‑Glycoprotein Efflux Prashant V. Desai,*,† Geri A. Sawada,‡ Ian A. Watson,§ and Thomas J. Raub‡,† †

Computational ADME, ‡Drug Disposition, and §Advanced Analytics, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States S Supporting Information *

ABSTRACT: In silico tools are regularly utilized for designing and prioritizing compounds to address challenges related to drug metabolism and pharmacokinetics (DMPK) during the process of drug discovery. P-Glycoprotein (P-gp) is a member of the ATP-binding cassette (ABC) transporters with broad substrate specificity that plays a significant role in absorption and distribution of drugs that are P-gp substrates. As a result, screening for P-gp transport has now become routine in the drug discovery process. Typically, bidirectional permeability assays are employed to assess in vitro P-gp efflux. In this article, we use P-gp as an example to illustrate a well-validated methodology to effectively integrate in silico and in vitro tools to identify and resolve key barriers during the early stages of drug discovery. A detailed account of development and application of in silico tools such as simple guidelines based on physicochemical properties and more complex quantitative structure− activity relationship (QSAR) models is provided. The tools were developed based on structurally diverse data for more than 2000 compounds generated using a robust P-gp substrate assay over the past several years. Analysis of physicochemical properties revealed a significantly lower proportion ( 3) or nonsubstrate (NER 0.4 is considered an indicator of reasonable model performance with useful predictive power.35 Selection of the optimum QSAR modeling method. A variety of methods for building classification models were evaluated including: (a) recursive partitioning (tree), based on algorithms such as Bagging36 and random forest (RF)37 and (b) support vector machine (SVM) models, either based on descriptors (SVM-D)38,39 or fingerprints (SVM-FP).40 As a result of this systematic assessment (see details in the Supporting Information), the Bagging model was selected as the optimum model. Accordingly, for the purpose of this article, detailed results are provided only for various versions of the Bagging models, utilized for supporting design and prioritization of synthesis and experimental measurement across a variety of drug discovery projects over a period of several years at Eli Lilly and Company. Construction of the Bagging Models. An ensemble of trees was built based on a recursive partitioning approach using an in-house implementation of the Bagging method.36 A total of 100 trees were built with a maximum of 40 branches per tree and a minimum of three compounds in the terminal nodes. As described above, more than 1800 descriptors were calculated for the compounds in the training set. Subsequently, descriptors that were constant, near constant, or highly correlated were removed to arrive at a smaller set, between 800−1100, depending on the size of the training set used to build various versions described in this article. In line with the typical Bagging approach, each tree was grown using a 65% bootstrapped sample drawn randomly from the original set. Since bootstrapping involved sampling with replacement, 35% of the compounds were excluded while building each tree. The excluded compounds constitute the out-of-bag (OOB) sample. Thus, each compound is excluded in 35% of the trees. OOB samples then can be used to estimate the internal prediction performance of the model. When scoring the Bagging model, the test molecule is scored by each tree and assigned a class. Those 100 class assignments are then aggregated and an overall class assignment, together with a propensity score, reported. This score provides relative estimation of confidence or reliability of prediction.41 The score associated with a given predicted class in a binary model such as the P-gp model described here can range from 0.5−1, with 0.5 suggesting an ambiguous prediction and 1 indicating highest reliability. Four chronological versions of the model are described in this article: V1, V2, V3, and V4. Each version of the model was evaluated against the subsequent chronological test set. The 1251

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

Figure 1. Trends in physicochemical properties for P-glycoprotein substrates and nonsubstrates. Binned values of the properties are shown along the x-axis and % number of compounds on the y-axis. The bars are colored based on substrates (red) and nonsubstrates (green), as determined from the manual P-gp substrate assay. The number of compounds in each bin per category are listed on the bars. (A) Passive permeability from the manual Pgp substrate assay in the presence of the P-gp inhibitor; (B) HBD count; (C) HBA count; (D) TPSA; (E) MW; (F) clogP (Chemaxon); (G) clogD at pH 7.4 (Chemaxon); (H) the most basic cpKa(Chemaxon).

training sets for each version are referred to as TR-V1 through TR-V4, and the corresponding chronological test sets as TE-V1 through TE-V4. The subsequent version of the model was built by combining the training set from the previous version and the

corresponding chronological test set. Thus, model V4 is the most recent version described in this article, and it was trained using data from TR-V3 plus TE-V3 and evaluated using the chronological test set TE-V4. 1252

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics



Article

RESULTS AND DISCUSSION Reproducibility of the Manual P-gp Substrate Assay and Interpretation of the Results. To understand the influence of inherent experimental variability on the model and setting realistic expectations of predictive capabilities, it is important to analyze reproducibility of the data and factors that may influence the outcome. Of 2300 compounds tested in the assay over more than 8 years, 102 compounds were run more than once. Of these, 90 compounds had consistent outcomes across multiple runs in terms of the P-gp efflux class (based on the NER cutoff of 3). For the remaining 12 compounds, although their P-gp efflux class was not consistent across multiple runs, the NERs were very close to the cutoff of 3 used to assign the class (ranging between 2 and 4 in most cases). To ensure that only robust and unambiguous data are used for in silico model building and analyses, the following types of compounds were excluded from the data set. Average incidences of each type are listed in parentheses: compounds with NER values between 2.8 and 3.2 (2%); % cell values of >90% in the presence of the P-gp inhibitor (60% inhibition at 5 μM (potential false negatives, 2%) (see the Supporting Information for a description of the P-gp inhibition assay); compounds with very slow passive permeability (8, more than 65% were P-gp substrates, while the proportion of substrates was less than 25% among ∼400 compounds with the most basic cpKa 0.6, unless stated otherwise. The corresponding model applicability was between 79−81% for the various chronological test sets across all versions of the models. Descriptors Incorporated into the QSAR Model. As is typical for Bagging models with such training set sizes, 800− 1100 descriptors were utilized across the 100 decision trees. Given the complexity of this model, it is not straightforward to interpret the significance of these descriptors for their ability to discriminate the propensity for P-gp efflux across the entire data set. However, considering the descriptors that are used most frequently across the collection of trees would provide some clue toward their potential impact on the model. Consequently, the following three descriptors were utilized across more than 50 trees in all versions of the model. Abraham’s H-bond basicity30 descriptor (sum of HBA strength of a molecule), calculated by a fragment-based method31 implemented inhouse, was utilized across more than 90% of the trees for all models. On average, it was used twice per each tree. This suggests significant impact of HBA strength toward dialing in Pgp efflux across a large set of diverse data used to build these models. This is consistent with the observed relationship with HBA count for this data set as described above. More importantly, this highlights the role of not just the number of HBA but further refines it to the strength of potential H-bonds formed with corresponding donors in P-gp.44 The next two most frequently used descriptors were HBD count (when 2 or higher) and TPSA. Once again, this was not surprising given the overall trends with these parameters described above. Evaluation of the QSAR Model Performance. Crossvalidation is a standard technique used to assess the predictive performance of QSAR models.45 These predictions provide internal validation of the model based on test sets derived from the original data. Most experts in the QSAR community believe that this type of validation often overestimates a model’s ability to predict a true external test set, which is not involved in any form during the model building process.46,47 One reason for this overestimation suggests that most QSAR models utilize data from the entire training set to optimize the selection of descriptors and thus are already influenced by the so-called leftout test set.45 For the Bagging method described in this article, OOB predictions can be considered a variation of crossvalidation (see Experimental section for details) results. However, a key difference is that the OOB predictions are not used to optimize the selection of descriptors or any other model-specific parameters. It also is obvious that the OOB

accuracy of the model. However, this would also result in a decrease in the overall applicability of the model since predictions with scores below the selected cutoff would not be included in the assessment and labeled as indeterminate. Therefore, a systematic analysis of the OOB predictions (see Experimental Section for details) was conducted to find an optimum balance between prediction accuracy and applicability of the models. Results for model V4 are described here to illustrate this strategy. Figure 2A shows the distribution of OOB

Figure 2. Relationship between the prediction scores and accuracy of P-glycoprotein efflux prediction. Binned prediction scores are listed on the x-axis and the number of compounds as % on the y-axis. Accurate predictions are colored blue, while inaccurate ones are in orange. The number of compounds in each bin per category are listed on the bars. The data are shown for model V4. (A) predictions for the OOB compounds from the training set (TR-V4); (B) predictions for the chronological test set (TE-V4).

prediction scores associated with the given prediction class, substrate or nonsubstrate, for the compounds from the training set TR-V4 used to build model V4. It is apparent that with increasing scores the overall prediction accuracy also increases almost proportionally. For example, for the set of 663 compounds with scores >0.9, the accuracy of prediction was 96% (636 accurate predictions out of 663). However, the applicability of the prediction was significantly decreased to only 34% (663 out of 1945). On the basis of the desired application of the model during early stages of drug discovery projects, it was decided to set the score cutoff for accepting a given prediction such that the overall prediction accuracy and model applicability would be close to 80% or above, while also 1254

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

compounds are likely to be structurally similar to many of the training set compounds since these are drawn from the same original set. Thus, we hypothesized that while the OOB predictions could provide an independent assessment of a Bagging model’s predictive capabilities compared to a typical cross-validation, it still may result an overestimation. To test this hypothesis, we compared the performance of the P-gp models based on OOB predictions for the corresponding chronological test sets. For the V4 version of the model, an overall prediction accuracy of 88% and a kappa value of 0.76 were observed for OOB compounds. The corresponding performance for the chronological test set, TE-V4, was 80% and 0.56, respectively. Thus, the model’s performance was overestimated by OOB predictions, albeit not markedly. It is interesting that the difference in model performance between OOB and chronological test sets was consistent for all other versions of the model. We concluded that OOB predictions serve as a reasonable indicator to compare the relative predictive power of the Bagging models for a given end point. To assess the true predictive performance of a model, it is necessary to evaluate the model against an external test set. A range of approaches has been proposed to do this. A simple method includes random selection of a test set from the existing training set that is set aside for evaluation after the final model is built.48 Others have utilized a relatively complex method like sphere-exclusion that ensures that the test set compounds are close to the training set compounds in the multidimensional descriptor space.49 In either case, it is most likely that the chemical space of these test sets overlaps that of the training sets resulting in an overly optimistic estimation of model performance.50 When the QSAR models are applied in the pharmaceutical drug discovery settings, the sole factor that determines the chemical space of the chronological test set is the nature and needs of the ongoing projects. Of course, there is always a chance that some of the projects using a given assay may have prior compounds from the same chemical scaffold included in the model training set, but there is no conscious effort to ensure that this happens. In our assessment of the P-gp efflux database in real time, we observed that >50% of the chemical scaffolds in the test set had no prior representation in the training set. Numerous combinations of various QSAR modeling methods have been published for P-gp, such as linear discriminant analysis (LDA),51 kNN,52 support vector machine (SVM),52−54 partial least-squares (PLS),55 and decision tree,52 and a range of descriptor sets have been utilized. Performance evaluation has been typically reported for one version of a given model based on one test set (prospective or internal). Thus, depending on a given combination of the training test set used for model evaluation, the optimum combination of method descriptors can vary. We therefore conducted a systematic assessment of four consecutive versions of the Bagging models for P-gp efflux, each involving an increasing number of compounds in the training set and evaluated against a different set of compounds in the corresponding chronological test set. Table 1 summarizes the results of this evaluation. We observed that both the kappa and the overall accuracy values were consistent for all versions, except model V1, when considering the performance of each version based on the chronological test set immediately following that version (for example, prediction of TE-V4 by model V4). Thus, for versions V2−V4, the kappa values were between 0.5−0.6 with an overall accuracy of 80%. The relatively lower values for model V1 are likely due to limited chemical space coverage with the smallest

Table 1. Chronological Evaluation of the QSAR Classification Model to Predict P-Glycoprotein Effluxa model V1 V2 V3 V4

(836) (1223) (1527) (1945)

TE-V1 (407)

TE-V2 (302)

TE-V3 (433)

TE-V4 (466)

0.39, 71% NAb NA NA

0.48, 75% 0.59, 80% NA NA

0.38, 76% 0.50, 79% 0.52, 80% NA

0.39, 74% 0.45, 76% 0.44, 75% 0.56, 80%

a

Kappa and overall accuracy are listed for each model evaluation. Numbers of compounds in the training sets and the test sets are listed in parentheses next to the corresponding model and test set names, respectively. bModels were only evaluated for chronological test sets. For example, in the case of TE-V1, models V2−V4 were not evaluated since the data from TE-V1 were used to train the subsequent models, and thus, it would simply be an evaluation of data fitting rather than the true predictive nature of the model.

training set of ∼800 compounds. Furthermore, analyzing the model performance for chronological test sets subsequent to the one immediately following the model, especially for models V2−V3, we observed that the performance indicators appeared to gradually decrease. For example, for model V2, the kappa values decrease from 0.59 to 0.45 and overall accuracy from 80% to 76% as we consider TE-V2 to TE-V4. For some unknown reason, model V1 did not appear to follow this trend. We also observed that for any given test set the best performance occurred for the model immediately preceding the test set. Thus, comparing the performance of models V1− V4, both the kappa and the overall accuracy values for TE-V4 increased from 0.39 to 0.56 and 74% to 80%, respectively. Given the chronological nature of these test sets, it is conceivable that the training set for the model immediately preceding a given test set would have better representation of the chemical space from that particular test set. This justifies the effort to regularly update the model with additional data as they become available to ensure better coverage of the most relevant chemical space at any given time. Irrespective of the combination of the model−test set evaluated, the kappa values generally ranged from 0.39 to 0.59 and clearly indicate that the models performed significantly better than what might be expected by chance (kappa close to zero) or complete disagreement (kappa close to −1). Similarly, an overall accuracy of prediction of >70% clearly highlights the value of such models, irrespective of the model or the test set considered, during early stages of drug discovery for design and prioritization for synthesis and testing. Finally, when applying these models prospectively in support of drug discovery teams, the two most commonly used key performance parameters are PPV and NPV. These parameters provide an estimate of accuracy of the model toward correctly predicting substrates and nonsubstrates, respectively. If the PPV and NPV of a given model are significantly different, then the decisions based on the model prediction will vary depending on whether a compound is predicted to be a substrate or nonsubstrate. For example, if the PPV is significantly higher than the NPV, and the goal is to avoid the potential liability caused by P-gp efflux, then compounds predicted to be substrates could be deprioritized with higher confidence compared to prioritization based on nonsubstrate predictions. The PPV and the NPV of the P-gp efflux models are depicted in Figure 3. In line with their overall prediction accuracy described above, the PPV and the NPV values also were consistently close to 80% across all models with the exception of a slightly lower NPV value for model V1. This indicates that 1255

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

Another reason might be the differences in the filter plate systems used, including different filter materials, to fit for purpose. While these differences were assessed and shown to be insignificant using a smaller set of standard compounds that register different aspects of permeability behavior, surveying a larger test set of compounds representing greater physicochemical diversity appears to have emphasized one or more of these putative reasons for discord. Application of Various in Silico Tools to Identify and Resolve Issues Related to P-gp Efflux. Implementation of the in Silico Tools. In silico tools are implemented through a graphical user interface (GUI) accessible to all scientists in Eli Lilly and Company. This includes a range of physicochemical properties and a variety of in silico tools such as QSAR and pharmacophore and docking models intended for pharmacological and ADME-toxicology profiling to assess structure− activity relationships (SAR) simultaneously with structure− property relationships (SPR). The end-users can import structures of existing compounds directly from the corporate database or from any virtual library. The tool also provides mechanism for the computational scientist to upload updated versions of models as needed. For the P-gp model, the output provides the predicted class as either substrate or nonsubstrate including the scores for each prediction. Moreover, an additional column listing three possible classes, e.g., substrate, nonsubstrate, or indeterminate, also is supplied for easy incorporation of assessment of the prediction reliability based on the recommended score cut-offs (see above). The indeterminate class refers to predictions with scores less than the recommended cutoff (0.6) for reliability. It is also possible to apply a different score cutoff for a specific project team if sufficient data exist to recalibrate the model reliability. Application of in Silico Tools to Identify and Overcome Challenges Related to P-gp Efflux. The QSAR model is the Tier 1 screen for assessing P-gp efflux potential during the early stages of drug discovery projects. Along with the model, the guidelines based on the trends observed between physicochemical properties and P-gp efflux also are used. In silico analyses are a valuable tool for comparing and prioritizing potential chemical scaffolds that a given project team may be interested in pursuing during the hit assessment or lead optimization stages.57 The appropriate in vitro assays are used to test a representative set of compounds to assess the hypothesized relevant risk. For example, if a majority of compounds from a given scaffold for a CNS project team are predicted to be P-gp substrates by the model, then evaluation using the manual P-gp substrate assay would be conducted sooner rather than later to connect in silico and in vitro outcomes. However, based on our experience with nonCNS programs, compounds with fast passive permeability do not appear to be significantly impacted by P-gp efflux as far as oral absorption is concerned, for most of the cases. The physicochemical property guidelines and the QSAR model constitute a complementary set of in silico tools. While the physicochemical properties provide an easy-to-interpret design tool, as illustrated in Case Study 1 below, the resolution for estimating P-gp efflux is relatively low when those parameters are close to the recommended cutoff values, e.g., TPSA close to 60 Å2. The QSAR model provides better resolution with overall accuracy close to 80%, but given the complexity of the model, it does not directly provide interpretable mechanistic understanding for the prediction. For example, the model does not provide clues related to the

Figure 3. Positive predictive value (PPV) and negative predicted value (NPV) for the P-glycoprotein QSAR models based on chronological test sets. Model versions are listed on the x-axis and the PPV and NPV values (as %) on y-axis. The number of compounds used to train each version is shown in parentheses, and the names of the chronological test sets used for predictive assessment are listed on the top.

the models are expected to provide a balanced prediction performance for both substrates and nonsubstrates. QSAR Model and the Automated P-gp Substrate Screen. Given the relatively slower throughput of the manual P-gp substrate assay and to provide a more efficient screen requiring less manpower in support of drug discovery projects, an automated assay was developed as described in the Supporting Information. The automated screen was designed simply to provide categorization of compounds as either P-gp substrates or nonsubstrates in contrast to the detailed assessment of multiple parameters influencing P-gp efflux provided by the manual assay. Manual versus automated assay validation occurred over four months where 48 compounds were tested in 188 assays runs (1 well per run). The data were fit to a classification tree using the software CART34 and the algorithm by Loh and Shih.56 There was a slight 7.4% misclassification rate for the automated assay during this validation phase, but statistically, the two assays were equivalent concerning the substrate classification. We also simultaneously evaluated the agreement between the QSAR model and the manual P-gp substrate assay for the same set of compounds. A set of ∼150 compounds was selected representing a broad range of structural and physicochemical (solubility, lipophilicity, polarity, etc.) diversity. We ensured that this set would constitute a true chronological test set for the QSAR model. The concordance between the automated screen and the manual assay and a corresponding assessment of the QSAR model toward predicting the outcome from the manual assay for the same set of compounds is provided in Table S1 (Supporting Information). The QSAR model was found to provide equal or better agreement for all evaluation parameters considered. Consequently, we decided to discontinue the automated screen and instead utilize the QSAR model as the Tier 1 screen for evaluating P-gp efflux potential during the early stages of drug discovery. The QSAR model was built using the data from the manual assay, and hence, it was expected to align better with that assay. Since these results validated this hypothesis, it helped to significantly improve the efficiency (time and cost) for prescreening compounds for testing in the manual assay. Reasons for the discord between the automated and manual assay substrate classification are speculative. One cause might be differences in the bioanalytical method where the automated assay relies on a generic liquid chromatographic method to accommodate physicochemical diversity while screening large numbers of compounds across multiple portfolio projects. 1256

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

impact of cell partitioning (proportional to concentration of substrate at the transporter binding site) and/or passive permeability on the net P-gp efflux observed in the assay. In cases where the risks related to P-gp efflux have already been realized, a virtual library would be designed based on both physicochemical guidelines and QSAR model profiling to prioritize synthesis and testing of new analogues predicted to be nonsubstrates. Such strategies are applied throughout the precandidate stages of drug discovery in Eli Lilly and Company. In our experience, the QSAR model for P-gp efflux provides reasonable coverage over a wide range of chemical space, which is expected given the diversity of compounds across the entire small molecule discovery portfolio included in the training set used to build the model. The model is quite effective toward differentiating chemical scaffolds from each other in terms of Pgp efflux potential. Yet, since the model is fitted to a diverse set of compounds, it may not be able to learn and reliably predict changes in P-gp efflux caused by minor structural changes within a given scaffold. This often is encountered during the early lead development stages of drug discovery. By this stage, the chemical space that can be tolerated for a given chemical scaffold toward maintaining a balanced profile across multiproperty space can be relatively narrow. Therefore, it may not be prudent to significantly modify molecular properties such as HBD/HBA counts or TPSA to address P-gp efflux modulation. For example, decreasing TPSA could result in an increase in microsomal turnover and/or a decrease in solubility. In such cases, one could explore the impact of decreasing HBD and/or HBA strengths of key atoms associated with P-gp efflux without having to remove them completely.44 Alternatively, it may be more fruitful to introduce an intramolecular H-bond that could effectively mask one acceptor and one donor within the molecule from putative interaction with P-gp.44 This can be achieved in some cases by designing an appropriate regioisomer. We recently described several examples from the literature where HBD/HBA strengths and intramolecular Hbonds were found to significantly influence P-gp efflux.44 Below are three examples from our in-house projects illustrating these SPR tactics. Case Study 1. P-glycoprotein efflux was identified as a key challenge to overcome for a project team engaging a CNS target (scaffold A). A majority of compounds from the initial set of hits with desirable in vitro potency against the pharmacological target were predicted to be P-gp substrates based on high TPSA values of >85 Å2 and the predictions by the QSAR model. This was confirmed by testing a representative set of compounds using the manual P-gp substrate assay. As shown in Figure 4, out of 21 compounds with TPSA >85 Å2, 18 were P-gp substrates. The team then designed, synthesized, and tested compounds spanning a range of TPSA from 100 Å2. This included compounds predicted to be P-gp substrates and nonsubstrates to assess the reliability of the model for this chemical scaffold. Compounds with TPSA 60 Å2. Because the distribution of P-gp substrates and nonsubstrates was almost equal for compounds with TPSA between 60 and 85 Å2, TPSA was not able to provide sufficient resolution to further optimize the balance between desired potency and lack of P-gp efflux, but the overall accuracy of

Figure 4. Topological polar surface area (TPSA) and P-glycoprotein efflux for compounds from scaffold A. Binned values of TPSA are shown along the x-axis and % number of compounds on the y-axis. The bars are colored based on substrates (red) and nonsubstrates (green), as determined from the manual P-gp substrate assay. The number of compounds in each bin per category are listed on the bars.

prediction for this scaffold by the QSAR model for P-gp efflux was more than 80%. Therefore, the team was able to effectively utilize the QSAR model to prioritize the design and synthesis of compounds predicted to be nonsubstrates while maintaining desired potency against the pharmacological target. As our experience with projects has grown, we now can efficiently identify key factors that most likely influence P-gp recognition for a particular scaffold such that only a few (6−10) compounds, which map the range of key physicochemical properties (e.g., TPSA, HBD. HBA) represented by the scaffold, are selected for testing in the manual P-gp substrate assay. Case Study 2. In another project seeking novel compounds for the treatment of a CNS indication (scaffold B), the presence of a basic nitrogen was critical for maintaining the desired potency against the pharmacological target. Some of the earlier compounds (compounds 1−4 in Table 2) synthesized for this project were identified to have marked NERs in the manual Pgp substrate assay, and most of these had relatively fast passive permeability exceeding 20 × 10−6 cm/s. The in vivo impact of P-gp efflux was confirmed indirectly in rats wherein such compounds exhibited limited brain exposure and 0.6 for reliable predictions for most compounds in the scaffold, it was not considered to be a primary tool for designing compounds to modulate P-gp efflux. Out of five HBA groups in this chemotype, two oxygen atoms of a sulfonamide and the piperidine nitrogen were calculated to be the strongest. Attempts to remove the sulfonamide group or replace it with an amide resulted in inactive compounds. Therefore, the team focused next on decreasing the HBA strength of the piperidine nitrogen (also aligned with its basic cpKa). A set of representative compounds from this effort are listed in Table 2. These compounds maintained the same HBD/HBA count and similar TPSA values, but a decrease in the calculated HBA strength and concomitantly a decrease in the calculated basic pKa of the piperidine nitrogen resulted in a decrease in P-gp efflux. Compound 5 had a good balance of potency against the pharmacological target and pharmacokinetic profile. It also was observed to achieve sufficient CNS 1257

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

Table 2. Key Physicochemical Parameters and P-gp Efflux Results for Representative Compounds from Scaffold B

Table 3. Key Physicochemical Parameters, QSAR Model Prediction, and P-Glycoprotein Efflux Results for Representative Compounds from Scaffold C

a

Net efflux ratio from the manual P-gp substrate assay. bThe dotted line indicates a potential hydrogen bond between the hydroxyl group and the pyridyl nitrogen atom.

a

Net efflux ratio from the manual P-gp substrate assay. bHBA strength was calculated based on the method described in this article.32 c Calculated using the model from Chemaxon.

in intramolecular H-bond with the hydroxyl group) forms of compound 10 as described by Kuhn et al.58 The closed form had reasonable energetic stability in support of this hypothesis. Intramolecular H-bonds involving such ring systems have also been observed in crystal structure databases.58 Formation of a similar intramolecular H-bond was not feasible in the case of the regioisomer compound 9 because the positioning of the tertiary propyl alcohol group meta to the pyridyl nitrogen resulted in a significantly longer distance (>4 Å) between the hydroxyl group and the pyridyl nitrogen. Fortunately, compounds 9 and 10 had similar potency against the pharmacological target; however, this may not always be the case when intramolecular H-bonds are formed.

unbound exposure with acceptable receptor occupancy in vivo to warrant its advance as the lead candidate compound. Case Study 3. Data presented in Table 3 provide an interesting example of the structure−P-gp efflux relationship observed for a homologous series of compounds from scaffold C. Most compounds from this scaffold were predicted to be nonsubstrates for P-gp efflux by the QSAR model, including compounds 7 and 8, whereas compounds 9 and 10 were predicted to be substrates. All compounds exhibited fast passive permeability of >50 × 10−6 cm/s. Comparing compounds 7 and 8, we observed that introducing an additional HBA in the form of a pyridyl nitrogen, while removing the donor from hydroxyl group by conversion into methoxy group, maintained the TPSA close to 70 Å2. Both compounds had similar profiles in the manual P-gp substrate assay including NERs of 1.2−1.3 (nonsubstrates). As expected, introduction of the pyridyl nitrogen while maintaining the hydroxyl group in compound 9 resulted in an increase in TPSA to 81 Å2, and compound 9 was measured as a P-gp substrate (NER of 9.5). Interestingly, when the tertiary alcohol group was moved to the ortho position next to the pyridyl nitrogen (compound 10), the NER decreased to 1.2 suggesting that it was not a P-gp substrate. Given that compounds 9 and 10 are regioisomers, the TPSA values were identical, and the HBA strengths (1.68−1.70) and basic cpKa (3.5−3.7) of the pyridyl nitrogen did not appear to change significantly between the two compounds. The key difference between compounds 9 and 10 was the possibility of an intramolecular H-bond between the hydroxyl group and the pyridyl nitrogen. This was further evaluated using conformational analysis to assess the relative stability of the open (no intramolecular H-bond) versus closed (pyridyl nitrogen locked



CONCLUSIONS P-Glycoprotein plays a key role in the absorption and distribution of drugs that are P-gp substrates, and hence, the identification and mitigation of issues related to P-gp efflux is one of the key challenges during the drug discovery process. A robust bidirectional P-gp substrate assay has been established to assess P-gp efflux. The data generated for >2000 compounds from this assay have been effectively utilized to develop a variety of in silico tools such as simple guidelines based on physicochemical properties and more complex QSAR models that predict the likelihood of a compound to be a P-gp substrate. The Bagging model comprising an ensemble of trees based on a recursive partitioning approach provided optimum prediction performance for prospective validation based on chronological test sets. PPV and NPV values exceeding 80% were observed. The QSAR model demonstrated better concordance than an automated P-gp substrate screen with respect to predicting the outcome of the manual P-gp substrate 1258

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

in the in vitro P-gp assay. This iterative learning cycle is employed to facilitate progression of the scaffold SAR to a clinically successful drug. The following key items based on our experience with P-gp can be widely applicable across other DMPK related challenges typically encountered during early stages of drug discovery, as it pertains to successful utilization of in silico tools. (i) It is critical to establish in silico−in vitro−in vivo connectivity to identify the most suitable tools and strategy to address various challenges of drug discovery. (ii) While simple rules based on physicochemical properties provide general guidance for identifying potential liabilities, within a given chemical scaffold more rigorous assessment including QSAR models and/or relatively complex in silico parameters may be necessary to significantly influence the property space. For example, in the case of P-gp, consideration of H-bond strength can serve as a scaffold-specific local model. (iii) Evaluation of the predictive performance of a QSAR model should be based on a true prospective test set since the traditional cross-validation assessment is known to overestimate the model’s predictive capabilities. (iv) When feasible, it is valuable to have some measure of the reliability of prediction associated with a QSAR model (e.g., score cut-offs with Bagging models) to provide a better means of prioritizing the design and synthesis of compounds. (v) It is essential to continuously enrich in silico tools, especially QSAR models, based on new data to ensure their applicability across a dynamically evolving chemical space in an industrial setting.

assay. As a result, the QSAR model is utilized as the primary (Tier 1) screen for P-gp efflux. The in silico and the in vitro tools have been effectively integrated during early stages of drug discovery to resolve P-gprelated challenges as exemplified by several case studies and summarized in Figure 5. This illustrates a typical flow scheme

Figure 5. Strategy for the integration of in silico, in vitro, and in vivo tools to identify and resolve issues related to P-glycoprotein efflux during the early stages of drug discovery. RO: receptor occupancy, BEA: brain exposure assessment.



ASSOCIATED CONTENT

S Supporting Information *

Additional methods and comparison of the automated Pglycoprotein substrate screen and the QSAR model against the outcome from the manual P-gp substrate assay. This material is available free of charge via the Internet at http://pubs.acs.org.

for projects focused on CNS targets. For a new chemical scaffold being considered for optimization, a representative set of compounds (typically 6−10) is selected for the in vitro manual P-gp substrate assay based on the physicochemical profile and predictions by the QSAR model for P-gp. It is important to select compounds mapping the extremes of the chemical space and the key physicochemical properties likely to influence P-gp efflux (for example, TPSA). At the same time, inclusion of compounds predicted to be substrates and nonsubstrates by the QSAR model is vital to ensure reliable evaluation of the in silico−in vitro correlation for a given scaffold. In cases where none of the pharmacologically active compounds are predicted to be nonsubstrates, the selection can be extended to inactive compounds, and/or additional compounds predicted to be nonsubstrates can be designed and synthesized for testing. Using the results from the in vitro assay, a smaller set of compounds is selected to understand in the vitro−in vivo correlation to assess the impact of P-gp efflux on in vivo access to the target. For this purpose, a variety of studies can be utilized, such as in vivo efficacy studies including a receptor occupancy assay or in vivo exposure that could include the use of animals where the P-gp gene is genetically deleted. The in vitro results are also analyzed to assess the reliability/applicability of various in silico tools for the given chemical scaffold, and accordingly, key in silico tools that could be employed for guiding the design and prioritization for synthesis and testing are identified. Hypotheses are generated toward mitigating potential challenges related to P-gp efflux through the application of these in silico tools. Such hypotheses are then evaluated by testing the newly synthesized compounds



AUTHOR INFORMATION

Corresponding Author

*Phone: (317) 651 4086. Fax: (317) 276 7040. E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We are thankful to Stuart Morton for help with the automation of data extraction and processing and Dr. Howard Broughton for implementation of the tool to calculate H-bond strength based on a semiempirical quantum mechanical method. We also thank Dr. David Cummins and Michael Bell for the implementation of the Bagging and the Random Forest methods for QSAR models. We appreciate the valuable support from Dr. Alan Chang for statistical validation of the assays and Michael Huskin for operating the automated P-gp screen. Successful implementation of the in silico and in vitro tools would not have been possible without the active collaboration of a number of medicinal chemists from Discovery Chemistry and Research and Technology. We are also thankful to Drs. Daniel Mudra, Valentine Klimkowski, and James Stevens for their valuable suggestions on the article. 1259

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics



Article

(20) Raub, T. J.; Lutzke, B. S.; Andrus, P. K.; Sawada, G. A.; Staton, B. A. Early Preclinical Evaluation of Brain Exposure in Support of Hit Identification and Lead Optimization. In Optimizing the ″Drug-Like″ Properties of Leads in Drug Discovery; Borchardt, R. T., Kerns, E. H., Hageman, M. J., Thakker, D. R., Stevens, J. L., Eds.; Springer: New York, 2006; pp 355−410. (21) Chen, L.; Li, Y.; Yu, H.; Zhang, L.; Hou, T. Computational models for predicting substrates or inhibitors of P-glycoprotein. Drug Discovery Today 2012, 17, 343−351. (22) Didziapetris, R.; Japertas, P.; Avdeef, A.; Petrauskas, A. Classification analysis of P-glycoprotein substrate specificity. J. Drug Target 2003, 11, 391−406. (23) Gleeson, M. P. Generation of a set of simple, interpretable ADMET rules of thumb. J. Med. Chem. 2008, 51, 817−834. (24) Hitchcock, S. A. Structural modifications that alter the Pglycoprotein efflux properties of compounds. J. Med. Chem. 2012, 55, 4877−4895. (25) Mudra, D. R.; Desino, K. E.; Desai, P. V. In silico, in vitro and in situ models to assess interplay between CYP3A and P-gp. Curr. Drug Metab. 2011, 12, 750−773. (26) Chen, B.; Sheridan, R. P.; Hornak, V.; Voigt, J. H. Comparison of random forest and Pipeline Pilot Naive Bayes in prospective QSAR predictions. J. Chem. Inf. Model. 2012, 52, 792−803. (27) Dantzig, A. H.; Shepard, R. L.; Cao, J.; Law, K. L.; Ehlhardt, W. J.; Baughman, T. M.; Bumol, T. F.; Starling, J. J. Reversal of Pglycoprotein-mediated multidrug resistance by a potent cyclopropyldibenzosuberane modulator, LY335979. Cancer Res. 1996, 56, 4171−4179. (28) Ho, N. F. H.; Raub, T. J.; Burton, P. S.; Barsuhn, C. L.; Adson, A.; Audus, K.; Borchardt, R. T. Quantitative Approaches to Delineate Passive Transport Mechanisms in Cell Culture Monolayers. In Transport Processes in Pharmaceutical Systems; Gordon, G. L., Lee, P. I., Topp, E., M,, Eds.; Marcel Dekker: New York, 2000; pp 219−316. (29) Ertl, P.; Rohde, B.; Selzer, P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714−3717. (30) Abraham, M. H. Scales of solute hydrogen-bonding: their construction and application to physicochemical and biochemical processes. Chem. Soc. Rev. 1993, 22, 73−83. (31) Japertas, P.; Didziapetris, R.; Petrauskas, A. Fragmental methods in the design of new compounds. applications of the advanced algorithm builder. Quant. Struct.-Act. Relat. 2002, 21, 23−37. (32) Gancia, E.; Montana, J. G.; Manallack, D. T. Theoretical hydrogen bonding parameters for drug design. J. Mol. Graphics Modell. 2001, 19, 349−362. (33) Zhou, W.; Dai, Z.; Chen, Y.; Wang, H.; Yuan, Z. Highdimensional descriptor selection and computational QSAR modeling for antitumor activity of ARC-111 analogues based on support vector regression (SVR). Int. J. Mol. Sci. 2012, 13, 1161−1172. (34) Lee, P. H.; Cucurull-Sanchez, L.; Lu, J.; Du, Y. J. Development of in silico models for human liver microsomal stability. J. Comput.Aided Mol. Des. 2007, 21, 665−673. (35) Hu, Y.; Unwalla, R.; Denny, R. A.; Bikker, J.; Di, L.; Humblet, C. Development of QSAR models for microsomal stability: identification of good and bad structural features for rat, human and mouse microsomal stability. J. Comput.-Aided Mol. Des. 2010, 24, 23−35. (36) Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123− 140. (37) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947−1958. (38) Ovidiu, I. Applications of Support Vector Machines in Chemistry. In Reviews in Computational Chemistry; Lipkowitz, K. B., Cundari, T. R., Eds.; Wiley-VCH: Weinheim, Germany, 2007; Vol. 23, pp 291−400. (39) Vapnik, V. N. The Nature of Statistical Learning Theory; Springer: New York, 2000.

REFERENCES

(1) Juliano, R. L.; Ling, V. A surface glycoprotein modulating drug permeability in Chinese hamster ovary cell mutants. Biochim. Biophys. Acta 1976, 455, 152−162. (2) Shen, D. W.; Fojo, A.; Chin, J. E.; Roninson, I. B.; Richert, N.; Pastan, I.; Gottesman, M. M. Human multidrug-resistant cell lines: increased mdr1 expression can precede gene amplification. Science 1986, 232, 643−645. (3) Ueda, K.; Cornwell, M. M.; Gottesman, M. M.; Pastan, I.; Roninson, I. B.; Ling, V.; Riordan, J. R. The mdr1 gene, responsible for multidrug-resistance, codes for P-glycoprotein. Biochem. Biophys. Res. Commun. 1986, 141, 956−962. (4) Hamada, H.; Tsuruo, T. Purification of the 170- to 180-kilodalton membrane glycoprotein associated with multidrug resistance. 170- to 180-kilodalton membrane glycoprotein is an ATPase. J. Biol. Chem. 1988, 263, 1454−1458. (5) Horio, M.; Gottesman, M. M.; Pastan, I. ATP-dependent transport of vinblastine in vesicles from human multidrug-resistant cells. Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 3580−3584. (6) Aller, S. G.; Yu, J.; Ward, A.; Weng, Y.; Chittaboina, S.; Zhuo, R.; Harrell, P. M.; Trinh, Y. T.; Zhang, Q.; Urbatsch, I. L.; Chang, G. Structure of P-glycoprotein reveals a molecular basis for poly-specific drug binding. Science 2009, 323, 1718−1722. (7) Seelig, A. A general pattern for substrate recognition by Pglycoprotein. Eur. J. Biochem. 1998, 251, 252−261. (8) Sugawara, I.; Kataoka, I.; Morishita, Y.; Hamada, H.; Tsuruo, T.; Itoyama, S.; Mori, S. Tissue distribution of P-glycoprotein encoded by a multidrug-resistant gene as revealed by a monoclonal antibody, MRK 16. Cancer Res. 1988, 48, 1926−1929. (9) Thiebaut, F.; Tsuruo, T.; Hamada, H.; Gottesman, M. M.; Pastan, I.; Willingham, M. C. Cellular localization of the multidrug-resistance gene product P-glycoprotein in normal human tissues. Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 7735−7738. (10) Cordon-Cardo, C.; O’Brien, J. P.; Casals, D.; Rittman-Grauer, L.; Biedler, J. L.; Melamed, M. R.; Bertino, J. R. Multidrug-resistance gene (P-glycoprotein) is expressed by endothelial cells at blood-brain barrier sites. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 695−698. (11) Greenwood, J. Characterization of a rat retinal endothelial cell culture and the expression of P-glycoprotein in brain and retinal endothelium in vitro. J. Neuroimmunol. 1992, 39, 123−132. (12) Garberg, P.; Ball, M.; Borg, N.; Cecchelli, R.; Fenart, L.; Hurst, R. D.; Lindmark, T.; Mabondzo, A.; Nilsson, J. E.; Raub, T. J.; Stanimirovic, D.; Terasaki, T.; Oberg, J. O.; Osterberg, T. In vitro models for the blood-brain barrier. Toxicol. in Vitro 2005, 19, 299− 334. (13) Hochman, J. H.; Yamazaki, M.; Ohe, T.; Lin, J. H. Evaluation of drug interactions with P-glycoprotein in drug discovery: in vitro assessment of the potential for drug-drug interactions with Pglycoprotein. Curr. Drug Metab. 2002, 3, 257−273. (14) Polli, J. W.; Wring, S. A.; Humphreys, J. E.; Huang, L.; Morgan, J. B.; Webster, L. O.; Serabjit-Singh, C. S. Rational use of in vitro Pglycoprotein assays in drug discovery. J. Pharmacol. Exp. Ther. 2001, 299, 620−628. (15) Raub, T. J. P-glycoprotein recognition of substrates and circumvention through rational drug design. Mol. Pharmaceutics 2006, 3, 3−25. (16) Schwab, D.; Fischer, H.; Tabatabaei, A.; Poli, S.; Huwyler, J. Comparison of in vitro P-glycoprotein screening assays: recommendations for their use in drug discovery. J. Med. Chem. 2003, 46, 1716− 1725. (17) Horio, M.; Chin, K. V.; Currier, S. J.; Goldenberg, S.; Williams, C.; Pastan, I.; Gottesman, M. M.; Handler, J. Transepithelial transport of drugs by the multidrug transporter in cultured Madin-Darby canine kidney cell epithelia. J. Biol. Chem. 1989, 264, 14880−14884. (18) Borst, P.; Schinkel, A. H. What have we learnt thus far from mice with disrupted P-glycoprotein genes? Eur. J. Cancer 1996, 32A, 985−990. (19) Lin, J. H.; Yamazaki, M. Clinical relevance of P-glycoprotein in drug therapy. Drug. Metab. Rev. 2003, 35, 417−454. 1260

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261

Molecular Pharmaceutics

Article

(40) Erickson, J. A.; Mader, M. M.; Watson, I. A.; Webster, Y. W.; Higgs, R. E.; Bell, M. A.; Vieth, M. Structure-guided expansion of kinase fragment libraries driven by support vector machine models. Biochim. Biophys. Acta 2010, 1804, 642−652. (41) Susnow, R. G.; Dixon, S. L. Use of robust classification techniques for the prediction of human cytochrome P450 2D6 inhibition. J. Chem. Inf. Comput. Sci. 2003, 43, 1308−1315. (42) Aanismaa, P.; Gatlik-Landwojtowicz, E.; Seelig, A. Pglycoprotein senses its substrates and the lateral membrane packing density: consequences for the catalytic cycle. Biochemistry 2008, 47, 10197−10207. (43) Li-Blatter, X.; Seelig, A. Exploring the P-glycoprotein binding cavity with polyoxyethylene alkyl ethers. Biophys. J. 2010, 99, 3589− 3598. (44) Desai, P. V.; Raub, T. J.; Blanco, M.-J. How hydrogen bonds impact P-glycoprotein transport and permeability. Bioorg. Med. Chem. Lett. 2012, 22, 6540−6548. (45) Gramatica, P. Principles of QSAR models validation: internal and external. QSAR Comb. Sci. 2007, 26, 694−701. (46) Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graphic Modell. 2002, 20, 269−276. (47) Kubinyi, H.; Hamprecht, F. A.; Mietzner, T. Three-dimensional quantitative similarity-activity relationships (3D QSiAR) from SEAL similarity matrices. J. Med. Chem. 1998, 41, 2553−2564. (48) Chen, L.; Li, Y.; Zhao, Q.; Peng, H.; Hou, T. ADME evaluation in drug discovery. 10. Predictions of P-glycoprotein inhibitors using recursive partitioning and naive Bayesian classification techniques. Mol. Pharmaceutics 2011, 8, 889−900. (49) Golbraikh, A.; Tropsha, A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J. Comput.-Aided Mol. Des. 2002, 16, 357−369. (50) Sheridan, R. P.; Feuston, B. P.; Maiorov, V. N.; Kearsley, S. K. Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J. Chem. Inf. Comput. Sci. 2004, 44, 1912−1928. (51) Gombar, V. K.; Polli, J. W.; Humphreys, J. E.; Wring, S. A.; Serabjit-Singh, C. S. Predicting P-glycoprotein substrates by a quantitative structure-activity relationship model. J. Pharm. Sci. 2004, 93, 957−968. (52) de Cerqueira Lima, P.; Golbraikh, A.; Oloff, S.; Xiao, Y.; Tropsha, A. Combinatorial QSAR modeling of P-glycoprotein substrates. J. Chem. Inf. Model. 2006, 46, 1245−1254. (53) Huang, J.; Ma, G.; Muhammad, I.; Cheng, Y. Identifying Pglycoprotein substrates using a support vector machine optimized by a particle swarm. J. Chem. Inf. Model. 2007, 47, 1638−1647. (54) Xue, Y.; Yap, C. W.; Sun, L. Z.; Cao, Z. W.; Wang, J. F.; Chen, Y. Z. Prediction of P-glycoprotein substrates by a support vector machine approach. J. Chem. Inf. Comput. Sci. 2004, 44, 1497−1505. (55) Cianchetta, G.; Singleton, R. W.; Zhang, M.; Wildgoose, M.; Giesing, D.; Fravolini, A.; Cruciani, G.; Vaz, R. J. A pharmacophore hypothesis for P-glycoprotein substrate recognition using GRINDbased 3D-QSAR. J. Med. Chem. 2005, 48, 2927−2935. (56) Loh, W. Y.; Shih, Y. S. Split selection methods for classification trees. Stat. Sin. 1997, 7, 815−840. (57) Cramer, J. W.; Mattioni, B. E.; Savin, K. A. Strategies for conducting ADME studies during lead generation in the drug discovery process. IDrugs 2010, 13, 857. (58) Kuhn, B.; Mohr, P.; Stahl, M. Intramolecular hydrogen bonding in medicinal chemistry. J. Med. Chem. 2010, 53, 2601−2611.

1261

dx.doi.org/10.1021/mp300555n | Mol. Pharmaceutics 2013, 10, 1249−1261