Article Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX-XXX
pubs.acs.org/jcim
Topliss Batchwise Schemes Reviewed in the Era of Open Data Reveal Significant Differences between Enzymes and Membrane Receptors
Lars Richter*
Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria
Supporting Information
ABSTRACT: In 1977, John G. Topliss introduced the Topliss Batchwise Scheme, a straightforward nonmathematical procedure to assist medicinal chemists in optimizing the substitution pattern of a phenyl ring. Despite its long period of application, a thorough validation of this method has been missing so far. Here, we address this issue by gathering 129 congeneric series from the ChEMBL database, suitable to retrospectively assess the approach. Frequency analysis of Topliss’ schemes showed that the π, Es, σ, and −σ scheme occurred in 17, 20, 6, and 4 congeneric series, respectively. We observed a significant difference of π scheme frequency in enzymes versus membrane receptors, with 12 versus only 2 occurrences. Validation of Topliss schemes in potency optimization showed a remarkable performance increase after restricting the data set to analogue series tested solely against enzymes. In this setting, the Es and the π scheme were successful in 50% and 56% of the analogue series, respectively.
■ INTRODUCTION
In the drug design process, the optimization of an initial hit to a potent lead is an elaborate, time-consuming procedure. Medicinal chemists may synthesize hundreds of analogues to achieve adequate potency of compounds toward a target. To assist chemists in this costly endeavor, researchers have analyzed empirical data to infer methods to streamline the potency optimization process.1 One of the pioneers in this development was John G. Topliss. In the 1970s, he introduced two pragmatic methods to find the optimal substitution pattern on an unfused benzene ring within a congeneric series. Similar to QSAR, Topliss’ methods attempt to infer the physicochemical driving forces, encoded by chemical descriptors, that lead to high potency within a set of compounds. For this purpose, he utilized a descriptor set composed of the substituent hydrophobicity constant π, the Hammett substituent constant σ, and Taft’s steric factor, Es. The derived descriptor−potency relation is then utilized for prospective design.
In 1972, Topliss published the first method, which later became known as the Topliss Tree.2 The method is structured as a decision tree, and analogues are synthesized stepwise along this tree. Five years later, he introduced his second approach, the Topliss Batchwise Scheme (TBS).3 In contrast to the Topliss Tree, TBS starts with synthesis and testing of a congeneric series of five phenyl-substituted analogues (Figure 1a), defined as the initial compound group (ICG). According to the method, the medicinal chemist then compares the observed ICG potency ranking order to standard ranking orders derived from ICG substituent parameters, defined as Topliss schemes (Figure 1). If the medicinal chemist finds a match between the observed ICG potency ranking order and a Topliss ranking scheme, the method proposes a new set of potentially more potent analogues. This additional set of analogues is defined as the second compound group (SCG), and its composition is determined by the inferred parameter dependency (Figure 1b) of the TBS.
The scientific community has broadly discussed and applied Topliss’ methods in potency optimization. At present, according to the Web of Science Core Collection, the Topliss Tree and the Topliss Batchwise Scheme have been cited 336 and 184 times, respectively. More interestingly, although the original methods were published in 1972 and 1977, 123 of 336 and 58 of 184 citations were issued between 2000 and 2016.
Received: April 4, 2017 Published: September 21, 2017
DOI: 10.1021/acs.jcim.7b00195 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Information and Modeling
Figure 1. Topliss Batchwise Schemes. (a) Potency ranking order of an initial group of five phenyl-substituted analogues (ICG) is compared to π, σ, and Es substituent parameter rankings, defined as Topliss schemes. (b) In the case of a match between an ICG potency ranking order and a Topliss scheme, a scheme-specific second group of compounds (SCG) (orange cells) is proposed to be synthesized and tested next.
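The comparison described in Figure 1a can be illustrated with a minimal Python sketch. It assumes that potency is given on a scale where higher values mean more potent (for example pIC50) and that each scheme is written as a mapping from substituent to rank, with equal ranks denoting ties. The function names, the potency values, and the three example schemes are hypothetical placeholders; the actual Topliss rankings are those of Figure 1a and are not reproduced here.

```python
from typing import Dict, List

# The five ICG substituents named in the text.
ICG_SUBSTITUENTS = ["3,4-Cl2", "4-Cl", "4-CH3", "4-OCH3", "4-H"]


def potency_ranking(potency: Dict[str, float]) -> List[str]:
    """Return the ICG substituents ordered from most to least potent.

    Assumes higher values mean higher potency (e.g. pIC50).
    Equal potencies keep their order in ICG_SUBSTITUENTS (stable sort).
    """
    missing = [s for s in ICG_SUBSTITUENTS if s not in potency]
    if missing:
        raise ValueError(f"missing potency values for: {missing}")
    return sorted(ICG_SUBSTITUENTS, key=lambda s: potency[s], reverse=True)


def matches_scheme(order: List[str], scheme_ranks: Dict[str, int]) -> bool:
    """Check whether an observed order is compatible with a scheme.

    scheme_ranks maps substituent -> rank (1 = most potent expected).
    The order matches if the scheme ranks never decrease along it,
    so substituents sharing a rank may appear in either order.
    """
    ranks = [scheme_ranks[s] for s in order]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def matching_schemes(potency: Dict[str, float],
                     schemes: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the names of all schemes compatible with the observed ranking."""
    order = potency_ranking(potency)
    return [name for name, ranks in schemes.items()
            if matches_scheme(order, ranks)]


# Hypothetical potency values, for illustration only.
example_potency = {"3,4-Cl2": 7.9, "4-Cl": 7.4, "4-CH3": 6.8,
                   "4-OCH3": 6.1, "4-H": 5.9}

# Placeholder schemes, NOT the published Topliss rankings.
EXAMPLE_SCHEMES = {
    "scheme_A": {"3,4-Cl2": 1, "4-Cl": 2, "4-CH3": 3, "4-OCH3": 4, "4-H": 5},
    "scheme_B": {"3,4-Cl2": 1, "4-Cl": 1, "4-CH3": 3, "4-OCH3": 4, "4-H": 5},
    "scheme_C": {"3,4-Cl2": 5, "4-Cl": 4, "4-CH3": 2, "4-OCH3": 1, "4-H": 3},
}

print(potency_ranking(example_potency))
print(matching_schemes(example_potency, EXAMPLE_SCHEMES))
```

Because ties are allowed in a scheme, a single observed ranking can be compatible with more than one scheme, which is why the sketch returns a list of scheme names rather than a single name.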
These numbers clearly highlight the continued relevance and use of these methods in the field. Despite its broad applicability, a thorough statistical assessment of the TBS in potency optimization has long been missing, not least because of limited collections of relevant data sets. However, in the past decade, technological advancements in both data generation and, in particular, data storage broadened the scope of the field.4,5 This came along with the emergence of publicly available databases relevant for medicinal chemists, which allowed data-driven research outside of pharmaceutical companies.6 A remarkable example of this development is the publicly accessible ChEMBL database,7 the most advanced database for compound−target associations, incorporating more than 13 million bioactivity values for two million compounds. This new environment also allows the reevaluation of traditional medicinal chemistry tools.4 In the context of Topliss’ methods, O’Boyle et al.8 employed the matched molecular pair (MMP) formalism9 to analyze the ChEMBL database for matched molecular series. In their analysis, they also derived a data-inferred decision tree and compared it to the Topliss Tree.2 Their analysis provided a more differentiated perspective on the method: while one branch of the Topliss Tree agreed with the data-inferred tree, the other branch showed more differences. A comparable analysis for the TBS is still missing.
In this study, we aim to analyze and validate the TBS, a method that has shaped the field of medicinal chemistry in the last 40 years. For this purpose, we mined the ChEMBL database7 to retrieve a significant number of Topliss analogue series that fit the requirements of the TBS. The resulting data set was used to address the following two questions: First, how frequently do potency ranking orders matching Topliss schemes occur? Second, how does the method perform in potency optimization, and do we see performance differences across protein classes? Apart from those assessments, we also strove to gain structural insights into phenyl-accommodating subpockets of Topliss schemes by mining the Protein Data Bank10 (www.rcsb.org).
■ RESULTS
In principle, the TBS procedure can be summarized in five steps (Figure 2): (1) synthesize and test an ICG of five analogues, (2) calculate the potency ranking order within this group, and (3) compare the observed rankings with the ranking schemes provided by Topliss. In case of a match, (4) select scheme-specific SCG analogues (Figure 1b) for (5) synthesis and testing. In the following, a hypothetical TBS workflow is presented using data from a study by Rose et al.11 Although the original research did not apply the Topliss method intentionally, the provided data can be used to validate the method retrospectively. In the hypothetical workflow, the ICG ranking order of a phenylurea series tested against epoxide hydrolase11 implied a π parameter dependency (steps 1−3) (Figure 2). According to Topliss’ π scheme, lipophilic analogues carrying substituents s1−12 (Figure 1b) are proposed to the medicinal chemist for the synthesis of an SCG (step 4) for subsequent
testing (step 5). In this example, the potencies for four of the 12 SCG analogues were reported in the work of Rose et al. Three of them showed higher potencies in comparison to the most potent 3,4-Cl2 analogue of the ICG (step 6).

Figure 2. Exemplary TBS workflow. Reported experimental data can be used to validate the TBS method retrospectively. In this example of a TBS workflow, data from an epoxide hydrolase assay, reported in the work of Rose et al., was used.11 (a) Numbers in parentheses represent steps of the TBS workflow (see text). EH, epoxide hydrolase.

Compilation of the Topliss Assessment Data Set. A reasonable number of analogue series is required to make a general statement about the TBS method. For this reason, we queried the ChEMBL 20 database7 with its 13 million bioactivity values for congeneric series consisting of the 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, and 4-H analogues that were tested in the same assay. Querying of the database with the RDKit PostgreSQL cartridge12 (see Methods) resulted in 129 series with measured potency values for all five analogues of the ICG (Figure 3). For these series, we also collected available potency data of analogues listed in the SCG (Figure 1b, s1−28). This step resulted in an additional 488 potency values. In total, the 129 analogue series contained 645 (129 × 5) potency values for analogues of the ICG and 488 potency values for analogues of the SCG. These data comprised the final “Topliss Assessment Data Set” (TAD; Figure 3), on which we performed all subsequent analyses. Of these 129 series, the vast majority were measured against enzymes or membrane receptors, with 67 and 48 congeneric series, respectively. The other 14 series were tested against proteins from the remaining target classes (ion channels, transporters, transcription factors, adhesion proteins, secreted proteins, and unclassified proteins).

Frequency of Topliss Schemes. First, we were interested in the overall frequency of analogue series matching the Topliss schemes (Figure 1b). Therefore, we calculated the potency ranking order of the ICG for each of the 129 congeneric series in the TAD (Figure 3). The ranking orders were then used to count the frequencies of Topliss scheme occurrences. In total, 61 of the 129 series showed a potency ranking that followed at least one Topliss scheme, while the remaining 68 did not follow any scheme. It has to be taken into account that some schemes partly overlap (Figure 1a). These overlaps occur particularly in the π dominant schemes. For instance, in the phenylurea series introduced previously (Figure 2), the observed potency ranking order matches not only the π but also the 2π − π2 and the 2π − σ schemes (Figure 1a). Figure 4a presents the results of the frequency analysis. We found the π and Es scheme frequently in 17 and 20 congeneric
Figure 3. Topliss Assessment Data Set (TAD). Data mining of ChEMBL 20 (ref 7) resulted in 129 Topliss analogue series with potency data for ICG analogues and available potency data of SCG analogues. Topliss schemes were assigned to each congeneric series according to their ICG potency ranking order (Figure 1a). Scheme-specific SCG analogues (Figure 1b) are framed in a rectangular box: ≫, SCG analogue with superior potency compared to ICG analogues;