J. Med. Chem. 1999, 42, 3251-3264
New 4-Point Pharmacophore Method for Molecular Similarity and Diversity Applications: Overview of the Method and Applications, Including a Novel Approach to the Design of Combinatorial Libraries Containing Privileged Substructures
Jonathan S. Mason,†,∇ Isabelle Morize,‡ Paul R. Menard,*,‡ Daniel L. Cheney,† Christopher Hulme,§ and Richard F. Labaudiniere§
Computer-Assisted Drug Design and Lead Discovery, Rhône-Poulenc Rorer, 500 Arcola Road, Collegeville, Pennsylvania 19426-0107
Received December 11, 1998
A new 4-point pharmacophore method for molecular similarity and diversity that rapidly calculates all potential pharmacophores/pharmacophoric shapes for a molecule or a protein site is described. The method, an extension to the ChemDiverse/Chem-X software (Oxford Molecular, Oxford, England), has also been customized to enable a new internally referenced measure of pharmacophore diversity. The “privileged” substructure concept for the design of high-affinity ligands is presented, and an example of this new method is described for the design of combinatorial libraries for 7-transmembrane G-protein-coupled receptor targets, where “privileged” substructures are used as special features to internally reference the pharmacophoric shapes. Up to 7 features and 15 distance ranges are considered, giving up to 350 million potential 4-point 3D pharmacophores/molecule. The resultant pharmacophore “key” (“fingerprint”) serves as a powerful measure for diversity or similarity, calculable for both a ligand and a protein site, and provides a consistent frame of reference for comparing molecules, sets of molecules, and protein sites. Explicit “on-the-fly” conformational sampling is performed for a molecule to enable the calculation of all geometries accessible for all combinations of four features (i.e., 4-point pharmacophores) at any desired sampling resolution. For a protein site, complementary site points to groups displayed in the site are generated and all combinations of four site points are considered. 
In this paper we report (i) the details of our customized implementation of the method and its modification to systematically measure 4-point pharmacophores relative to a “special” substructure of interest present in the molecules under study; (ii) comparisons of 3- and 4-point pharmacophore methods, highlighting the much increased resolution of the 4-point method; (iii) applications of the 4-point potential pharmacophore descriptors as a new measure of molecular similarity and diversity and for the design of focused/biased combinatorial libraries.
‡ Computer-Assisted Drug Design.
§ Lead Discovery.
† Current address: Computer-Assisted Drug Design, Bristol-Myers Squibb, P.O. Box 4000, Princeton, NJ 08543.
∇ To whom correspondence concerning the 4-point pharmacophore method should be addressed: e-mail, [email protected].
Introduction
Pharmacophore Methods. Methods for molecular similarity and diversity that are relevant to drug-receptor interactions are needed for many computer-assisted drug design (CADD) applications. In fact, the increased use of combinatorial chemistry and high-throughput screening in pharmaceutical drug discovery research requires methods for design and analysis that can rapidly handle large numbers of structures, often of a relatively high conformational flexibility. These design and analysis requirements led us to expand upon our previous work using 3-point pharmacophores2,3 to the use of 4-point pharmacophores and also to the development of methods that internally reference the similarity/diversity measure relative to a special feature/substructural motif of interest. While the 3-point pharmacophore method was very useful,4-6 we considered it extremely important to move
to 4-point pharmacophores to get an increase in the amount of shape information and resolution, including the ability to distinguish chirality, a fundamental requirement for many ligand-receptor interactions. The increased resolution and the results of the ligand-protein similarity studies reported here confirm that this is indeed a major improvement, with an associated technical advance in being able to rapidly “fingerprint” a molecule, resolving up to 350 million 3D pharmacophoric shapes. The ability to do this including extensive conformational flexibility was also considered essential, especially as many of the ligands of interest and combinatorial library compounds being produced are fairly flexible. This flexibility makes inadequate the use of a single (or a small limited set of) conformer(s) to represent the accessible conformational space and thus the potential bound receptor conformation of a ligand. Earlier experiences with the development of a method that uses individual 3D queries2 led us to collaborate with Chemical Design to expand the ChemDiverse module of Chem-X1 that uses 3-point pharmacophores, to 4-point pharmacophores, retaining a relatively fast analysis (0.8 very similar, >0.4-0.5 of some significance). For both the angiotensin and endothelin cases a reasonable to good similarity is found using this dynamically weighted Tanimoto-style coefficient with 4-point pharmacophores (see underlined numbers in Figures 6 and 7). Details of the formula used and some of the exact numbers of total and common potential pharmacophores are shown in Figure 8. Using 3-point pharmacophores a reasonable level of similarity can also be found but with a much higher noise level (similarity values for compounds not expected to have similar activity). The 4-point model appears to increase the number of common potential pharmacophores more for compounds with similar activity and thus reduces the noise level.
For example, with the endothelin compounds using 4-point pharmacophores, many other structures in the MDDR had very low similarity; the higher values found were generally
for compounds reported to be 7TM-GPCR ligands (such compounds may indeed have some endothelin activity if they were tested, and the similarity could represent some general properties of 7TM-GPCR ligands). The dynamic weighting was found to improve the differentiation of expected actives and nonactives: it increases the score for compounds that match a high proportion of the “reference” compound potential pharmacophores by reducing the penalty for additional (and thus not in common) pharmacophores; such compounds thus become ranked higher than some compounds with a lower number of both common and additional potential pharmacophores. The endothelin results also show the importance of correctly identifying features through the parametrization database. In fact, the similarity of the RPR and MDDR compounds is dependent upon the acyl sulfonamide of the MDDR compound being identified as an acid (a group that is very different in 2D similarity to a carboxylic acid); removing that assignment reduces the 3D pharmacophoric similarity significantly (by 0.15). 2. Ligand-Receptor Site Comparison. The 4-point pharmacophore method can also be used as a means to measure similarity when comparing ligands to their binding site targets. The pharmacophore key calculated from a ligand can be compared to the pharmacophore key of its target binding site, and as previously done between ligands, a similarity index can be calculated. The pharmacophore key for the site is generated from complementary site points, as discussed earlier. An example of this comes from studies on three closely related serine proteases: trypsin, thrombin, and factor Xa. Site points were manually located on the basis of GRID-generated energy contour maps using the chemical probes sp2CH as hydrophobe, amide NH as hydrogen bond donor, carbonyl oxygen as hydrogen bond acceptor, anionic carboxylate group as acid, and cationic amidine
Journal of Medicinal Chemistry, 1999, Vol. 42, No. 17
Mason et al.
Figure 9. Number of potential 3- and 4-point pharmacophores calculated on the basis of complementary site points inserted into the active sites of thrombin, factor Xa, and trypsin.
group as base (see Figure 9). 3- and 4-point pharmacophore keys were generated, and corresponding keys were also generated using full conformational flexibility for two series of highly selective and potent factor Xa and thrombin inhibitors (see Table 3 and Figure 10). We were interested in whether receptor-based similarity as a function of common potential pharmacophores for each ligand/receptor pair could resolve enzyme selectivity. Using 3-point pharmacophore-based similarity, poor resolution of enzyme selectivity was observed (see Figure 10). For example, the thrombin inhibitors NAPAP, MQPA, and BM14.1248 do not have a uniformly greater number of pharmacophores in common with the pharmacophore key generated from the thrombin active site compared to the keys generated from factor Xa or trypsin. When a similar analysis is conducted using 4-point pharmacophores, enhanced resolution of enzyme selectivity is observed. Factor Xa and thrombin inhibitors exhibit greater similarity with the complementary pharmacophore keys of the factor Xa and thrombin active sites, respectively, than with the pharmacophore keys generated from the other enzymes. To evaluate this metric in a broader context of ligand-receptor similarity, the above analysis was repeated using two fibrinogen receptor antagonists taken from the MDDR (MDDR-192259 and MDDR-199187). These compounds resembled trypsin-like serine protease inhibitors in terms of 2D structural features (benzamidine) but had no reported activity for this class of enzymes. Using 3-point pharmacophore profiling both molecules exhibited significant
Table 3. Enzyme Selectivity for a Set of Selective Thrombin and Factor Xa Inhibitors Resolved in Terms of Relative Numbers of Common Potential 4-Point Pharmacophores for Each Ligand/Receptor Pair^a

                                              number of common pharmacophores
molecule      Ki                    point     factor Xa    thrombin    trypsin
MQPA          19 nM vs thrombin     3-pt      199          183^b       128
                                    4-pt      191          192^b       82
NAPAP         6 nM vs thrombin      3-pt      144          168^b       99
                                    4-pt      210          352^b       82
BM14.1248     23 nM vs thrombin     3-pt      104          58^b        31
                                    4-pt      104          135^b       40
RPR-118071    80 nM vs factor Xa    3-pt      56^b         79          49
                                    4-pt      59^b         32          32
DX5633        13 nM vs factor Xa    3-pt      32^b         23          16
                                    4-pt      10^b         4           1
MDDR-192259                         3-pt      60           57          48
                                    4-pt      2            4           0
MDDR-199187                         3-pt      3            5           1
                                    4-pt      0            0           0

^a Selectivity is not resolved using 3-point multiple potential pharmacophores in a similar analysis. Inactive ligands with similar 2D structures have low similarity only at the level of 4-point multiple potential pharmacophores.
^b Number of pharmacophores in common with the enzyme for which the inhibitor is selective.
Figure 10. Structures of selective thrombin and factor Xa inhibitors used for the enzyme selectivity studies in Table 3.
“similarity” against all three enzymes, while with 4-point pharmacophores the degree of similarity diminishes very substantially.
Identification of Enriched/Specific Pharmacophores for an Ensemble of 7TM-GPCR Targets. As an extension to the ligand-ligand comparisons, pharmacophoric shapes of sets of molecules sharing similar activity were searched with the goal of identifying an ensemble of “specific” pharmacophores (or an ensemble enriched in particular pharmacophores) to bias the design of libraries for 7TM-GPCR targets. In particular, analyses were carried out using a set of 7TM-GPCR ligands, a set of molecules known to inhibit enzymes, and a set of random compounds from the ACD with similar flexibility.14 By analyzing 4 random subsets of 200 compounds, it was possible to identify pharmacophores that were unique to, or enriched for, the 7TM-GPCR ligands; the 4-point pharmacophore approach found a much higher percentage (42%) “unique” than the 3-point approach (13%).14 The advantage of the 4-point method in resolving a higher proportion of characteristic information was also shown not to be just due to the number of pharmacophores involved but also to their content; the use of 32 distance ranges led to a similar number of 3-point pharmacophores as the 4-point method with 7 ranges, but only 24% “unique” to the 7TM-GPCR set. A “4 times” occurrence was effectively used in these studies to define a common pharmacophore, but an actual count can be used; using 3-point pharmacophores 15% of the pharmacophores occur at least 4 times more frequently in the 7TM-GPCR set than in the inhibitor set, decreasing to 2% with a count ratio threshold of 10.
Pharmacophores Containing a “Privileged Feature” and Design of Combinatorial Libraries Enriched in Pharmacophores Found in 7TM-GPCR Ligands. We discussed above the inclusion of a special feature in the pharmacophore definition to measure the molecular pharmacophoric similarity/diversity relative to a given feature, for instance, a “privileged” substructure. The frequent occurrence of certain substructures in 7TM-GPCR ligands that are believed to be important for biological activity was also discussed.
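The count-ratio criterion used above to separate pharmacophores that are “unique” to, or enriched in, the 7TM-GPCR set can be sketched as follows. This is only an illustration under stated assumptions: the pharmacophore identifiers, the counts, and the function name `split_by_enrichment` are hypothetical, the two compound sets are assumed to be of equal size so that raw counts are comparable, and nothing here reproduces the Chem-X implementation.

```python
from collections import Counter


def split_by_enrichment(target_counts, background_counts, ratio=4.0):
    """Classify target-set pharmacophores as unique or enriched.

    unique   : present in the target set and absent from the background set
    enriched : present in both, at least `ratio` times more often in the target set
    """
    unique, enriched = set(), set()
    for pharmacophore, n_target in target_counts.items():
        n_background = background_counts.get(pharmacophore, 0)
        if n_background == 0:
            unique.add(pharmacophore)
        elif n_target >= ratio * n_background:
            enriched.add(pharmacophore)
    return unique, enriched


# Hypothetical molecule-level counts for two equally sized compound sets.
gpcr_counts = Counter({"p1": 12, "p2": 8, "p3": 3, "p4": 5})
inhibitor_counts = Counter({"p2": 2, "p3": 3, "p4": 1})

unique, enriched = split_by_enrichment(gpcr_counts, inhibitor_counts, ratio=4.0)
_, enriched_10 = split_by_enrichment(gpcr_counts, inhibitor_counts, ratio=10.0)
print(sorted(unique), sorted(enriched), sorted(enriched_10))
```

Raising the threshold from 4 to 10 shrinks the enriched set, which mirrors the qualitative trend reported in the text (15% decreasing to 2%).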
Figure 11 shows the results of searches in the MDDR for four such substructures: diphenylmethane, biphenyltetrazole, spiropiperidine, and indole. To complement our study and provide a practical basis for the design of new combinatorial libraries, we parametrized several of these “privileged” motifs as the special feature. We then calculated only the 4-point pharmacophores that include the special feature for sets of compounds bearing the
Figure 11. Overview of the 7TM-GPCR “privileged” motifs found in MDDR (version 96.1). A = any atom type, not in a ring; A-A bond = single, double, or aromatic.
same privileged motif and known to act on 7TM-GPCRs. This generated a pharmacophore key where similarity/diversity is measured in a relative sense to the “privileged” substructure and which can be used to help focus libraries on products with 3D properties similar to those of the known active compounds. In practice, to produce libraries for 7TM-GPCR screens, the 7TM-GPCR privileged substructures can be incorporated into the reagents used for a condensation reaction (e.g., the amine, acid, aldehyde, or isonitrile used in the Ugi reaction19-24), as a scaffold, and/or be formed as part of the reaction (e.g., benzodiazepine25,26). The design goal can be to mimic the 3D pharmacophores of existing active structures and/or to explore new pharmacophores to fill missing diversity of pharmacophoric shapes around the “privileged” substructures. This could be used, for example, to produce libraries for new 7TM-GPCR screens that do not have existing small molecule ligands. It is also possible to simultaneously optimize the total pharmacophoric diversity of the library. Several examples have already been published around the idea of mimicking known active molecules by using substructures present in these molecules as reagents in combinatorial synthesis.8 In our case, “privileged” motifs were also introduced as reagents to produce combinatorial libraries, but we biased the design one step further by enriching the libraries with pharmacophores found in known 7TM-GPCR ligands. To illustrate this approach, biphenyltetrazole (BPT) was chosen as the privileged motif using chemistry based on the Ugi reaction19-21 (see Figure 12). The BPT fragment was added to the parametrization database, using a centroid dummy atom to code this “privileged” substructure as shown in Figure 4 for one of the MDDR compounds.
New Chem-X procedures were written to select reagents on the basis of their contribution (at the product level) to the coverage of the 7TM-GPCR pharmacophore space and to the exploration of new diversity around the privileged substructure. In other words,
Figure 12. Example of Ugi chemistry with BPT incorporated as “privileged” group at the amine position.
reagents were selected when their corresponding products maximized the number of pharmacophores in common with the ones calculated for the 7TM-GPCR MDDR subset of compounds exhibiting BPT in their structure. To avoid always covering the same pharmacophores, the selection procedure has to be dynamic: i.e., the pharmacophores from the 7TM-GPCR pharmacophore key that are covered by the products containing the selected reagents have to be removed for the next step of the selection. Optionally, reagents can also be selected if they just add new pharmacophores, not especially the ones found in 7TM-GPCR ligands; in practice this was found to be more or less correlated with the selection based on the 7TM-GPCR overlap. When the number of reagents is very large, a preselection is done to create a subset that is pharmacophorically and/or 2D structurally diverse. In such cases, the pharmacophore calculations are done with the reagent transformed to be as product-like as possible (i.e., the reagent is represented as it will be in the product, for example, including the core structure and some fixed choices for the other reagents). A schematic outline of the library design procedure is shown in Figure 13. The first step is the 3D database building of library products corresponding to the selected chemistry, the available/preselected reagents, and the privileged motif. The pharmacophore keys for the library products are calculated using one key per reagent in the list to optimize. A dynamic ranking of these “per reagent” pharmacophore keys versus the 7TM-GPCR reference key is performed, with the “reference” key (the 7TM-GPCR pharmacophoric space remaining to be covered) recalculated at each step of the selection. Once a reagent is selected the pharmacophores covered by its corresponding products are eliminated from the “reference” (to be matched) key.
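The dynamic ranking described above amounts to a greedy coverage loop, which can be sketched as below. This is a minimal sketch under stated assumptions, not the Chem-X procedure itself: keys are modeled as Python sets of integer pharmacophore identifiers, the reagent names and key contents are invented, ties are broken by input order, and the optional criterion of rewarding pharmacophores outside the reference key is left out.

```python
def select_reagents(reagent_keys, reference_key, n_select):
    """Greedy selection: at each step pick the reagent whose products cover
    the most pharmacophores still remaining in the reference key."""
    remaining = set(reference_key)      # pharmacophores still to be matched
    candidates = dict(reagent_keys)     # per-reagent keys not yet selected
    selection = []
    while candidates and len(selection) < n_select:
        best = max(candidates, key=lambda r: len(candidates[r] & remaining))
        gain = len(candidates[best] & remaining)
        selection.append((best, gain))
        # Remove what the chosen reagent covers before ranking the next one.
        remaining -= candidates.pop(best)
    return selection, remaining


# Hypothetical "per reagent" keys and reference key (integers stand for key bits).
reference = {1, 2, 3, 4, 5, 6}
per_reagent = {
    "acid_A": {1, 2, 3, 9},
    "acid_B": {3, 4},
    "acid_C": {1, 2},
    "acid_D": {5, 10},
}

selection, uncovered = select_reagents(per_reagent, reference, n_select=3)
print(selection, uncovered)
```

In this toy run `acid_C` is never chosen because everything it covers is already contributed by `acid_A`, which is the redundancy the recalculated reference key is meant to avoid. The set left in `uncovered` corresponds to the “not covered” pharmacophores that a follow-up library would target.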
The next reagent is thus selected with this new recalculated reference key to avoid redundancy in pharmacophore coverage, since the objective is to maximize the coverage of the pharmacophore reference key and to minimize the redundancy between products (and libraries). Ugi Library Design Using BPT as “Privileged” Group. In the given example (see Figure 12), with BPT as “privileged” group at the amine position in the Ugi chemistry, our goal was to select approximately 20 acids from a list of 42 that were known to work well with the Ugi chemistry. As available aldehydes and isonitriles were not very diverse, it was decided to work with fixed sets of 12 aldehydes and 8 isonitriles chosen to represent the structural diversity of these two reagent categories. Consequently, a total of 4032 compounds were built and grouped per acid in 42 subsets; 42 “privileged” pharmacophore keys were then calculated and compared to the 7TM-GPCR BPT “privileged” key. This key was
produced based on 502 MDDR compounds from the 7TM-GPCR list (181 compounds with data in the field ACTION, 582 related compounds, and 502 containing BPT motif in MDDR version 95.2). The 7TM-GPCR BPT “privileged” key calculated using the 4-point pharmacophore definition and 10 distance ranges contained 160801 potential pharmacophores. The first 22 ranked acid reagents covered 78373 of these pharmacophores (44K from the first reagent set, 11K new from the second; see Figure 14A) and gave a total number of pharmacophores of 261354 for the 2112 products (22 × 12 × 8). Results from the ranking procedure are given as a spreadsheet that contains the list of sorted reagents with the contribution of their products to the 7TM-GPCR pharmacophoric space needed to be covered and to the total pharmacophoric space. Generally many more pharmacophores than just ones that contain a privileged substructure are exhibited; the total number of pharmacophores exhibited by the library is also calculated. Figure 14 shows typical histograms obtained for these pharmacophore contributions with this procedure. For the “new added pharmacophores” (histogram A), a large number of pharmacophores occur for the first reagent (the pharmacophores of the first reagent include all the pharmacophores that involve the core structure of the libraries, which are then no longer counted as new for subsequent reagents), followed by a rapid decrease of the number of new pharmacophores for the next reagents (the number of which may vary from one chemistry to another and also depends on the diversity of the reagents). Finally a plateau is reached meaning that very little is added by new reagents in terms of new desired pharmacophores. The second histogram B plots the total number of pharmacophores exhibited by the products for each new selected reagent, and similarly (but inversely) to histogram A after a rapid increase, a plateau is observed.
One should note that using a count per pharmacophore, or the additional constraint of a coverage of other properties, would reduce the plateauing and provide further design options. After this first design stage, the typical question “what to do next” has to be answered. In our case follow-up or related questions were asked: is it worth using BPT again as the “privileged” motif, and if yes, in which position or in which chemistry? Thus new library designs were carried out, evaluating the interest in using BPT at the acid position in the Ugi reaction, instead of at the amine position. A set of 34 amines was considered, with the goal of selecting about 20 of them. Pharmacophores already covered by library 1 were removed from the 7TM-GPCR BPT “privileged” key, and the ranking procedure was then applied to identify reagents that cover new pharmacophores not already covered by library 1 (reducing redundancy between libraries). Before finalizing the set of reagents for the library production, other parameters such as molecular weight and cLogP of the products were considered. While it is quite easy to gain new pharmacophores and to cover more and more of the 7TM-GPCR BPT pharmacophores with the first libraries, it rapidly becomes very difficult to cover the ones not covered. At this stage in order to go further an analysis of the
Figure 13. Schematic outline of the library design procedure.
Figure 15. Unrepresented pharmacophores (found in 7TM-GPCR BPT set from MDDR) after library production.
Figure 14. Contributions of pharmacophores per acid reagent in the order of the selection for optimization in the Ugi reaction with BPT as “privileged” motif at the amine position. Histogram A: number of new pharmacophores in the BPT 7TM-GPCR pharmacophoric space added by each new selected reagent. Histogram B: increase in the total number of pharmacophores for each new selected reagent.
remaining “not covered” pharmacophore keys can be carried out, taking advantage of the pharmacophore method where each descriptor corresponds to a real 3D pharmacophoric shape that can be analyzed in a general way (e.g., number of acids and bases, what distances,...) or specific way (e.g., superposed on a molecule exhibiting
it). As an example, a “by feature” analysis performed on the remaining not covered pharmacophores after production of several libraries is shown in Figure 15 (key exported from Chem-X and analyzed using in-house programs). This indicates clearly that a large part of the missing pharmacophores (36%) contained a combination of acids and bases or a combination involving an acid (23%) or a base (22%). Even if the relative absence of such features in our produced libraries was known, it was useful to quantify it at this stage. It was also found useful to rank the MDDR 7TM-GPCR molecules containing BPT against the key of the not covered pharmacophores and visualize them. This led to the design of further Ugi libraries with reagents containing tert-butyl ester-protected acids or BOC-protected amines to produce products exhibiting free acids and bases after a deprotection stage at the end of the synthesis. These libraries added significant new and previously missing
Figure 16. Sum total number of 4-point pharmacophores from consecutive 14K sets of Ugi libraries designed for 7TM-GPCR targets.
pharmacophores relative to earlier libraries. Figure 16 illustrates the progression of new pharmacophores explored with Ugi libraries, analyzed in sets of 14K compounds. It can be seen that it was possible to triple the total number of potential pharmacophores expressed by the first set (Lib1) with consecutive libraries (Lib2 and Lib3) that were designed to maximize the filling of not covered pharmacophores from existing libraries and included the use of many more reagents that would give free acids and bases in the products (Lib3 group particularly).
Conclusion
The 4-point 3D multiple potential pharmacophore methods implemented and customized as described in this paper provide a new method to measure the similarity and diversity of sets of compounds. A new multipharmacophoric similarity measure has been developed that can be applied to both ligand-ligand (e.g., for molecules required to share a similar biological activity) and ligand-receptor interactions. In particular a new “relative” measure of similarity and diversity was introduced using a subset of pharmacophoric shapes that contain a special feature, such as a “privileged” substructure. This provides a new approach to library design, and procedures and examples for 7TM-GPCR targets have been presented. A useful aspect of this method, multiple pharmacophores containing a special feature, is the ability to focus on relevant information, here being the pharmacophoric shape of the compounds relative to a specific substructure or motif. A consistent frame of reference, independent of alignment of particular conformations, is obtained for comparing molecules, databases of molecules, and complementarity to a protein site whether considering all pharmacophoric shapes or just those containing a special feature.
Experimental Section
Computational Studies.
Calculations were performed using the Chem-X/ChemDiverse software (Jan97 and Apr97 releases) including the pre-release 4-center pharmacophore module (available for general release from the Jul97 release). Customized scripts (PCL) of Chem-X commands were used for this work. A customized parametrization file and database were used, using the special atom types described in Table 2; algorithmic perception of acidic and basic centers was deactivated, and instead fragments in the parametrization database were used to identify the relevant environments and assign atom types with the Chem-X “center” numbers for basic and acidic features (basic = 1°, 2°, 3° amines, amidines, guanidines; acidic = carboxylic, tetrazolic, acyl sulfonamides). Pharmacophore Features and Atom Types. The methods used to assign the atom types have been discussed elsewhere2,5,14 and are based on an order-dependent database of “fragments”. Either a total connectivity or a bond-order-dependent (with minimum connectivity defined) substructural fragment match is used to assign atom types. The sequential use of the fragments enables atom type assignments by fragments stored later in the database to override (and see) modifications made by earlier fragments. Very specific definitions (e.g., generic case first, then special cases) and the addition of unique atom type identifiers for special groups or features, as for the “privileged” substructures discussed below, can thus be achieved. Hydrophobic regions are identified in this work using the bond polarity method in Chem-X to add dummy atoms, together with some specific fragments such as isopropyl that force a hydrophobic feature to be assigned, regardless of the environment. Alternatively, or additionally, hydrophobic dummy atoms can be systematically added and evaluated for a polar environment using the electrostatic potential, retaining only those in a nonpolar environment.2 The correct identification of acidic and basic groups was considered to be crucial and was done explicitly through the customized parametrization database to define specific atom types rather than by using the default algorithmic method available in Chem-X. For the work reported in this paper, acids and bases (defined as groups expected to be ionized at physiological pH) were not assigned to hydrogen bond acceptors and donors but kept separately as acidic/basic features. In other studies it could be useful to allow acids/bases to be also hydrogen bond acceptors/donors, and this is readily done by also assigning these features to the corresponding atom types for acids and bases. Distance Ranges and Pharmacophore Keys. The distances between pharmacophoric features for all the evaluated conformers are calculated exactly but stored using a binning scheme, with each distance represented by the bin into whose range it falls.
A customized distance range file was used, see Table 4, with the default ChemDiverse range extended from 15 to 18-19.5 Å and new nonlinear distance ranges. Three distances are needed to characterize each 3-point pharmacophore (triangle), while there are six distances needed to describe each pharmacophore for the 4-point method (tetrahedron). To store all the combinations of features and distances, a pharmacophore “key” is used. It is a bit string where each bit represents a geometrically valid combination of features and 3D distances (see Figure 2). The geometrical validity is checked by applying the triangle inequality rule where one distance cannot be longer than the sum of the two others. This removes about one half of the theoretical combinations for the 3-point pharmacophores. One of our requirements for the extension of ChemDiverse to 4-point pharmacophores commissioned with Chemical Design was the ability to flexibly use customized distance ranges and/or combinations of features.27 On the basis of our experience with 3D database searching, and taking into account the torsional increments used in the conformational sampling, customized distance ranges were defined for both 3- and 4-point calculations. Longer distances were included, and the size of each range was varied according to a fixed percentage variation from the distance middle point value (e.g., ±15%), such that larger distances had larger ranges. A maximum of 16 ranges for the 3-point method and 10 or 7 ranges for the 4-point method were normally used; this contrasts with the default ChemDiverse ranges (between 2 and 15 Å) of 15 for 4-point (1 Å interval plus additional ranges for distances less than or greater than the defined limits) and 32 for 3-point (0.1-1 Å interval).
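As an illustration of the binning scheme and the triangle inequality check, the sketch below maps a distance to its bin using the range boundaries of Table 4 and tests whether a triplet of bins can describe a real triangle. Several points are assumptions made for the example rather than details given here: a distance lying exactly on a boundary is placed in the upper bin, the last bin is treated as open-ended, a bin triplet is accepted if any distances inside the three bins can satisfy the inequality, and the function names are hypothetical.

```python
from bisect import bisect_right

# Upper boundaries (in Å) of all but the last, open-ended bin (Table 4).
EDGES_7 = [2.5, 4.0, 6.0, 9.0, 13.0, 18.0]
EDGES_10 = [2.0, 2.5, 3.2, 4.3, 5.8, 7.9, 10.6, 14.3, 19.5]


def bin_index(distance, edges):
    """Return the index of the distance range into which `distance` falls."""
    return bisect_right(edges, distance)


def bin_bounds(index, edges):
    """Return (lower, upper) limits of a bin; the last bin has no upper limit."""
    lower = 0.0 if index == 0 else edges[index - 1]
    upper = edges[index] if index < len(edges) else float("inf")
    return lower, upper


def triangle_possible(bins, edges):
    """True if some distances within the three bins obey the triangle inequality,
    i.e. no side is forced to be longer than the sum of the other two."""
    bounds = [bin_bounds(b, edges) for b in bins]
    for i in range(3):
        others_upper = sum(bounds[j][1] for j in range(3) if j != i)
        if bounds[i][0] > others_upper:
            return False
    return True


print(bin_index(3.0, EDGES_7), bin_index(3.0, EDGES_10))
print(triangle_possible((1, 1, 1), EDGES_7), triangle_possible((0, 0, 6), EDGES_7))
```

The same 3.0 Å distance lands in different bins under the 7- and 10-range schemes, and the triplet (0, 0, 6) is rejected because two sides of at most 2.5 Å cannot span a side of more than 18 Å.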
The use of larger distance bin sizes reduces the risk of failing to set a bit in the pharmacophore key because of conformational sampling limitations (e.g., the size of torsional increment for flexible bonds) and produces keys of a much more manageable size (with 7 features and 13 distance ranges the 4-point pharmacophore key size is 12MB, reducing to 3MB with 10 distance ranges and to 0.7MB with 7 distance ranges). Details of the customized ranges are given in Table 4, and Table 5 gives the total number of hypothetical pharmacophores for different combinations. The key size for 4-point keys was about 125KB/million hypothetical pharmacophores. Figure 3
Table 4. Distance Range (Å) Definition for Pharmacophore Key Calculations

bin    7 distance ranges    10 distance ranges
0      0-2.5                0-2.0
1      2.5-4.0              2.0-2.5
2      4.0-6.0              2.5-3.2
3      6.0-9.0              3.2-4.3
4      9.0-13.0             4.3-5.8
5      13.0-18.0            5.8-7.9
6      >18.0                7.9-10.6
7                           10.6-14.3
8                           14.3-19.5
9                           >19.5
Table 5. Hypothetical Numbers of 3- and 4-Point Multiple Pharmacophores as a Function of the Number of Distance Ranges

no. of distance    3-point pharmacophores    4-point pharmacophores    4-point pharmacophores
ranges             7 features/point          7 features/point          6 features/point
7                  14K                       5600K                     2300K
10                 33K                       24400K                    9700K
13                 70K                       98680K
15                 107K                      227700K
32                 870K
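The key sizes quoted in the text follow from storing one bit per hypothetical pharmacophore, which is what the figure of about 125KB per million pharmacophores implies. The short check below applies that arithmetic to the 4-point, 7-feature counts of Table 5; the function name is hypothetical, decimal megabytes are assumed, and count-level keys (which need additional bits) are not considered.

```python
def key_size_megabytes(n_pharmacophores):
    """Size of a binary key with one bit per hypothetical pharmacophore."""
    n_bytes = n_pharmacophores / 8
    return n_bytes / 1_000_000


# 4-point, 7 features/point counts from Table 5, keyed by number of distance ranges.
four_point_counts = {7: 5_600_000, 10: 24_400_000, 13: 98_680_000}

for n_ranges, count in four_point_counts.items():
    print(n_ranges, "ranges:", round(key_size_megabytes(count), 2), "MB")
```

The results (about 0.7, 3, and 12 MB) agree with the sizes given for 7, 10, and 13 distance ranges.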
shows actual pharmacophore counts observed for a single molecule (endothelin antagonist) using 3- and 4-point pharmacophore keys and three different distance ranges. Experience gained when using 3- and 4-point pharmacophore keys indicates that adequate resolution/differentiation is obtained using the relatively low number of 7 or 10 distance ranges, giving keys of a manageable size. Also, much larger numbers of potential pharmacophores are generated with the 4-point definition: 24 million with 10 distance ranges compared to 33000 for a 3-point definition. The resultant extra information appears to be meaningful and worth the much increased key size (binary key 375 times larger) and to have added resolution compared to just using an equivalent number of 3-point potential pharmacophores (from using more distance ranges); the studies discussed in the Identification of Enriched/Specific Pharmacophores for an Ensemble of 7TM-GPCR Targets section illustrate an example of this. Pharmacophore keys within Chem-X are compared/generated using logical operations (OR, AND, NOT), and details of the pharmacophores can be written out of Chem-X in several different ways. For all logical operations on pharmacophore keys, “tolerances” (define fit/tolerance 0/bond 0) were set to 0 so that exact values are obtained. It was found necessary to define an increased maximum memory size for the Chem-X executable when large numbers of keys were kept in memory (Unix level command: setenv CDL_CHEMX_MEMORY_SIZE 80000). Pharmacophore Quality Checks. Two optional “quality” checks could be applied, with very little increase in CPU time, to potential pharmacophores before they are added to the key. On the basis of an empirical formula, a “volume” check compares the area (3-point) or volume (4-point) of the potential pharmacophore with the heavy atom count for the molecule.
This excludes pharmacophores that are small relative to the molecule, for example, a pharmacophore involving only a single residue in a tetrapeptide molecule. This estimate, although very approximate, was found to be useful and was used routinely, once it became supported, for all 3-point pharmacophore profiling and for similarity studies with 4-point pharmacophores. An “accessibility” check, which eliminates pharmacophores that are potentially inaccessible to receptor interaction based on a putative interaction site (H or lone pair) pointing within the triangle of the 3-point pharmacophore, was not used in the studies reported here.
Pharmacophore Count. The occurrence of each pharmacophore is stored at either a binary (yes/no) or count level. The count is made at either a “molecule” level, such that there is a maximum count of one per molecule for each pharmacophore, or a “conformer” level, such that the count is incremented each time the pharmacophore is exhibited (e.g., in different conformers or several times in the same molecule). Additional pharmacophore key bits are set to count the number of occurrences, and the logical operators available to compare keys are modified accordingly; the key size increases significantly with a count, depending on the maximum stored count defined. This option was found to be particularly useful with pharmacophore keys calculated for sets of molecules sharing a common type of biological activity.
Conformational Sampling. Conformers were generated “on the fly” at search time. In this work we used an extensively modified conformational analysis command file. The default file uses a rule-based systematic analysis with 3 rotamers for single bonds, 6 for α bonds (sp2-sp3), and 2 for conjugated and double bonds, with a limit of up to 10 rotatable bonds. This was modified in terms of both the number of rotamers and the conformer generation method used. A random analysis was used for flexible molecules that would not finish a systematic analysis within a defined CPU time limit (normally 45 s on an SG R4400 250 MHz used for this work). No limit on the number of bonds was used, but a limit on time was used for random sampling (e.g., 15 s on SG R4400 250 MHz). α Bonds were generally sampled at a reduced interval of four rotamers, giving an appreciable saving in the total number of conformations to be sampled for many molecules (and thus better overall sampling for a given time period for flexible molecules). Sampling for conjugated systems was also greatly modified: different bond environments are identified through the atom types, and different samplings are applied. An amide bond (O=C-NH) is kept in its starting (generally trans) conformation unless it is disubstituted, in which case the other isomer is also sampled. A conjugated bond (e.g., amide-phenyl) is sampled at four positions (e.g., ±45°, ±135°) if sterically feasible rather than just a single 180° twist of the starting conformation. For extended sampling, six points were used for single and α bonds, with a Tmax increased by 6-fold.
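The effect of the rotamer settings on the size of a systematic search can be illustrated with a short calculation. The sketch below is not part of Chem-X: the function name, the bond-class labels, and the example molecule are assumptions made for illustration. It takes the rotamer counts quoted above (3 for single bonds, 6 or 4 for α bonds, 2 for conjugated and double bonds) and multiplies them over the rotatable bonds of a molecule.

```python
from math import prod

# Rotamers per bond class, taken from the text; the dictionary keys are
# illustrative labels, not Chem-X keywords.
DEFAULT_ROTAMERS = {"single": 3, "alpha": 6, "conjugated": 2}
REDUCED_ROTAMERS = {"single": 3, "alpha": 4, "conjugated": 2}


def systematic_conformer_count(bond_classes, rotamers):
    """Number of torsion combinations in a full systematic search."""
    return prod(rotamers[b] for b in bond_classes)


# Hypothetical molecule: four single bonds and two alpha (sp2-sp3) bonds.
example_bonds = ["single"] * 4 + ["alpha"] * 2

default_total = systematic_conformer_count(example_bonds, DEFAULT_ROTAMERS)
reduced_total = systematic_conformer_count(example_bonds, REDUCED_ROTAMERS)
print(default_total, reduced_total)
```

For this hypothetical molecule, sampling the α bonds at four rotamers instead of six reduces the search from 2916 to 1296 torsion combinations, before any conformers are rejected.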
CONCORD28-generated 3D structures were used, which are ideal for torsional sampling as they are consistently generated with standard bond angles (i.e., they are not relaxed to a particular conformer, which can cause falsely high-energy structures for other rotamers, for which the relaxation is in the wrong sense). Concerning conformer rejection, we found the “bump” check option, which eliminates conformations with contacts closer than the “CPK” radii (3/5 VDW radii), to be more effective than the use of rules. For series of 7TM ligands and enzyme inhibitors, an average of about 2200 conformations were accepted, from 13000-18000 sampled, generating an average of 8200 and 3200 4-point pharmacophores, respectively.
Defining the Special Feature. A parametrization fragment (e.g., a biphenyltetrazole substructure) was used to automatically add a dummy atom, defined as a dynamic centroid of the atoms (substructure) of interest, with atom type 00Q and assigned “center” number 3, which is reserved for the special feature in our work; a parametrization fragment of the minimal bond-order-dependent type (name beginning with Z) was used to automatically add the dummy atom centroid.2,5,14 Alternatively, an existing atom type can be used if it is sufficiently unique to the group of interest. Assigning an atom (e.g., in a carboxylic acid) to be the special feature required only that the feature (“center”) number reserved for this purpose be assigned to the atom type for the atom. We reserved “center” number 3 for this purpose, which is used by default for positively charged quaternary nitrogen atoms in Chem-X (we grouped such atoms, type 14C, together with the basic features); all occurrences of the atom type are assigned the special feature. Defining a particular substructure as the special feature required that the substructure first be in the parametrization database. The special feature was then assigned to any unique atom type or to a dynamic dummy atom defined as the centroid of the substructure of interest.
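The idea of a centroid dummy atom acting as a special feature can be sketched in a few lines. This is only an illustration under stated assumptions, not the Chem-X parametrization mechanism: the coordinates, feature labels, and variable names are hypothetical. The centroid is computed for one conformer (a dynamic centroid would be recomputed for each conformer), and only the four-point combinations that include the special feature are retained.

```python
from itertools import combinations


def centroid(coords):
    """Mean position of a set of 3D points (the dummy-atom location)."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


# Hypothetical feature points for one conformer: (label, (x, y, z)).
features = [
    ("acid", (0.0, 0.0, 0.0)),
    ("base", (4.0, 0.0, 0.0)),
    ("donor", (0.0, 5.0, 0.0)),
    ("aromatic", (0.0, 0.0, 6.0)),
]

# Hypothetical atoms of the substructure of interest; their centroid
# becomes the special-feature point.
substructure_atoms = [(1.0, 1.0, 0.0), (3.0, 1.0, 0.0), (2.0, 4.0, 0.0)]
special = ("special", centroid(substructure_atoms))

all_points = features + [special]

# Keep only the four-point combinations that include the special feature.
privileged = [q for q in combinations(all_points, 4) if special in q]
print(len(privileged))
```

With four ordinary features and one special feature, four of the five possible four-point combinations contain the special feature; each retained combination is thereby referenced to the substructure of interest.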
Figure 4 illustrates a sample “privileged” pharmacophore for a
biphenyltetrazole-containing compound. A customized geometry type file was read into Chem-X before key calculation in order to store only the pharmacophores containing the special feature. If no geometry type file is used, a mixed mode results in which all potential pharmacophores are kept, including those not containing the special feature.
Combinatorial Chemistry. The solution-phase Ugi libraries were produced as follows: Equal amounts (0.1 mL) of 0.1 M solutions of the four Ugi inputs (R1CHO, R2NH2, R3NC, and R4CO2H) in methanol were used to generate a theoretical 10 µmol of final Ugi product in a 96-well plate format. Reagents were transferred into a 96-well plate using a Quadra 96 (Tomtec) dispensing system in order of their participation in the Ugi reaction mechanism, specifically aldehyde first, amine second, isonitrile third, and carboxylic acid fourth. The four-component condensation step was performed at room temperature with shaking overnight, and the solvent was evaporated in vacuo at 65 °C to give the desired Ugi product. The products were analyzed by LC/MS (as judged by UV 220 nm), and purity was measured by A% (area percent under HPLC peak of desired product).29 Products containing tert-butyl esters and BOC-protected amines were subsequently treated with a 10% solution of trifluoroacetic acid in dichloroethane followed by evaporation to give the desired carboxylic acid and amino functionality.
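The theoretical scale quoted for each well follows directly from the volumes and concentrations given above. The short calculation below is an illustrative check only; the variable names are assumptions, and the per-well total volume is simply the sum of the four equal additions.

```python
# Quantities from the text: 0.1 mL of a 0.1 M solution per Ugi input.
volume_ml = 0.1
concentration_mol_per_l = 0.1
n_inputs = 4  # aldehyde, amine, isonitrile, carboxylic acid

# mL * mol/L = mmol; multiply by 1000 to convert mmol to micromoles.
micromol_per_input = volume_ml * concentration_mol_per_l * 1000

# With all four inputs equimolar, the theoretical yield equals the amount
# of any single input.
theoretical_product_micromol = micromol_per_input

# Total methanol volume per well after all four additions.
well_volume_ml = n_inputs * volume_ml

print(round(theoretical_product_micromol, 6), round(well_volume_ml, 6))
```

Each input therefore contributes 10 µmol, matching the theoretical 10 µmol of Ugi product per well, in a total of 0.4 mL of methanol.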
Acknowledgment. The authors wish to thank Stephen Pickett for useful discussions on the multiple potential pharmacophore method and also Keith Davies and Cathy Davies-White at Chemical Design for discussions and implementation of the method into the Chem-X software.
References
(1) Chem-X software; Oxford Molecular, Medawar Centre, Oxford Science Park, Oxford OX4 4GA, England.
(2) Pickett, S. D.; Mason, J. S.; McLay, I. M. Diversity Profiling and Design Using 3D Pharmacophores: Pharmacophore-Derived Queries (PDQ). J. Chem. Inf. Comput. Sci. 1996, 36, 1214-1223.
(3) Ashton, M. J.; Jaye, M. C.; Mason, J. S. New Perspectives in Lead Generation II: Evaluating Molecular Diversity. Drug Discovery Today 1996, 1, 71-78.
(4) Mason, J. S. Drug Design Using Conformationally Flexible Molecules in 3D Databases. In Trends in Drug Research, Proceedings of the 9th Noordwijkerhout-Camerino Symposium, Noordwijkerhout, The Netherlands, May 23-28, 1993; Claasen, V., Ed.; Elsevier: Amsterdam, 1993; pp 147-156.
(5) Mason, J. S. Experiences with Searching for Molecular Similarity in Conformationally Flexible 3D Databases. In Molecular Similarity in Drug Design; Dean, P. M., Ed.; Blackie Academic and Professional: Glasgow, 1995; pp 138-162.
(6) Good, A. C.; Mason, J. S. Three-Dimensional Structure Database Searches. In Reviews in Computational Chemistry; Lipkowitz, K. B., Boyd, D. B., Eds.; VCH: New York, 1996; Vol. 7, pp 67-117.
(7) Evans, B. E.; et al. Methods for Drug Discovery: Development of Potent, Selective, Orally Effective Cholecystokinin Antagonists. J. Med. Chem. 1988, 31, 2235-2246.
(8) Martin, E. J.; Blaney, J. M.; Siani, M. A.; Spellmeyer, D. C.; Wong, A. K.; Moos, W. H. Measuring Diversity: Experimental Design of Combinatorial Libraries for Drug Discovery. J. Med. Chem. 1995, 38, 1431-1436.
(9) Molecular Discovery Limited, West Way House, Elms Parade, Oxford OX2 9LL, U.K.
(10) Mills, J. E. J.; Dean, P. M. Three-dimensional Hydrogen-bond Geometry and Probability Information from a Crystal Survey. J. Comput.-Aided Mol. Des. 1996, 10, 607-622.
(11) Through a collaboration with Chemical Design [this is becoming available as the “DiR” (Design in Receptor) module of Chem-X].
(12) Murray, C. M.; Cato, S. J. Design of Libraries to Explore Receptor Sites. J. Chem. Inf. Comput. Sci. 1999, 39, 46-50.
(13) MDL Information Systems Inc., 14600 Catalina St., San Leandro, CA 94577.
(14) Mason, J. S.; Pickett, S. D. Partition-based Selection. Perspect. Drug Discovery Des. 1997, 7/8, 85-114.
(15) Willett, P. Similarity and Clustering in Chemical Information Systems; Research Studies Press: Letchworth, 1987.
(16) Daylight Theory Manual, Daylight Software 4.41; Daylight Chemical Information Systems, Inc., 27401 Los Altos, Suite 370, Mission Viejo, CA 92691.
(17) Astles, P. C.; Brealey, C.; Brown, T. J.; Facchini, V.; Handscombe, C.; Harris, N. V.; McCarthy, C.; McLay, I. M.; Porter, B.; Roach, A. G.; Sargent, C.; Smith, C.; Walsh, R. J. A. Selective Endothelin A Receptor Antagonists. 3. Discovery and Structure-Activity Relationships of a Series of 4-Phenoxybutanoic Acid Derivatives. J. Med. Chem. 1998, 41, 2732-2744.
(18) Hahn, M. Three-Dimensional Shape-Based Searching of Conformationally Flexible Compounds. J. Chem. Inf. Comput. Sci. 1997, 37, 80-86.
(19) Ugi, I. The α-Addition of Immonium Ions and Anions to Isonitriles Accompanied by Secondary Reactions. Angew. Chem., Int. Ed. Engl. 1962, 1, 8-21.
(20) Ugi, I.; Steinbruckner, C. Isonitriles. II. Reaction of Isonitriles with Carbonyl Compounds, Amines, and Hydrazoic Acid. Chem. Ber. 1961, 94, 734-742.
(21) Ugi, I.; Dömling, A.; Hörl, W. Endeavor 1994, 18, 115.
(22) Hulme, C.; Morrissette, M. M.; Volz, F. A.; Burns, C. J. The Solution Phase Synthesis of Diketopiperazine Libraries via the Ugi Reaction: Novel Application of Armstrong’s Convertible Isonitrile. Tetrahedron Lett. 1998, 39, 1113-1116.
(23) Hulme, C.; Peng, J.; Louridas, B.; Menard, P.; Krolikowski, P.; Kumar, N. V. Applications of N-BOC-Diamines for the Solution Phase Synthesis of Ketopiperazine Libraries Utilizing a Ugi/DeBOC/Cyclization (UDC) Strategy. Tetrahedron Lett. 1998, 39, 8047-8050.
(24) Hulme, C.; Peng, J.; Morton, G.; Salvino, J. M.; Herpin, T.; Labaudiniere, R. Novel Safety-Catch Linker and its Application with a Ugi/De-BOC/Cyclization (UDC) Strategy to access Carboxylic acids, 1,4-Benzodiazepines, Diketopiperazines, Ketopiperazines and Dihydroquinoxalinones. Tetrahedron Lett. 1998, 39, 7227-7230.
(25) Keating, T. A.; Armstrong, R. W. A Remarkable Two-Step Synthesis of Diverse 1,4-Benzodiazepine-2,5-diones Using the Ugi Four-Component Condensation. J. Org. Chem. 1996, 61, 8935-8939.
(26) Hulme, C.; Tang, S.-Y.; Burns, C. J.; Morize, I.; Labaudiniere, R. Improved Procedure for the Solution Phase Preparation of 1,4-Benzodiazepine-2,5-dione Libraries via Armstrong’s Convertible Isonitrile and the Ugi Reaction. J. Org. Chem. 1998, 63, 8021-8023.
(27) Available in the 4-center pharmacophore module of Chem-X software (see ref 1) from July 1997 release.
(28) Written by Balducci, R.; McGarity, C.; Rusinko III, A.; Skell, J.; Smith, K.; Pearlman, R. S. Laboratory for Molecular Graphics and Theoretical Modeling, College of Pharmacy, University of Texas at Austin; distributed by Tripos Inc., 1699 S. Hanley Rd, Suite 303, St. Louis, MO 63144.
(29) LC/MS analysis was performed using a C18 Hypersil BDS 3 µm 2.1 × 50 mm column with a mobile phase of 0.1% TFA in CH3CN/H2O, gradient from 10% CH3CN to 100% over 5 min; HPLC was interfaced with APCI techniques.