J. Med. Chem. 2005, 48, 3290-3312
Structural Requirements for Factor Xa Inhibition by 3-Oxybenzamides with Neutral P1 Substituents: Combining X-ray Crystallography, 3D-QSAR, and Tailored Scoring Functions
Hans Matter,* David W. Will, Marc Nazaré, Herman Schreuder, Volker Laux, and Volkmar Wehner
DI&A Chemistry, Aventis Pharma Deutschland GmbH, A Company of the Sanofi-Aventis Group, Building G 878, D-65926 Frankfurt am Main, Germany
Received October 12, 2004
The design, synthesis, and structure-activity relationship of 3-oxybenzamides as potent inhibitors of the coagulation protease factor Xa are described on the basis of X-ray structures, privileged structure motifs, and SAR information. A total of six X-ray structures of fXa/inhibitor complexes led us to identify the major protein-ligand interactions. The binding mode is characterized by a lipophilic dichlorophenyl substituent interacting with Tyr228 in the protease S1 pocket, while polar parts are accommodated in S4. This alignment in combination with docking allowed derivation of 3D-QSAR models and tailored scoring functions to rationalize biological affinity and provide guidelines for optimization. The resulting models showed good correlation coefficients and predictions of external test sets. Furthermore, they correspond to binding site topologies in terms of steric, electrostatic, and hydrophobic complementarity. Two approaches to derive tailored scoring functions combining binding site and ligand information led to predictive models with acceptable predictions of the external set. Good correlations to experimental affinities were obtained for both AFMoC (adaptation of fields for molecular comparison) and the novel TScore function. The SAR information from 3D-QSAR and tailored scoring functions agrees with all experimental data and provides guidelines and reasonable activity estimations for novel fXa inhibitors.
1. Introduction
There is tremendous interest in the development of new, orally active anticoagulants for the treatment and prevention of thrombotic diseases, with therapeutic advantages over pharmaceuticals such as heparin and warfarin.
Thrombotic diseases, such as deep vein thrombosis and stroke, are major causes of mortality in Europe and the U.S.1 The blood coagulation serine protease factor Xa (fXa) is an attractive target because it is the central enzyme in the activation cascade of the coagulation system,2 linking the intrinsic and extrinsic pathways to the common coagulation pathway.3,4 In combination with factor Va, fXa activates prothrombin on a phospholipid surface to generate the coagulation protease thrombin.5 It is not known to be involved in processes other than hemostasis. Thrombin then converts fibrinogen to fibrin, inducing clot formation and platelet aggregation,6 which can both be related to serious pathological situations. The inhibition of fXa compared to thrombin might allow the effective control of thrombogenesis with a minimal effect upon bleeding7-11 because fXa inhibitors should affect coagulation specifically. Furthermore, inhibition of fXa is seen to be more efficacious because one fXa molecule generates multiple thrombin molecules. Inhibition of fXa should prevent production of new thrombin without affecting the basal thrombin level necessary for primary hemostasis. Hence, it is anticipated that inhibition of factor Xa should prevent thrombus formation without compromising normal hemostasis and platelet function. Many recent reviews12-17 and other publications18,19 demonstrate the high interest of the pharmaceutical industry in the development of novel fXa inhibitors.
* To whom correspondence should be addressed. Phone: ++49-69305-84329. Fax: ++49-69-331399. E-mail: [email protected].
The mature form of factor X with a 139-residue light chain and a 303-residue heavy chain is synthesized in the liver and secreted after post-translational modifications into the blood as zymogen. It is activated by the factor VIIa/tissue factor complex in the extrinsic pathway, initiated by vascular damage, or by the factor IXa/factor VIIIa complex in the intrinsic pathway. The fXa heavy chain contains a serine protease domain in a trypsin-like closed β-barrel fold encompassing the active triad Ser195-His57-Asp102 and two essential protein subsites S1 and S4, which are often explored in structure-guided drug design.20
In this publication we report the design and structure-activity relationship of a series of nonchiral 3-oxybenzamides (synthesis shown in Scheme 1) as inhibitors of fXa21 by means of X-ray crystallography, 3D-QSAR, and tailored scoring functions. These inhibitors contain neither a benzamidine nor a guanidine moiety but employ a neutral substituent for binding into the protease S1 pocket. First, the synthesis of a targeted compound library focused toward fXa led to the identification of a guanidine-containing hit structure 57 (Table 1) bearing a benzoic acid scaffold and a neutral dichlorophenyl substituent. Although it was expected that this guanidine would bind to the fXa S1 pocket, X-ray structure determination showed that surprisingly the neutral dichlorophenyl substituent interacted with S1 and the guanidine with S4. This protein-ligand interaction motif was systematically explored for subsequent
10.1021/jm049187l CCC: $30.25 © 2005 American Chemical Society Published on Web 04/13/2005
Factor Xa Inhibition
Journal of Medicinal Chemistry, 2005, Vol. 48, No. 9 3291
Scheme 1. Synthesis of 3-Oxybenzamidesa
a Reagents and conditions: (a) MeOH, HCl; (b) 2-(2,4-dichlorophenyl)ethanol, DEAD, PPh3-polystyrene, THF/room temp, 16 h; (c) (i) NaOH (aq), dioxane/60 °C, 1 h, (ii) HCl/water, precipitation; (d) amine, TOTU, N-ethylmorpholine, DMF.
design using high-resolution fXa X-ray structures of derivatives, flexible docking, tailored scoring functions, and 3D-QSAR analysis. Our design rationale was based on X-ray structures of fXa in uncomplexed form22 and with inhibitors19,23-25 plus knowledge of privileged motifs directed toward the S1 pocket and structure-activity information on privileged substructures accumulated in the project.
Comparative molecular field analysis (CoMFA)26-28 and comparative molecular similarity index analysis (CoMSIA)29 are used to correlate molecular property fields to biological activities based on the X-ray structures of some analogues providing the active inhibitor conformation for alignment. The superposition of all other molecules onto these templates produced consistent models in agreement with binding site requirements. The contour maps from 3D-QSAR models enhance the understanding of electrostatic, hydrophobic, and steric requirements for ligand binding, guiding the design of inhibitors to regions where structural variations reveal a correlation to biological properties.
Tailored scoring functions were also derived on the basis of two approaches to establish predictive models for structure-based optimization. The difference from 3D-QSAR is the incorporation of binding site information not only for alignment but also to derive descriptors for protein-ligand interactions. First, the recently introduced AFMoC (adaptation of fields for molecular comparison) approach was applied.30 Knowledge-based pair potentials are adapted to a binding site by considering ligand information. Atom-type specific interaction fields capturing binding site characteristics and ligand complementarity are correlated to biological affinities. In contrast, the in-house approach TScore31 captures protein-ligand interaction on an amino acid level. For each binding site residue and ligand, terms describing hydrogen-bonding, lipophilicity, and steric contacts produce a protein-ligand interaction profile. Statistical analysis correlating these profiles with affinities led to relevant models. Hence, both approaches led to tailored scoring functions with predictive power and interpretability in agreement with X-ray structures and 3D-QSAR. Some promising inhibitors of this protease emerged from this study as starting points for further optimization.
2. Design of Factor Xa Inhibitors
The 3-oxybenzamide scaffold resulted in potent and selective fXa inhibitors after screening of targeted compound libraries. This led to the identification of the arginine-containing hit 57 (Table 1) with a Ki value of 153 nM.21 This compound was expected to bind with its guanidine in the S1 pocket. However, its X-ray crystal structure in complex with factor Xa (see below) revealed that the neutral dichlorophenyl substituent binds to the S1 pocket, while the guanidine interacts with S4. This “chloro-binding mode”, which was independently discovered by others for thrombin32 and fXa,23,24 was explored for further design. Although cationic interactions of ligand substructures with amino acids in both the S1 and S4 subsite are favorable for in vitro affinity, permanently charged groups might be detrimental for oral bioavailability. Hence, the replacement of basic moieties in S1 by neutral substituents and the reduction of the basic nature of the S4 directed substructure were important considerations guiding our structure-based design efforts. With this achiral motif we were able to orient different substituents toward essential fXa subpockets, namely, S1, S4, and the so-called “ester binding pocket” (EBP), which is located adjacent to S1.21 The arginine in 57 was replaced by synthesis of a library of 330 amides selected for their fit into S4 by docking, resulting in a series of potent, less basic pyridines, pyrimidines, and piperidines interacting with this subsite.21 The 4-piperidylpyridine motif resulted in very potent fXa inhibitors, for example, 50 with a Ki of 18 nM. This compound then served as a lead for all further evaluations. For selected analogues high-resolution fXa X-ray structures were obtained, thus confirming individual structure-based design iterations, further underscoring our initial assumption of a consistent binding mode for this series. Critical determinants for ligand binding to fXa were deduced from this combination of structure-based design techniques and 3D-QSAR. The 3D-QSAR models then allowed affinity predictions complementing structure-based scoring functions for evaluation of synthesis candidates. Their interpretation uncovered important steric, electrostatic, and hydrophobic features, which are linked to fXa affinity. On the basis of SAR, docking, and X-ray crystallography, novel compounds were synthesized. The final set of 80 molecules as training set for this study and 27 test molecules are summarized in Table 1, where the training set is indicated as SAR set 1 and the test set is indicated as SAR set 2.
3. Methods
3.1. Chemistry and Enzyme Assay. Synthesis of all compounds in Table 1 was achieved from commercially available or easily accessible 3-hydroxy-4-substituted-benzoic acid methyl esters in a conventional synthesis sequence, as described earlier.21,33 Esterification in methanol led to the central
Table 1. Chemical Structure and Activities for 3-Oxybenzamides
Matter et al.
a Corresponding numbering of compounds in ref 21.
Table 2. Data Collection and Refinement Statistics for Six Factor Xa X-ray Structures

inhibitor               50        63        57        87        56        37
detector                marccd    marccd    mar300    mar300    mar300    mar300
space group             P2₁2₁2₁   P2₁2₁2₁   P2₁2₁2₁   P2₁2₁2₁   P2₁2₁2₁   P2₁2₁2₁
a, Å                    57.0      56.0      56.3      56.07     56.7      56.34
b, Å                    72.3      71.7      71.7      71.53     71.96     71.95
c, Å                    78.1      78.6      77.5      78.36     78.55     78.07
obsd reflns             33939     21647     57893     21857     19540     43464
unique reflns           9120      5987      18769     9132      5126      11393
resoln, Å               2.7       3.1       2.1       2.65      3.3       2.5
R-sym, %                8.9       12.8      6.3       10.8      14.98     6.6
completeness, %         98.0      98.2      99.2      95.1      99.4      99.3
protein atoms           2249      2248      2249      2249      2249      2240
inhibitor atoms         35        32        37        33        35        36
calcium ions            1         1         1         1         1         1
water molecules         218       294       288       284       283       241
R-factor, %             19.3      15.7      22.0      17.2      15.4      19.8
R-free, %               27.7      23.0      29.6      24.6      27.7      24.8
rmsd bond length, Å     0.009     0.015     0.009     0.008     0.012     0.013
rmsd bond angle, deg    1.33      1.69      1.60      1.34      1.35      1.29
rmsd dihedrals, deg     24.8      25.4      24.8      24.8      24.5      24.5
rmsd impropers, deg     0.86      0.81      2.11      0.76      0.81      0.75
precursor for the attachment of the arylethylene moiety directed toward S1. This etherification was performed under Mitsunobu conditions.34 Attempted alkylation of the 3-hydroxy-4-substituted-benzoic methyl esters via tosylates or mesylates of the arylethylene alcohols under various conditions was less successful because of extensive elimination of the activated arylethylene moiety. After saponification of the methyl ester, the acid was subjected to the final amide coupling using TOTU (O-((ethoxycarbonyl)cyanomethyleneamino)-N,N,N′,N′-tetramethyluronium tetrafluoroborate) as activating agent. The biological assay was reported earlier.35 Enzyme inhibition (pKi) is expressed as log[(1/Ki)(1000)].
3.2. X-ray Structure Analysis. 3.2.1. Crystallization. Purified human fXa was purchased from Enzyme Research Lab (South Bend, IN). The Gla domain was removed, and the Gla-less factor Xa was crystallized in hanging drops as described earlier.23
3.2.2. Data Collection and Processing. Crystals were soaked in 5 µL of reservoir solution containing ∼20 mM or saturated inhibitor (depending on the solubility) for 24-72 h. Data were collected at cryotemperatures. The crystals were picked up with a fiber loop, soaked for a few seconds in a solution containing 20% glycerol and 5-20 mM inhibitor in reservoir solution, and flash-frozen in a stream of gaseous nitrogen at 100 K. The X-ray intensity data were collected on a 130 mm Mar CCD detector mounted on an Elliot GX-13 “big wheel” rotating anode generator (Nonius, The Netherlands) and operated at 40 kV and 55 mA or on a Mar300 imaging plate (X-ray research, Germany) mounted on a FR591 rotating anode (Nonius, The Netherlands) and operated at 40 kV and 80 mA. Data processing and scaling were carried out using XDS.36 Data collection and refinement statistics for selected crystal structures are presented in Table 2.
3.2.3. Structure Solution and Crystallographic Refinement. The structures were solved by molecular replacement. The search models were made from the coordinates of refined structures of fXa complexes solved previously with the same crystal packing as the current complexes. The bound inhibitors were omitted from the search models. Energy-restrained least-squares refinement was carried out using X-PLOR.37 This refinement was started with rigid body refinement to adjust for small differences in cell dimensions, followed by energy minimization and individual temperature factor refinement. At this stage the 2Fo - Fc and Fo - Fc maps were inspected and the inhibitors were fitted. Solvent molecules were included if they were on sites of difference electron density with values above 3.5σ and if they were within 3.5 Å of the protein molecule or a water molecule. After two to three additional rounds of manual inspection, rebuilding, and refinement, final models were obtained with R factors between 15.4% and 22.0% and free R factors between 23% and 29.6% plus good geometry. The EGF-1 domain is not visible in the electron density maps probably because of disorder, and the rather high free R factors might relate to this disordered EGF-1 domain. The statistics of the crystallographic refinement are listed in Table 2.
3.3. Computational Procedures. 3.3.1. Docking Studies. FXa crystal structures from RCSB38 and our database were used for docking. After analysis of protein-ligand interactions using the program GRID,39 molecules were oriented and minimized within the binding site or automatically docked in different orientations using the program QXP40 with a modified version of the AMBER force field.41 Selected protein side chains were treated as flexible after comparative analysis of fXa X-ray structures. A structurally conserved water situated in the S4 pocket was included in energy minimizations. The alignment for all molecules from Table 1 after visual inspection of QXP docking modes served as the basis for 3D-QSAR studies and tailored scoring functions.
To further assess the influence of the force field on the model quality, all compounds were optimized using the MMFF94s force field42 in SYBYL.43 This alignment was also
used to produce statistical models. Protein/ligand complexes were minimized using quasi-Newton-Raphson (BFGS) or conjugate gradient (CG) procedures with all protein atoms being rigid. Introducing flexibility into amino acids within 4 Å around the ligands did not improve the statistical results. The program MOLCAD44 was used to visualize properties such as lipophilicity45,46 and cavity depth on solvent accessible protein surfaces.47
3.3.2. 3D-QSAR. Default settings were used for CoMFA and CoMSIA, if not otherwise indicated. For CoMFA, steric and electrostatic energies are calculated at grid points with 2 Å spacing, using a positively charged carbon atom as probe and a distance-dependent dielectric constant with MMFF94 charges.42 The alignment was also used for CoMSIA steric, electrostatic, and hydrophobic similarity index fields29 using a probe with a charge of +1, a radius of +1, a hydrophobicity of +1, and an attenuation factor α of 0.3 for Gaussian-type distance dependence. Cross-validated analyses48 were run using SAMPLS49 or two and five cross-validation groups with random selection of group members. PLS (partial least squares)50 analyses using two or five random cross-validation groups were averaged over 100 runs. For validation, all affinities were randomized51 100 times and subjected to PLS and the mean cross-validated r2 was calculated.
3.3.3. 3D-QSAR Model Validation. Progressive shuffling19,52,53 was used for randomization. Biological activities were randomized only within 2-12 individual subgroups, while the relationship between subgroups is not changed. This allows the direct evaluation of the stability of each model. With 12 subgroups, only a local perturbation of activities is introduced, while for two subgroups, a much larger portion of the data set is randomized. The number of subgroups was plotted against the mean cross-validated r2 value per model (20 randomizations each).
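The progressive shuffling step described above can be sketched in a few lines of Python. This is only an illustration, not the implementation used in the study: it assumes that subgroups are formed by sorting the activities and cutting the ranked list into contiguous bins (the text does not say how subgroups are defined), and the function name and activity values are hypothetical.

```python
import numpy as np


def shuffle_within_subgroups(activities, n_subgroups, rng):
    """Randomize activities only inside activity-ranked subgroups.

    Assumption (not stated in the text): subgroups are contiguous bins of
    the activity-sorted compounds. Values never leave their bin, so the
    relationship between subgroups is preserved.
    """
    y = np.asarray(activities, dtype=float)
    order = np.argsort(y, kind="stable")  # compound indices, low to high activity
    shuffled = y.copy()
    for bin_idx in np.array_split(order, n_subgroups):
        # Permute the activity values among the compounds of this bin only.
        shuffled[bin_idx] = y[rng.permutation(bin_idx)]
    return shuffled


# Hypothetical pKi-like values for 12 compounds (made-up data).
pki = np.array([5.2, 7.7, 6.1, 6.9, 5.8, 7.3, 6.4, 5.5, 7.0, 6.6, 7.5, 6.0])
rng = np.random.default_rng(0)

coarse = shuffle_within_subgroups(pki, 2, rng)   # two bins: large perturbation
fine = shuffle_within_subgroups(pki, 12, rng)    # one compound per bin: no change
```

Each shuffled activity vector would then be passed to the cross-validated PLS analysis, and the mean cross-validated r2 plotted against the number of subgroups.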
The 2D fingerprints were generated using the program UNITY54 for selection of training and test sets; their similarity is given as the Tanimoto coefficient.55
3.3.4. AFMoC Scoring Function. AFMoC30 implemented in DrugScore 1.256,57 is used to derive a tailored scoring function for fXa. Interaction fields for ligands were calculated30 using the fXa/50 complex with 1 Å grid spacing. The grid box embedded all inhibitors with a margin of at least 4 Å in each direction. Interaction fields from the following SYBYL atom types43 were analyzed: C.3, C.2, C.ar, O.3, O.2, N.am, Cl, N.3, N.ar. A half-width of 0.85 for the Gaussian function to distribute atomic protein-ligand interactions was used. A dimensionless value of 10 was used for the height of the repulsive Gaussian function at the origin of atom-atom contacts. Analyses with and without leave-one-out cross-validation were performed using AFMoC’s implementation of SAMPLS49 and PLS,50 respectively. Because field types from rarely represented ligand atoms are not included in statistical analyses, an empirical scaling factor is applied to correct pKi values by these contributions.30 Consequently, all r2 and r2(cv) values consider only the part of the binding energy used for PLS,58 while these correspond to statistical parameters considering the total binding affinity. Visual inspection of AFMoC results is based on the contouring of std*coeff fields at appropriate levels.
3.3.5. TScore Scoring Function. In the tailored scoring function TScore, protein-ligand interactions are evaluated for residues within 8 Å around the ligand in the fXa/50 complex.31 Descriptors are calculated using an internal C++ implementation of Chemscore59 and PLP.60 For each residue, the following descriptors were calculated: Chemscore hydrogen-bonding and lipophilic terms; PLP steric contact and hydrogen-bonding terms.
In addition, the summation over all residues in a binding site for each of the four terms, corresponding to original Chemscore and PLP terms, plus the Chemscore rotatable bond term and total score were included in the statistical analysis. The 126 relevant descriptors were autoscaled and subjected to PLS50 and SAMPLS.49 Mild variable selection resulted in a significant and predictive TScore function based on 52 descriptors. TScore visual inspection is based on coloring residues according to PLS std*coeff values.
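Autoscaling, as applied to the TScore descriptors above, is commonly understood as centering each descriptor column to zero mean and scaling it to unit variance before PLS. A minimal sketch under that assumption follows. The descriptor matrix is made-up data, the function name is hypothetical, and mapping constant columns to zero is an illustrative choice that the text does not specify.

```python
import numpy as np


def autoscale(descriptors):
    """Center each column to zero mean and scale it to unit sample variance.

    Columns with zero variance carry no information; here they are mapped
    to all zeros instead of dividing by zero (an illustrative choice).
    """
    X = np.asarray(descriptors, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)              # sample standard deviation
    safe_std = np.where(std > 0, std, 1.0)   # avoid division by zero
    return (X - mean) / safe_std


# Rows are compounds, columns are per-residue interaction descriptors
# (hypothetical numbers; the third descriptor is constant).
X = np.array([
    [1.0, 10.0, 3.0],
    [2.0, 20.0, 3.0],
    [3.0, 60.0, 3.0],
    [4.0, 30.0, 3.0],
])
Z = autoscale(X)
```

After scaling, every informative descriptor contributes on the same numerical footing, so PLS coefficients can be compared across residues.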
4. Results and Discussion
4.1. X-ray Structure Analysis. Crystals of the factor Xa/inhibitor complexes diffracted reasonably well with resolutions between 2.1 and 3.3 Å (Table 2). The inhibitor binding modes are as well-defined as the resolution allows. Hence, the 2.1 Å structure is accurately defined, while the 3.3 Å structure indicates the overall binding mode. However, high-resolution reference structures and the high degree of ligand similarity resulted in more accurate structures than expected without references.
4.2. Inhibitor Binding Modes: S1 Subsite. Most fXa inhibitors rely on the interaction of a basic moiety with Asp189 at the bottom of the protease S1 pocket. However, six X-ray structures solved at different stages of this project indicate a favorable nonbasic interaction with this site. Collectively these X-ray structures reveal that the inhibitor dichlorophenyl group is invariably located in S1. Figure 1 summarizes the inhibitor binding modes of all X-ray structures including structurally conserved water (cyan spheres) within 4 Å around the ligand. Here, the fXa binding site is indicated as a solvent-accessible surface colored by subsite depth.44,47 In particular, the para chlorine atom is involved in a direct lipophilic contact pointing toward the center of the aromatic ring of Tyr228 at the back wall of S1 (yellow in Figure 1). Distances from 3.6 to 4.5 Å for this Cl···C.ar (Tyr228) aromatic interaction are observed in all X-ray structures, in accord with data from knowledge-based and force field methods. The carbon-chlorine bond is directed toward the plane of the Tyr228 ring (dihedral angle C-Cl···Tyr228(Centroid)-Tyr228(Cz): 58°). It appears that the gain in binding energy from this lipophilic contact could compensate for the lack of a polar interaction of a basic group to Asp189, resulting in inhibitors with low nanomolar affinity.
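Geometric descriptors of the kind quoted above (a chlorine-to-ring distance and a four-point dihedral angle through the ring centroid) can be computed from atomic coordinates with a little vector algebra. The sketch below uses invented coordinates for an idealized six-membered ring, not coordinates from the fXa structures, and all function and variable names are hypothetical.

```python
import numpy as np


def centroid(points):
    """Geometric center of a set of atoms."""
    return np.asarray(points, dtype=float).mean(axis=0)


def distance(a, b):
    """Euclidean distance between two points, in the units of the input."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Idealized aromatic ring in the xy plane (radius 1.39 Å), made-up geometry.
angles = np.radians(np.arange(0, 360, 60))
ring = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])

chlorine = np.array([0.0, 0.0, 4.0])   # hypothetical Cl above the ring
carbon = np.array([0.0, 1.0, 5.4])     # hypothetical carbon bonded to Cl
ring_atom = ring[0]                    # stands in for a ring atom such as Cz

cl_to_ring = distance(chlorine, centroid(ring))
angle = dihedral_deg(carbon, chlorine, centroid(ring), ring_atom)
```

Applied to real coordinates, the same two functions would give the Cl-to-centroid distance and the C-Cl···centroid-Cz dihedral for each complex.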
This lipophilic interaction between chlorine and an aromatic ring (“chloro-binding mode”) is also documented in 41 protein-ligand X-ray structures in the database ReliBase,61 retrieved using a 3.5-4.5 Å distance between atoms for 3D searching. The interaction of nonbasic groups in the S1 pocket of the family of Ala190-serine proteases has previously been described in the search for thrombin32 and fXa inhibitors.23,24 A total of 19 PDB entries with this interaction were analyzed for factor Xa (PDB codes: 1ioe, 1iqe, 1iqf, 1iqg, 1iqh, 1iqi, 1iqj, 1iqk, 1iql, 1iqm, 1iqn, 1mq5, 1mq6, 1nfu, 1nfw, 1nfx, 1nfy) and rat trypsin mutants (1jl7, 1ql9). Even in the presence of a benzamidine plus a chlorothiophene or chlorobenzothiophene, the molecules interact using the neutral group in the “chloro-binding mode”, whereas the S4 pocket accommodates the basic moiety.23 This binding mode is in line with our X-ray structures, showing that electrostatic interactions in S1 are not mandatory for high affinity, while neutral replacements open the route to inhibitors with increased potential for oral bioavailability. This para chlorine atom is situated in an area occupied by a structurally conserved water molecule in complexes with benzamidines in S1. This indicates that solvent displacement is required for binding. The free energy cost upon displacement of this water molecule is likely to be close to the maximum value of 2.0 kcal/mol.62 A view into fXa S1 pockets for a benzamidine-containing ligand (Figure 2A, PDB code 1lqd19) in comparison to 50 (Figure 2B) with a chlorine-Tyr228 interaction illustrates this different binding interaction in S1. This experimental preference for a favorable interaction involving Tyr228 is also uncovered in molecular interaction fields computed using GRID and an organic chlorine as probe atom.39 In addition, favorable contour regions were also obtained using the complementary knowledge-based SuperStar approach.63 Both interaction regions are highlighted by green contours in parts C and D of Figure 2 from profiling the binding site of the fXa/50 complex using GRID and SuperStar, respectively. The para chlorine atom in our X-ray structures is also closer to more polar atoms at the edge of this pocket, namely, Trp215-C=O (3.6 Å) and Ile227-NH (3.4 Å), while the Asp189 carboxylate is at a distance of 4.5 Å. The ortho chloro atom is more solvent-exposed and interacts with the Oγ group of the inward-directed Ser195 and a conserved water. Asp189 at the bottom of S1 points toward the aromatic ring and is not involved in a favorable interaction.
Figure 1. Comparison of X-ray binding modes for different 3-oxybenzamides in complex with human factor Xa. The following compounds are shown: (A) 57 (153 nM), (B) 50 (18 nM), (C) 37 (51 nM), (D) 63 (650 nM), (E) 87 (1412 nM), (F) 56 (25 nM). The experimentally determined factor Xa binding site is presented as solvent-accessible surface color-coded by cavity depth (from orange to blue). Essential pockets are indicated as S1, S4, and EBP (ester-binding pocket). Structurally conserved water is displayed with cyan spheres.
4.3. Inhibitor Binding Modes: Central Scaffold and S4 Subsite. In general the position of the 3-oxybenzamide aromatic ring is well-defined in electron density maps with the benzamide nitrogen hydrogen-bonded to the carbonyl oxygen of Gly216. The small oxybenzamide 4- and 5-substituents are oriented toward
the ester binding pocket (EBP) on top of the Cys191-Cys220 disulfide bridge and surrounded by Gln192, Arg143, and Glu147 side chains adjacent to S1.21 Figure 1A shows the structure of 57 in fXa with an arginine moiety in S4 as the start for lead optimization (Ki(fXa) = 153 nM, resolution of 2.1 Å). The 4-methoxy group located in the EBP is in contact with Glu147 and Gln192 Cγ atoms and the main chain carbonyl of Glu147. The C-terminal arginine diethylamide does not favorably interact and is disordered in the electron density maps. The arginine side chain is accommodated in S4 lined by the aromatic side chains of Tyr99, Phe174, and Trp215. The guanidine group is located in the “cation hole” at a position that is occupied by a pyridyl nitrogen in most structures. The terminal guanidine nitrogens interact with a water molecule; one of them is hydrogen-bonded to the inward-pointing backbone carbonyl oxygens of Glu97 and Thr98. The side chain of Glu97 is partly disordered and does not interact directly with the guanidine, while neighboring negative charges of Glu97 and Asp100 create a region favorable for the binding of positively charged residues. This distal part of S4 seems to flexibly adapt to the requirements of the ligand. The ligand arginine carbonyl oxygen interacts via a hydrogen-bonding network to two structurally conserved water molecules with the polar side chains of Arg222 and Glu217. The X-ray structure of 50 (Ki(fXa) = 18 nM, resolution of 2.7 Å) reveals a similar binding mode with the
Figure 2. Comparison of factor Xa S1 binding modes for a typical benzamidine-containing ligand (PDB code 1lqd) (A) and compound 50 (B) with a chlorine-Tyr228 interaction to illustrate the different interaction patterns. Selected S1 amino acids are shown with solvent-accessible binding site surface colored by cavity depth. Depth cueing is used for a view into S1. Shown is the detailed analysis of the fXa/50 complex for favorable interactions with organic chlorine probes using GRID (C) and SuperStar (D). Favorable interaction regions are consistently highlighted by green contours.
4-methoxy group situated in the ester binding pocket and the favorable Cl-Tyr228 interaction in S1. This structure is shown in Figures 1B and 3 with key interactions highlighted. The electron density of the dichlorophenyl and piperidylpyridyl substituents in S1 and S4 is well-defined, while the weak density in the central part indicates some disorder at the scaffold. The benzamide nitrogen interacts with Gly216-CO, while the piperidine in its chair conformation orients the pyridine substituent perfectly into S4, stacking it between the aromatic side chains of Tyr99 and Phe174. The pyridine nitrogen, which corresponds closely to one guanidine nitrogen in the complex of fXa/57, is hydrogen-bonded to the carbonyl oxygen of Thr98 via a structurally conserved water molecule in S4, which also interacts with the carbonyl oxygen of Ile175 and the Thr98 hydroxyl group (Figure 3).
The binding mode of 37 in fXa, shown in Figure 1C (Ki(fXa) = 51 nM, resolution of 2.5 Å), is identical, within experimental error, to 50 with clear and unambiguous electron density. The 5-amide substitution is directed toward the ester-binding pocket, while none of the four polar neighboring side chains are closer than 3.8 Å to this group. The 5-amide group shows one direct hydrogen bond with a structurally conserved water molecule, while the amide oxygen and nitrogen atoms are indistinguishable in the electron density maps. Consequently, for 3D-QSAR studies, all derivatives were consistently built from the actually shown orientation. Figure 1D shows the binding mode of the inhibitor 63 (Ki(fXa) = 659 nM, resolution of 3.1 Å). Despite its unambiguous electron density, the resolution of 3.1 Å does not allow us to determine atomic positions with extreme accuracy. However, the binding mode of this
Figure 3. Details of the fXa/50 complex (Ki = 18 nM) from the 2.7 Å X-ray structure: (A) detailed interactions between 50 and the fXa binding site with key residues indicated; (B) same view with solvent-accessible surface colored by cavity depth; (C) two-dimensional representation of essential protein-ligand interactions.
inhibitor with the pyridyl moiety replaced by a lipophilic isopropyl substituent directed toward the edge of S4 is similar to that of the fXa/50 complex without any unfavorable interaction, indicating that the general orientation is not influenced by S4 modifications. Another binding mode for a weak inhibitor is given in Figure 1E for 87 (Ki(fXa) = 1410 nM, resolution of 2.65 Å). Here, the S4 directed substituent is a substituted piperazine located above Gly216 without the possibility of any favorable hydrogen bond interaction with Gly216-CO. Again, the electron density of the bound inhibitor is clear but indicates some conformational heterogeneity. The main interactions in S4 are not possible with the N-dimethylacetamide substituent stacked between Phe174 and Tyr99, situated at the position of the pyridine ring in 50. Consequently the replacement by a 4-piperidylmethyl substituent results in a slight increase of binding affinity (62, Ki(fXa) = 650 nM). Finally Figure 1F displays the binding mode of 56 (Ki(fXa) = 25 nM, resolution of 3.3 Å), deduced from a weakly diffracting crystal, thus limiting the accuracy of the structure. However, the electron density of the inhibitor is unambiguous and indicates a similar binding mode to 50. This compound lacks the ortho chlorine atom in S1 and has two hydrophobic substitutions in the EBP. From the data set, it can be concluded that the chlorophenyl group is inserted ∼0.7 Å less deep into the S1 pocket, although this difference is within the error margins. However, this is consistent with docking, suggesting an influence of the size of the EBP substitution on binding into S1, probably because of contacts to the bulky Cys191-Cys220 disulfide bond. This comparative analysis supports the assumption of a common binding mode to explain the structure-activity relationship.
4.4. CoMFA Model and Validation Studies. All 3D-QSAR models based on the fXa-derived alignment rule and different force field minimizations (QXP, MMFF94s) showed a high degree of consistency. During this project, preliminary 3D-QSAR models with fewer compounds at a particular time were used in each design cycle and then updated, while here, we present a final model encompassing a larger degree of structural variations in the set of 3-oxybenzamides (cf. SAR set 1 in Table 1).
By use of a 2 Å grid spacing, a CoMFA model with an r2(cv) value of 0.714 for six PLS components and a conventional r2 of 0.947 was obtained (model A, Table 3 and Figure 4). The graph of observed versus fitted biological activities for model A is displayed in Figure 4. The steric field descriptors explain 47% of the variance, while the electrostatic field accounts for 53%, as computed from the normalized sum of standard deviations after CoMFA_STD scaling multiplied by the PLS coefficients of the final non-cross-validated PLS model. From MMFF94s minimizations, a CoMFA model with a lower r2(cv) value of 0.585 for six PLS components and a conventional r2 of 0.913 resulted (model C, Table 3). Model A was extensively validated, which underscored its predictive power and significance. All affinity predictions obtained from this and subsequent models are listed in Table 4. The effect of the alignment relative to the grid definition was evaluated by consistently moving all compounds in increments of 0.5 Å in all three dimensions x, y, and z without affecting the superposition. The r2(cv) values range from 0.636 to 0.758 (mean r2(cv) of 0.70; SD = 0.04), suggesting only a minor dependence on the orientation of the grid. Varying the probe atom over 14 atom types produced r2(cv) values from 0.714 to 0.744, with Cl or carbon-based atom types showing r2(cv) values of ∼0.72 and nitrogen and oxygen
Factor Xa Inhibition
Journal of Medicinal Chemistry, 2005, Vol. 48, No. 9 3301
Table 3. Model Statistics for 3D-QSAR and Tailored Scoring Functions

(a) 3D-QSAR Models To Explain Factor Xa Activity of 3-Oxybenzamides^a,b

            alignment  model  no. compds  r2(cv)  SD     PLS components  r2     pred r2
CoMFA^c     QXP        A      80          0.741   0.507  6               0.947  0.732
CoMFA       QXP        B      107         0.730   0.484  6               0.889
CoMFA^c     MMFF       C      80          0.585   0.610  6               0.913  0.696
CoMFA       MMFF       D      107         0.616   0.580  6               0.874
CoMSIA^c    QXP        E      80          0.609   0.587  5               0.898  0.784
CoMSIA      QXP        F      107         0.664   0.542  6               0.899
CoMSIA^c    MMFF       G      80          0.600   0.607  8               0.940  0.679
CoMSIA      MMFF       H      107         0.612   0.583  6               0.882

(b) Tailored Scoring Functions To Explain Factor Xa Activity of 3-Oxybenzamides^d

            alignment  model  no. compds  r2(cv)  SD     PLS components  r2     pred r2
AFMoC^e     QXP        I      80          0.495   0.721  4               0.723  0.577
AFMoC^e     MMFF       J      80          0.414   0.716  4               0.659  0.741
TScore^f    QXP        K      80          0.572   0.615  5               0.790  0.301
TScore^f,g  QXP        L      80          0.644   0.561  5               0.827  0.399
TScore^f    MMFF       M      80          0.343   0.767  6               0.775  0.222

^a CoMFA and CoMSIA models were derived using a minimum σ of 2. r2(cv): cross-validated r2 using leave-one-out. SD: standard deviation of error from leave-one-out PLS model. PLS components: optimal number of components. r2: non-cross-validated regression coefficient.
^b Models based on 80 compounds were derived using SAR set 1 in Table 1.
^c Models subjected to extensive statistical validation.
^d r2(cv): cross-validated r2 using leave-one-out. SD: standard deviation of error from leave-one-out PLS model. PLS components: optimal number of components. r2: non-cross-validated regression coefficient.
^e AFMoC models were derived using nine different field types (C.3, C.2, C.ar, O.3, O.2, N.am, Cl, N.3, N.ar) and 80 compounds from SAR set 1 in Table 1.
^f TScore models were derived using Chemscore lipophilic, hydrogen bond terms, PLP hydrogen bond, contact terms for each residue within 8 Å around the inhibitor binding site plus global Chemscore and PLP terms for 80 compounds from SAR set 1 in Table 1.
^g TScore model selecting 51 informative variables based on PLS coefficients from model K.
Figure 4. Graph of observed versus fitted binding affinities for the final non-cross-validated CoMFA model A.
probes r2(cv) of ∼0.74, suggesting only a slight dependence on the probe (mean r2(cv) of 0.73; SD = 0.01). One hundred randomizations of biological activity resulted in a mean r2(cv) of -0.05 (SD = 0.06; high, 0.10; low, -0.21), showing that model A is significantly better than a random model. Although cross-validation reflects the predictive power of a model, the leave-one-out method might produce overly optimistic r2(cv) values. Thus, PLS analyses were run 100 times with two and five randomly chosen cross-validation groups containing randomly selected compounds for prediction. The mean r2(cv) value of 0.683 for six PLS components and five cross-validation groups (SD = 0.03; high, 0.742; low, 0.604) is slightly lower than that obtained with the leave-one-out method. By use of only two cross-validation groups, a lower mean r2(cv) value of 0.610 was observed (SD = 0.06; high, 0.722; low, 0.450). The influence of increasing information in X-space on model predictivity was estimated by incrementally dividing the set of 80 3-oxybenzamides into training and test sets using statistical design,64 producing subsets with 20-76 members. For each subset a leave-one-out PLS analysis served to extract the r2(cv), while a PLS
analysis without cross-validation led to a model for predicting the remaining compounds to produce the predictive r2 value for each model. The r2(cv) value is low with fewer than 44 diverse compounds in the training set. Although the conventional r2 is high, the predictive ability is not sufficient. The mean Tanimoto coefficient for this 44-compound training set is 0.83, while the pair of most similar compounds in this subset shows a Tanimoto coefficient of 0.88. When the subset size is increased, the cross-validated r2 reaches values between 0.55 and 0.78. For the corresponding external predictions the predictive r2 is between 0.67 and 0.85, demonstrating stable models with good predictive capabilities for these external test sets. This example suggests that a reliable prediction of biological affinities can be expected for novel molecules with a higher similarity than defined by this Tanimoto threshold of ∼0.83. The degree of extrapolation increases with decreasing similarity, thus leading to less reliable affinity predictions for external test sets.
Progressive shuffling19,52,53 was applied to assess model stability against variations in biological data. After the data set is divided into 2-12 groups using activity thresholds, each group is internally randomized. By use of 12 groups, only a small uncertainty is probed, resulting in a mean r2(cv) of 0.68, while for two subgroups, a larger portion of the y-block is randomized. This interpretation aids in understanding the effect of biological errors on PLS predictions and estimates model stability. With one group, the results are similar to complete randomization; negative r2(cv) values are observed. Shuffling with more than two subgroups produces models of remarkable quality (mean r2(cv) for three groups 0.53), suggesting that the final models are not strongly dependent on minor biological variations. These extensive validation studies collectively support the finding of a stable, significant, and predictive PLS model.
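The progressive shuffling procedure described above can be sketched in a few lines of Python. This is an illustration only and not the authors' implementation: the function name progressive_shuffle, the equal-count binning (the text uses activity thresholds), and the choice of twelve example pKi values (SAR set 1 entries from Table 4) are assumptions made for the sketch. Each scrambled activity vector would then be fed to a leave-one-out PLS run to obtain an r2(cv) value.

```python
import random


def progressive_shuffle(activities, n_groups, seed=None):
    """Return a copy of activities shuffled only within activity-ranked bins."""
    if not 1 <= n_groups <= len(activities):
        raise ValueError("n_groups must be between 1 and len(activities)")
    rng = random.Random(seed)
    # Compound indices ordered from lowest to highest activity.
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    shuffled = list(activities)
    n = len(order)
    for g in range(n_groups):
        # Assumption: equal-count bins stand in for the activity thresholds.
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        values = [activities[i] for i in members]
        rng.shuffle(values)
        # Write the permuted activities back to the compounds of this bin.
        for i, v in zip(members, values):
            shuffled[i] = v
    return shuffled


# pKi values of twelve SAR set 1 compounds from Table 4 (illustrative subset).
pki = [4.70, 4.30, 4.43, 4.44, 3.99, 3.50, 2.81, 2.71, 2.34, 2.00, 4.15, 3.69]
for groups in (1, 3, 12):
    print(groups, progressive_shuffle(pki, groups, seed=0))
```

With a single group the function reduces to a complete randomization of the y-block, while with as many groups as compounds the activities are returned unchanged; intermediate group counts probe progressively smaller perturbations.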
4.5. CoMSIA Model and Validation Studies. Similar statistical results were obtained using CoMSIA steric, electrostatic, and hydrophobic fields. By use of a
Table 4. Fitted and Predicted Factor Xa Affinities from 3D-QSAR Models for 3-Oxybenzamides^a

ID   SAR_set  FXa_pKi  CoMFA_A  CoMSIA_E  AFMoC_I  TScore_L
1    1        4.70     4.66     4.78      5.18     4.69
2    1        4.30     4.05     4.08      4.04     3.83
3    2        4.40     4.23     4.38      4.32     4.01
4    1        4.43     4.47     4.71      4.53     4.31
5    1        4.44     4.26     4.73      3.29     4.50
6    1        3.99     4.09     4.04      5.23     4.30
7    2        3.91     4.08     4.03      4.67     4.07
8    1        3.50     3.40     3.22      3.63     3.63
9    1        2.81     2.58     2.63      3.20     2.73
10   1        2.71     2.69     3.12      2.92     2.84
11   2        2.60     2.55     2.41      2.34     2.91
12   2        3.38     3.79     3.54      4.39     2.60
13   1        2.34     2.52     2.51      2.99     2.41
14   1        2.00     1.98     2.07      1.66     2.15
15   1        4.15     4.23     4.57      2.78     4.25
16   1        3.69     4.00     3.89      4.88     4.12
17   2        3.69     4.13     4.16      4.72     3.40
18   2        3.51     3.35     3.10      3.57     3.74
19   1        2.34     2.53     2.51      3.05     2.64
20   1        3.15     2.70     3.02      2.81     2.77
21   2        2.48     2.59     2.27      2.21     2.94
22   2        2.81     3.68     3.40      4.16     2.44
23   1        2.00     1.99     2.26      2.42     1.65
24   1        2.00     1.96     2.06      1.90     1.96
25   1        3.97     3.98     3.87      3.96     3.88
26   1        4.12     4.25     4.18      3.32     4.36
27   1        3.81     4.04     4.22      2.99     4.26
28   2        4.43     3.74     3.80      3.62     3.84
29   1        3.85     4.04     3.63      4.33     4.11
30   1        4.36     4.33     4.01      4.15     3.88
31   1        4.54     4.08     3.64      3.97     3.74
32   2        4.32     3.33     3.80      3.34     3.03
33   2        4.12     3.22     3.28      3.95     3.37
34   1        3.50     3.14     3.41      3.05     3.63
35   1        2.75     2.78     3.31      3.15     2.76
36   1        3.23     3.28     3.27      3.78     3.11
37   1        4.29     4.42     4.21      4.07     3.96
38   1        4.55     4.35     4.21      4.45     4.02
39   2        4.21     3.69     3.90      3.66     2.42
40   2        3.63     3.65     3.71      3.61     2.64
41   2        3.90     4.57     4.40      4.10     3.94
42   2        4.30     3.50     3.63      3.11     2.48
43   2        3.58     3.86     4.22      3.36     2.64
44   1        4.41     3.84     3.93      3.42     3.25
45   2        4.70     4.89     4.73      3.63     4.42
46   2        4.07     4.84     4.83      3.66     4.14
47   1        2.98     3.03     3.08      3.06     2.84
48   1        5.00     5.00     4.72      4.94     4.38
49   1        4.89     4.51     4.35      3.96     3.97
50   1        4.74     4.58     4.20      3.67     4.24
51   1        4.39     4.58     4.37      3.87     4.31
52   1        4.27     4.40     4.51      3.99     4.24
53   2        4.39     4.05     4.08      4.49     4.37
54   2        4.89     4.26     4.55      4.84     4.26
55   1        3.55     3.34     3.88      3.27     4.23
56   2        4.60     4.89     4.97      4.97     4.43
57   1        3.82     3.83     4.21      3.39     3.67
58   1        3.72     3.59     3.93      3.30     3.52
59   1        4.24     4.04     3.68      3.75     4.09
60   1        3.96     4.09     4.02      3.40     4.18
61   1        3.58     3.68     3.34      3.52     3.95
62   2        3.22     3.35     3.27      3.11     3.21
63   2        3.19     3.20     2.61      2.92     3.68
64   1        3.02     3.32     3.44      3.56     3.70
65   1        3.01     3.06     2.83      2.64     2.85
66   1        2.95     3.32     3.09      2.61     2.98
67   1        2.81     2.72     2.95      2.80     3.20
68   1        3.02     3.18     2.84      2.32     2.86
69   1        3.39     3.13     2.85      2.68     3.32
70   1        2.21     2.45     2.69      2.67     2.18
71   1        2.00     2.32     2.28      2.63     1.86
72   1        2.00     2.44     2.51      3.05     2.43
73   1        3.76     3.93     3.84      3.64     3.91
74   1        2.29     2.38     2.25      2.95     2.16
75   1        3.85     3.70     3.71      3.52     3.79
76   1        2.00     2.21     1.87      1.70     1.86
77   1        2.00     1.66     1.99      2.14     2.36
78   1        2.00     1.89     1.92      1.61     2.13
79   1        2.00     1.73     1.81      2.30     2.06
80   1        2.83     3.15     3.11      2.88     2.82
81   1        2.86     3.21     2.78      3.13     3.16
82   1        2.90     3.09     3.16      3.09     2.92
83   1        2.00     2.00     2.24      2.31     2.47
84   1        2.00     2.06     1.85      2.62     2.56
85   1        2.00     2.09     2.15      2.08     2.22
86   2        2.36     2.42     1.97      2.75     2.81
87   1        2.85     2.81     3.08      2.82     3.06
88   2        2.00     2.32     2.05      2.39     2.47
89   1        2.98     2.90     2.94      2.61     2.87
90   1        2.39     2.37     2.13      2.43     2.03
91   1        2.24     2.43     2.69      2.65     2.46
92   1        3.78     3.64     3.49      3.04     2.93
93   2        3.51     3.41     3.40      3.15     2.69
94   1        2.00     1.96     1.89      1.88     2.22
95   1        2.27     2.27     2.45      2.34     2.36
96   1        2.52     2.28     1.89      2.06     2.42
97   1        2.66     2.75     2.72      2.79     2.80
98   1        2.41     2.33     2.43      2.09     2.01
99   1        2.65     2.82     2.56      2.87     2.68
100  1        2.37     2.50     2.46      2.31     2.28
101  1        2.03     1.72     2.28      2.16     2.09
102  1        2.19     2.02     1.98      2.37     2.11
103  2        2.21     1.86     1.68      2.42     1.79
104  1        2.60     2.68     2.44      2.69     2.89
105  1        2.59     2.60     2.63      2.92     2.59
106  1        2.55     2.34     2.33      2.61     2.61
107  1        2.31     2.62     2.46      4.11     3.75

^a Experimental biological activity pKi_fXa is expressed as log[(1/Ki)(1000)]. See text and Table 3 for details on 3D-QSAR models and tailored scoring functions. Predictions for all compounds from models A, E, I, L, and SAR set 1 were based on the corresponding non-cross-validated PLS models with optimal number of components, while compounds indicated as SAR set 2 were used as the external prediction set.
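As a worked illustration of the activity scale in Table 4, the following sketch converts a Ki value into pKi_fXa using the footnote's expression log[(1/Ki)(1000)]. The unit of Ki is not stated in the footnote; the sketch assumes Ki in µM, which reproduces the tabulated values for compounds 48, 56, and 87 (Ki of 10, 25, and 1410 nM). The function name pki_from_ki_nm is hypothetical.

```python
import math


def pki_from_ki_nm(ki_nm):
    """Convert Ki (given in nM) to pKi = log10[(1/Ki) * 1000], Ki in µM (assumed)."""
    ki_um = ki_nm / 1000.0  # nM -> µM; the µM unit is an assumption of this sketch
    return math.log10(1000.0 / ki_um)


# Ki values quoted in the text for compounds 48, 56, and 87.
for compound, ki_nm in ((48, 10.0), (56, 25.0), (87, 1410.0)):
    print(compound, round(pki_from_ki_nm(ki_nm), 2))
```

On this scale a pKi of 2.00, the lowest value in the table, corresponds to a Ki of 10 µM.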
2 Å grid spacing, a CoMSIA model with an r2(cv) value of 0.609 for six PLS components and a conventional r2 of 0.898 was obtained (model E, Table 3). The steric field descriptors explain 16% of the variance, the electrostatic descriptors explain 48%, and the additional hydrophobic field explains the remaining 36%. The alignment from MMFF94s minimization produced a CoMSIA model with a comparable r2(cv) value of 0.600 for eight PLS components and a conventional r2 of 0.940 (model G, Table 3). Model E was subjected to validation studies, which collectively support the finding of a stable and predictive model. No effect of the alignment relative to the grid was observed because of the CoMSIA Gaussian-type smoothing function.65 When the activities are randomized, a
mean r2(cv) of -0.10 (SD = 0.12; high, 0.21; low, -0.47) is observed. Averaging over 100 PLS analyses with two random cross-validation groups gives a mean r2(cv) of 0.464 (SD = 0.08; high, 0.639; low, 0.228). This value increases for five cross-validation groups, where a mean r2(cv) value of 0.553 for six PLS components results (SD = 0.04; high, 0.640; low, 0.448). The incremental splitting of the data set into training and test sets using statistical design led to results comparable to those for CoMFA. For CoMSIA, the first reliable PLS models emerge with 48 training set molecules, an r2(cv) of 0.57, and a predictive r2 of 0.63. In general, both the cross-validated and predictive r2 values are slightly lower for CoMSIA. This study based on external test sets suggests that for this case the CoMFA models have a higher
Figure 5. Experimental versus fitted or predicted binding affinities for 3D-QSAR models (Table 3). Shown are 80 training compounds as dots, while crosses represent 27 compounds for external predictions: (A) CoMFA model A; (B) CoMSIA model E.
predictive power than CoMSIA, while CoMSIA is less dependent on alignment inaccuracies and might be comparable from a practical point of view for design. For each training set with acceptable statistical parameters it was checked whether its interpretation in terms of CoMFA and CoMSIA contour maps led to similar conclusions, which was always the case. Progressive shuffling applied to the final CoMSIA model also demonstrates its robustness against variations in biological data. The increase of the mean r2(cv) values with more than one subgroup is obvious, suggesting that shuffling using more than three subgroups still produces models of remarkable quality for CoMSIA (mean r2(cv) of 0.59 for 3 subgroups and 0.69 for 12 subgroups). This analysis again shows that both QSAR techniques are robust and tolerate minor biological variations.
4.6. Prospective Design Based on 3D-QSAR Models. The final 3D-QSAR models were applied prospectively to 27 3-oxybenzamides as an external set (SAR set 2 in Table 1). New compounds were docked and aligned in a consistent way and minimized using the appropriate force field. The obtained predictive r2 values for models in Table 3 are 0.732 and 0.696 for CoMFA-QXP (model A) and CoMFA-MMFF (model C), respectively, and 0.784 and 0.679 for CoMSIA-QXP (model E) and CoMSIA-MMFF (model G), respectively. A graph of experimental versus predicted fXa binding affinities is given in parts A and B of Figure 5 for models A and E, while affinity predictions for training and test sets are summarized in Table 4 for all relevant 3D-QSAR models and tailored scoring functions discussed below. The inspection of Figure 5 suggests that the final 3D-QSAR models are of comparable statistical quality in terms of predicting novel binding affinities, with the alignment derived from docking using QXP performing slightly better. However, this minor difference might also be due to the partial side chain flexibility introduced in QXP docking.
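The predictive r2 values quoted above can be illustrated with a short calculation. The formula is not given in this section; the sketch assumes the definition commonly used in CoMFA studies, pred r2 = 1 - PRESS/SD, where PRESS is the sum of squared prediction errors for the test set and SD is the sum of squared deviations of the test-set activities from the mean training-set activity. The function name predictive_r2 is hypothetical, and the numbers are a small subset of Table 4 (six SAR set 2 compounds predicted by CoMFA model A, six SAR set 1 compounds for the training mean), so the result does not reproduce the reported value of 0.732.

```python
def predictive_r2(observed, predicted, training_mean):
    """Predictive r2 = 1 - PRESS / SD (assumed CoMFA-style definition)."""
    # PRESS: squared errors between experimental and predicted test-set activities.
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # SD: squared deviations of test-set activities from the training-set mean.
    sd = sum((o - training_mean) ** 2 for o in observed)
    return 1.0 - press / sd


# Table 4, SAR set 2 compounds 3, 7, 11, 12, 17, 18: FXa_pKi and CoMFA_A columns.
test_observed = [4.40, 3.91, 2.60, 3.38, 3.69, 3.51]
test_predicted = [4.23, 4.08, 2.55, 3.79, 4.13, 3.35]
# Table 4, SAR set 1 compounds 1, 2, 4, 5, 6, 8: FXa_pKi column (subset only).
train_observed = [4.70, 4.30, 4.43, 4.44, 3.99, 3.50]
train_mean = sum(train_observed) / len(train_observed)

print(round(predictive_r2(test_observed, test_predicted, train_mean), 3))
```

Because the reference is the training-set mean rather than the test-set mean, this quantity can become negative when the model predicts the external compounds worse than simply assigning them the average training activity.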
Both CoMFA and CoMSIA models were applied to prioritize synthesis candidates. Prospective predictions for 44 additional new compounds resulted in a predictive r2 value of 0.68 for CoMFA model A and 0.82 for CoMSIA model E (no data given), clearly showing that the quality of external predictions was acceptable for both CoMFA and CoMSIA. Indeed, this model was used for further design
cycles. A combination of structure-based design with inspection of 3D-QSAR contours proved very effective in supporting the design of novel inhibitors. Synthesis proposals were generated, evaluated using this procedure, and reliably ranked by 3D-QSAR affinity predictions. Each design cycle was then completed by the generation and application of novel, improved 3D-QSAR models. This stepwise design and synthesis procedure based on CoMFA predictions constantly improved statistical results and predictivity in this series. The final 3D-QSAR models encompassed all 107 3-oxybenzamides from Table 1 within a single analysis, which did not significantly alter the statistical quality. The corresponding CoMFA and CoMSIA models based on QXP or MMFF alignments are summarized in Table 3 with r2(cv) values of 0.730 and 0.616 for CoMFA and 0.664 and 0.612 for CoMSIA, respectively. No significant changes in CoMFA and CoMSIA contour regions were obvious compared to models A and E.
4.7. Tailored Scoring Functions. The 3D-QSAR models were very consistent with each other and with the two tailored scoring functions, despite the use of different force field protocols. Note, however, that the QXP alignments were statistically superior in all cases. By use of a 1 Å grid spacing, an AFMoC model with an r2(cv) value of 0.495 for four PLS components and a conventional r2 of 0.723 was obtained (model I, Table 3 and Figure 9). Its graph of observed versus fitted biological activities is displayed in Figure 6A. Dots indicate training set molecules (SAR set 1 in Table 1), and crosses represent the test set (SAR set 2 in Table 1). In total, nine atom-type-derived interaction fields were utilized to derive this model: C.3, C.2, C.ar, O.3, O.2, N.am, Cl, N.3, N.ar. Because the atom types occur frequently in test and training sets, the number of fields in the final model was not reduced.
Figure 6. Experimental versus fitted or predicted binding affinities for different scoring functions: (A) AFMoC model I; (B) TScore model L; (C) DrugScore global scoring function; (D) ChemScore global scoring function.

Table 5. Relative Contributions of Field Types to the Final AFMoC Models^a

             % contribution
field type   AFMoC_I  AFMoC_J
C.3          22.4     22.6
C.2          1.7      2.6
C.ar         41.3     38.2
O.3          4.6      5.4
O.2          0.7      1.2
N.am         1.1      1.8
Cl           20.7     16.9
N.3          0.3      0.6
N.ar         7.2      10.5

^a PLS-derived relative contributions of different AFMoC field types for the nine different fields and AFMoC models I and J.

Changing the minimization protocol led to a PLS model with a slightly lower r2(cv) value of 0.414 for four PLS components and a conventional r2 of 0.659, while the predictive r2 for the external test set (SAR set 2) was improved from 0.577 (AFMoC model I, QXP) to 0.741 (AFMoC model J, MMFF) (see Table 4). The relative contributions of individual field types from both AFMoC models are presented in Table 5, showing that lipophilic atom types such as C.3, C.ar, and Cl dominate the models. Prospective predictions for 44 additional compounds resulted in predictive r2 values of 0.69 for model I and 0.75 for model J (no data given), showing again that the quality of external predictions for these tailored scoring functions is comparable to that of CoMFA and CoMSIA. TScore as an alternative tailored scoring function also led to predictive PLS models, summarized in Table 3b. By use of descriptors for binding site residues within 8 Å around the template molecule 50, a TScore model with an r2(cv) value of 0.572 for five PLS components and a conventional r2 of 0.790 was obtained after QXP minimization (model K, Table 3b). This model encompassed all 126 nonzero TScore descriptors (four per
residue plus global terms), while a mild variable selection reduced this number to 51 relevant terms listed in Table 6. The final TScore model L then showed an r2(cv) value of 0.644 for five PLS components and a conventional r2 of 0.827 (model L; Table 3b and Figure 10). The graph of observed versus fitted biological activities for this model is displayed in Figure 6B. Changing the minimization protocol resulted in a PLS model with a lower r2(cv) value of 0.343 for six PLS components and a conventional r2 of 0.767, without significant prediction for the external test set (predictive r2 of 0.222 for SAR set 2). In contrast, the previous model L showed a predictive r2 value of 0.399 for this external set (Table 4) and a predictive r2 value of 0.72 in a prospective study of 44 additional compounds (no data given). While the predictive r2 value of model L for SAR set 2 is lower than that observed for the 3D-QSAR models (cf. 0.732 for CoMFA model A), this value is still acceptable given the chemical diversity of this test set. A detailed inspection of predictions for both AFMoC and TScore highlights intrinsic features of scoring functions. These tend to extract SAR information close to the protein surface, while 3D-QSAR models are able to capture substitution effects that are more distant from the surface but relevant to affinity for other reasons, such as entropy and long-range electrostatic effects. Because this information is explicitly excluded from both tailored scoring functions, 3D-QSAR approaches still offer complemen-
Table 6. PLS Coefficients of Factor Xa Amino Acid Properties from TScore Model La
with the inhibitor 48 (Ki(fXa) = 10 nM) having a 4-piperidylpyridyl moiety in S4 and a dichlorophenyl substituent in S1. Green contours in Figure 7A (>80% contribution) indicate regions where steric bulk is favorable for fXa affinity, while yellow contours ([...] 80% contribution) refer to regions where an increase of positive charge (or less negative charge) is favored to enhance affinity, while red contours ([...] 80% contribution) refer to regions where additional steric bulk is favorable for affinity, while yellow contours ([...] 80% contribution) refer to regions where negatively charged substituents are unfavorable for affinity. Red contours ([...] 80% contribution) refer to regions where additional steric bulk is favorable for affinity. Yellow contours ([...] 80% contribution) refer to regions where negatively charged substituents are unfavorable. Red contours ([...] 80% contribution) refer to regions where hydrophilic substituents are favorable. Yellow contours ([...] 80% contribution). Those regions are located at the rear of the S4
subsite, where protein carbonyl groups point inward forming a hydrogen-bonding network and the piperidyl nitrogen is positioned on top of the Trp215 indole ring. Preferred hydrophobic interactions are indicated by yellow contour regions (0.04 (green) and 0.08 (green) and 0.04 (yellow) and 0.08 (yellow) and