Chapter 19
Reduced Dimensionality in Ligand-Protein Structure Prediction: Covalent Inhibitors of Serine Proteases and Design of Site-Directed Combinatorial Libraries Daniel K . Gehlhaar, Djamal Bouzida, and Paul A . Rejto
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
Agouron Pharmaceuticals, Inc., 3301 North Torrey Pines Court, La Jolla, CA 92037
Structure prediction of ligand-protein complexes is greatly facilitated when the location of ligand functional groups relative to the protein is known. This situation arises in two applications of practical interest: when a covalent bond is formed between the ligand and the protein, and when a fragment of the ligand dominates the molecular recognition with the protein. In both of these cases, it is shown that the predicted structure corresponds to the experimentally observed structure with increased probability. Using this approach, a library of compounds is screened for potential inhibitors of dihydrofolate reductase and porcine pancreatic elastase; known inhibitors were ranked favorably in both cases.
The computational prediction o f binding geometries o f compounds i n a protein active site, termed the docking problem, is an important component i n computer-aided structure-based drug design (1). The problem can be fruitfully approached with an energy function that is sufficiently accurate to discriminate between correct and incorrect binding modes and that is amenable to a conformational search method that can determine the lowest energy state o f the complex, which defines the predicted binding mode. Early methods treated both the ligand and the protein as rigid bodies and the search was limited to optimizing the position and orientation o f the ligand within the active site, while more recent methods incorporate flexibility o f the ligand and limited protein flexibility (2-7). It remains a significant challenge because o f the large number o f degrees o f freedom and the need to calculate interaction energies, but is eased significantly when the number o f degrees o f freedom is reduced. This can be achieved, for example, by fixing the conformation o f the ligand i n its bound
292
© 1999 American Chemical Society
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
293
conformation, but i n general this knowledge is not available prior to experimental determination o f the structure o f the complex. A n alternate method to reduce the number o f degrees o f freedom is to fix part o f the ligand within the active site o f the receptor, a simplification that is exploited i n this paper for two special cases. In the first case, the ligand forms a covalent complex with the protein, and the location o f the ligand is constrained by this chemical bond. In the second case, the portion o f the ligand that is primarily responsible for molecular recognition, the anchor, is fixed within the active site. Assuming that the binding mode o f the anchor fragment is not affected, this approach, termed fixed anchor docking, allows the efficient evaluation o f various substituents. In both cases, the search space is significantly reduced compared to fully flexible docking simulations because the center o f mass degrees o f freedom have been eliminated and the number o f binding modes are limited dramatically. W e discuss the methodology employed for the docking simulations and discuss validation studies. In conjunction with a method for estimating the binding affinity o f the resulting complexes, we use this approach to search libraries o f compounds for covalent inhibitors o f porcine pancreatic elastase and to build a virtual site-directed library for dihydrofolate reductase. W e demonstrate that known inhibitors o f porcine pancreatic elastase and a potent inhibitor o f dihydrofolate reductase, methotrexate, are identified using this method. Serine Proteases. The active site o f many viral serine proteases, such as cytomegalovius protease and hepatitis C protease, are solvent-exposed and comparatively shallow (8-10). These proteases differ from enzymes such as H I V protease, which has a highly constrained binding site, and has been a successful target for structure-based design (11-13). In order to achieve high potency, noncovalent inhibitors o f viral serine proteases might need to be relatively large, consistent with the long recognition sequence o f these proteases (14). Due to the binding energy provided b y the covalent bond, inhibitors that form a chemical bond with the protein can be smaller than those that do not, while maintaining the desired potency. Such comparatively small compounds are desirable both for increased oral bioavailability and for ease o f synthesis, although the presence o f a reactive functional group may lead to non-specific reactions with other serine proteases and reactive oxygens, as well as complicate the delivery of these compounds. The activated serine oxygen can bond covalently with an electrophilic group v i a a tetrahedral intermediate (77), but the software commonly used for predicting the structure o f ligand-protein complexes does not take such interactions into account. W h i l e some docking software does allow the modeling o f covalent interactions (2), the user is required to place the ligand i n the correct tetrahedral geometry relative to the protein, a limitation that makes the program unsuitable for searching large chemical databases. W e describe a method for the fully automated and rapid flexible docking o f inhibitors covalently bound to serine proteases, combining an energy function specifically tuned for molecular docking with an efficient search engine based on evolutionary programming.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
294
Site-Directed Combinatorial Libraries. Another application o f structure prediction methods i n reduced dimensionality arises i n the design o f combinatorial chemistry libraries, a field that is beginning to have significant impact on drug discovery (15,16). Combinatorial approaches can accelerate the speed o f research and increase the amount o f structure-activity data typically available by orders o f magnitude. Because o f the combinatorial explosion, however, the number o f compounds that i n principle can be synthesized typically far exceeds the number that can be made, let alone the number that can be assayed i n a reasonable timeframe. While considerable progress has been made i n automatic synthesis and screening methods, smaller, targeted libraries solve many o f the problems associated with large generic libraries. In this approach, the library is biased towards features that promote molecular recognition. These features can be obtained from the sequence o f known substrates (17), transition state mimics (18), or the three-dimensional structure o f the receptor (19,20). Combinatorial approaches and structure-based drug design approaches complement each other. The large number o f compounds inherent i n combinatorial methods reduces the need for precise estimates o f binding affinity, so critical i n traditional single compound design. A t the same time, constraints imposed b y the structure o f the receptor can reduce significantly the size o f library that needs to be considered. Due to the combinatorial nature o f the synthesis, eliminating even modest numbers o f monomers at each stage can have significant results. A library that consists o f a central core fragment with three attachment sites and 100 possible monomers at each site contains 10 compounds, at the limits o f current synthesis and technology. I f some fraction o f the monomers / can be eliminated at each site, the size o f the library scales down a s / , where n is the number o f attachment sites. Hence, eliminating 7 5 % o f the monomers i n this example with three attachment sites results i n a library approximately 1% o f the original size, well within the capability o f current experimental methodologies for synthesis and screening. 6
1
Methodology Energy Function. The energy function used to predict the structure o f ligand-protein complexes contains an intermolecular term for the interaction between the ligand and the protein binding site, and an intramolecular term for the conformation o f the ligand. The intramolecular term consists o f the van der Waals and torsional strain terms o f the D R E I D I N G force field (21), and is used to differentiate between low- and high-energy internal geometries o f the ligand. The intermolecular potential is a pairwise sum o f piecewise-linear potentials (Figures l a and l b ) between ligand and protein heavy (nonhydrogen) atoms (7), with parameters depending on the type o f interaction and the size o f the atom. Ligand and protein heavy atoms are classified as hydrogen bond donors, acceptors, donors and acceptors (e.g. a hydroxyl oxygen), or nonpolar. Small (fluorine and metal ions), medium (carbon, oxygen, and nitrogen), and large (sulfur, phosphorus, chlorine, and bromine) atoms are assigned atomic radii o f 1.4,1.8, and 2.2 A , respectively. These parameters were derived from interatomic distances observed i n high-quality crystal structures, and then optimized.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
295
Interatomic Distance
Interatomic Distance
Angle (degrees)
Figure 1. (a) Functional form o f the atomic pairwise potential used for the hydrogen bonding (A=15.0, B=2.3, C=2.6, D=3.1, E=3.4, F=-4.0) and dispersion terms (B=0.93a, C = a , D=1.25 a , E=1.5 a , F=-0.4, where a is the sum o f the atomic radii o f the protein and ligand atoms) o f the intermolecular potential, (b) Functional form o f the atomic pairwise potential used for donor-donor and acceptor-acceptor repulsive interactions. A=15.0, B=3.2, C=6.0, F=1.5. (c) Scaling factor used to modulate the hydrogen bonding and repulsive terms based on the relative orientation o f protein and ligand atoms. A l l units are A and kcal/mole.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
296
Each pair o f interacting atoms is assigned one o f three interaction types: a hydrogen bond interaction between donors and acceptors, a repulsive interaction for donor-donor and acceptor-acceptor contacts, and a generic dispersion term for other contacts. Both the hydrogen bond and repulsive terms are modulated by a scaling factor that imparts a crude angular dependence (Figure l c ) . The hydrogen bond angle for ligand acceptors is defined by the ligand atom, the protein donor atom, and its associated polar hydrogen, while for ligand donors, it is defined by the ligand atom, the protein acceptor and its neighbor. In both cases, the attractive region o f the potential is scaled. The angle for the repulsive term is defined by the ligand atom, the protein donor atom and its polar hydrogen for ligand donors, or, for ligand acceptors, the ligand atom, the protein acceptor and its neighbor. In this case, the long-range region o f the potential (between parameters B and C i n Figure l b ) is scaled. Search E n g i n e . Evolutionary programming (22), modeled after natural selection where a population o f solutions competes for survival, has been effective i n a variety o f optimization problems. In this study, the population encodes the dihedral angles for all single-bonded atoms, not including resonant bonds. These angles are initialized to a random value and allowed to vary during the optimization. The search process consists o f a fixed number o f generation cycles, where i n each cycle, the members o f the population are scored using the energy function described above. A subset o f the population is selected to be "parents" for the next generation, with the remainder discarded. Parents for the production o f "offspring" are selected i n a stochastic tournament, wherein the energy o f each member o f the population is compared with a fixed number o f randomly selected opponents. A w i n is assigned to the member with the lowest energy, and the number o f competitions a member wins is used to determine whether it survives into the next generation. Sufficient offspring are generated from the parent vectors through the addition o f Gaussian mutations, restoring the population to its original size and completing the generation cycle. Although it is possible to calculate the ideal standard deviation size for simple energy functions (22), they are not known for more complex response surfaces. Consequently, a self-adaptive strategy was used where the mutation sizes are allowed to vary (7), with selection pressure determining the ideal values as the simulation progresses. The best-scoring individual i n the final generation is minimized using a conjugate gradient search and defines the predicted structure. Over the course o f the search, the repulsive component o f each pairwise potential i n the intermolecular scoring function is scaled linearly, starting at a low value and rising to its final value at the end o f the run. This allows the ligand to freely interpenetrate the protein early i n the search, promoting population diversity and ensuring a more complete sampling o f the search space. For a ligand with ten rotatable bonds, a typical optimization involves approximately 70,000 energy evaluations and uses approximately 15 seconds o f C P U time on a Silicon Graphics R10000 processor. C o v a l e n t D o c k i n g . While many chemical groups can serve as the site o f covalent attack, one may be interested i n a particular group or may want to specifically disallow
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
297
attack at certain sites, such as protecting groups. To be o f use for large database searches, the specification o f ligand attack sites must be fully automated. A convenient representation o f the covalent attack sites is afforded by the C I F (Crystallographic Information File) format (23). A data definition was created that allows an atom to be specified i n terms o f its properties, such as element type, hybridization state, number o f neighbors, and aromaticity, as well as the properties o f its neighbors. The description is recursive, with each neighbor description having its own neighbor descriptions, so that any chemical group, regardless o f size, may be described. Each group description can be associated with a label string. For example, the attack site specification for a triflouromethyl ketone is given by "C,HeavyAtomNeighbors=3, Neighbor(0, BondOrder=2), Neighbor(C, Neighbor(F), Neighbor(F), Neighbor(F)), Neighbor(C)". This defines a carbon atom with three heavy atom neighbors; one neighbor is a double-bonded oxygen, another is a carbon which itself has three fluorine neighbors, and the third neighbor may be any carbon atom. Accurate modeling o f the tetrahedral state o f the covalent inhibitors is necessary for correct prediction o f the bound structure. M a n y inhibitors undergo substantial conformation changes upon formation o f the covalent interaction, as the electrophilic atom changes from planar to tetrahedral geometry. These conformational changes can affect ligand atoms far away from the covalent interaction, especially i f the covalent attack occurs i n a predominantly rigid region o f the ligand such as a ring. A summary o f the method used to generate the geometry about the tetrahedral center for reactive carbonyls is shown i n Figure 2. Starting from a rough approximation o f the tetrahedral state, a proton is added to the carbon perpendicular to the plane defined by the carbonyl carbon and oxygen and another adjacent atom, reducing the carbonyl to an alcohol. A short minimization with the B a t c h M i n program (24) using the all-atom A m b e r force field (25) and a distance-dependent dielectric is used to correct the geometry. The tetrahedral model is then aligned with the catalytic serine, fixing the distance between the serine oxygen and ligand carbon at 1.45 A , the bond angle about the serine oxygen at 109.5°, but allowing the serine C a - C P and C(3-0y bonds to rotate freely during the search. The process is then repeated for the opposite enantiomer as it is impossible to predict a priori which is most active. L i b r a r y Generation. The protocol for directed library design comprises four major steps: database querying, virtual library building, fixed anchor docking, and binding affinity estimation and ranking. It results i n predicted binding modes and affinities for each compound i n the library. Each substitution site i n the central anchor is assigned its o w n reaction. In the first step, the database is screened for compounds that have at least one moiety that can be linked covalently to the central anchor fragment following some pre-specified chemical reactions. For very large databases, additional requirements based on simple molecular properties such as molecular weight can also be imposed. A l l fragments that satisfy the above constraints are linked to the central anchor motif, thus building a virtual library o f compounds. The third step represents the primary filter, whereby the binding mode o f all virtual molecules within the active site is predicted, subject to the constraint that the anchor motif is fixed i n its initial
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
298
Target compound
Crude approximation of tetrahedral intermediate Rl HO H
Bound state
Minimized tetrahedral compound
Figure 2. Generation o f the tetrahedral intermediate structure from a carbonylcontaining ligand.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
299
position and orientation. After the structure o f each compound is predicted, the entire ligand is relaxed within the binding site using a conjugate-gradient minimization, and the final step is binding affinity estimation and ranking o f the compounds.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
Validation Studies Covalent Docking. To be useful for database searches, a structure prediction method for covalently bound inhibitors must discriminate not only between alternate binding modes, but also between the active and inactive stereoisomers. It must be relatively insensitive to small changes i n the internal geometry o f the ligand, so the same structure should result when a ligand is obtained from a high-resolution crystal structure or generated from a molecular modeling program. Five ligand-protein complexes containing covalently bound ligands were selected from the Brookhaven Protein Data Bank ( P D B ) (26): wheat serine carboxypeptidase (lbcr), human leucocyte elastase (leas and leau), and porcine pancreatic elastase (4est and 8est). Although many structures i n the P D B are reported to contain a covalently bound ligand, a surprisingly small number had tetrahedral geometry at the site o f covalent attachment; many ligands were i n a planar geometry. The lack o f a significant number o f structures with the ligand i n a tetrahedral geometry made validation o f our method o f generating the tetrahedral state difficult. W e therefore chose a somewhat generic parameterization o f the geometry at the attack site, with a bond distance o f 1.45 A and bond angle o f 109.5°. The five crystallographic complexes were chosen based on the quality o f the structures and the size o f the inhibitors, which vary i n size from 18 to 38 heavy (nonhydrogen) atoms, and have between six and 13 rotatable bonds. T w o o f the structures ( l b c r and 8est) undergo covalent attack at an aldehyde moiety, while the others have reactive diflouro- or triflouro-methyl ketone moieties. For each system, hydrogen atoms were added to the receptor and non-buried water molecules were removed. A total o f three sets o f 100 docking simulations were performed for each complex, one using the tetrahedral geometry given i n the crystal structure, and one for each o f two stereoisomers o f a computationally generated ligand conformation. For the latter, the ligands were constructed i n a molecular modeling program, minimized to convergence i n the A m b e r force field (14), and the tetrahedral geometry was generated using the method described above. N o specific knowledge about the crystal structures, such as torsion angles about nonrotatable bonds, was used to generate the modeled ligands. A structure prediction was defined to be successful when the root mean square deviation ( R M S ) o f the heavy atoms i n the predicted structure was within 2.0 A o f the crystallographic structure. For all five complexes, the structure o f the bound inhibitor was predicted correctly; the crystallographic binding mode o f the ligand had the lowest energy (Table I). Furthermore, correct and incorrect stereoisomers could be distinguished, even when the ligand was obtained by molecular modeling. For the 4est complex, an alternate binding mode was essentially isoenergetic with the correct binding mode, but where the phenyl moiety occupied a pocket that had been filled by a crystallographic water i n the original protein structure. When the water was replaced, the success rate increased to 4 1 % .
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
300
Table I: Comparison of Docking Simulations Using Three Different Tetrahedral Geometries at the Covalent Attack Site. Crystal Success Rate*
PDB Entry
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
lbcr leas leau 4est 8est
78% 79% 61% 19% (41%) 99%
b
Model Stereoisomer l Energy Success Rate* 0
46% 49% 58% 68% 88%
d
Model Stereoisomer 2 Energy Success Rate* 0
-56.0 -55.5 -50.2 -56.3 -66.9
12% 41% 30% 31% 90%
-63.5 -75.2 -74.7 -86.2 -75.8
Tercent o f predicted structures within 2.0 A R M S o f the crystallographic structure. Stereoisomer corresponding to the crystal structure. Energy o f the best-scoring docked ligand. Success rate when a crystallographic water was replaced. c
d
Fixed Anchor Docking. W e validated the fixed anchor docking method by performing multiple docking simulations o f five ligand-protein complexes with known crystal structures. The non-covalent complexes were methotrexate bound to dihydrofolate reductase ( D H F R ) (3dfr), Viracept bound to HTV-1 protease (27), biotin bound to streptavidin (lstp), F K 5 0 6 bound to F K B P (28), and a tripeptide inhibitor bound to thermolysin (4tmn). For these simulations, a docking simulation is defined to be successful using the more stringent requirement that the predicted structure is within 1.5 A R M S o f the crystal structure. T w o sets o f docking simulations, each comprising 100 trials o f each o f the above ligands, were performed. In the first set, a fully flexible docking simulation was performed, while i n the second an anchor fragment o f the ligands was fixed i n its crystallographic position (Figure 3). For the flexible docking simulations, the success rate is correlated with the number o f degrees o f freedom o f the ligand, while for fixed anchor docking, the X-ray binding mode is consistently reproduced (Table II). In the case o f F K 5 0 6 , the conformation o f the macrocycle is fixed i n its crystallographic form, rationalizing its anomalously high success rate. A s expected, fixing the binding mode o f the anchor fragment substantially increases the rate o f successful prediction.
Table II: The number of bonds allowed to rotate during the conformational search and the success rate for flexible and fixed anchor docking for five ligandprotein complexes. ^ ^ ^ ^ Protein-ligand
complex
Biotin/Streptavidin Methotrexate/DHFR FK506/FKBP Viracept/HIV-1 Protease 4tmn/Thermolysin
Rotatable bonds 5 1 1 8 13
Flexible
docking Fixed anchor 99% 75% 93% 57% 44%
docking
100% 100% 100% 100% 100%
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
301
(a)
(b) H
**t*T^ , _ *NH
2
00Q
"k^YNH
0*
(c)
v3
_
0
0
OH
ho
x
(d)
P t
(e)
0 OyNH
O-P=OO*
Figure 3. Compounds used for fixed anchor validation are (a) biotin, (b) methotrexate, (c) F K 5 0 6 , (d) viracept, and (e) 4tmn. Atoms fixed during the docking simulation are indicated by stars; for F K 5 0 6 , the entire macrocycle is fixed.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
302
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
Database Screening In principle, the goal o f a computational database search is to accurately rank the entire set o f compounds i n order o f affinity and thereby eliminate the need for experimental screening. T o achieve this goal, the relative binding free energy o f a large number o f disparate compounds must be computed, a task that is beyond the capability o f current methodologies. A s a result, computational database screenings typically focus on the more limited goal o f increasing the likelihood o f finding an active compound i n an experimental screen o f a subset o f the database, as compared to random. A seeded database experiment, where a known inhibitor is added to an otherwise arbitrary database, can be used to assess the effectiveness o f the computational procedure by considering the relative ranking o f the known inhibitor. Covalent Inhibitors. T o test the effectiveness o f finding active covalent inhibitors from a database, the algorithm was used to screen a subset o f the Available Chemicals Directory ( A C D ; M D L Information Systems, San Leandro, C A ) for inhibitors o f porcine pancreatic elastase (4est). This system was chosen because o f the high resolution o f its crystallographic structure (1.6 A), the well-defined binding site, and the existence o f a large inhibitor. Approximately 25,000 A C D compounds were screened, each with between zero and twelve rotatable bonds, twelve and forty heavy atoms, and at least one aromatic ring. Covalent attack was allowed for non-aromatic ketones and esters, with the exception o f protecting groups, such as carbobenzoxy. W h i l e not all o f these ketones are sufficiently activated to form a covalent complex, they can be modified to a more active form, and hence these compounds represent a diverse set o f potentially reactive groups. The known inhibitor was also included as part o f the search. After the structure o f the bound complex was predicted, a number o f screens were imposed to eliminate compounds with characteristics that are unlikely to be found i n ligand-protein complexes (Figure 4). These include large movement o f the serine oxygen upon binding (over 0.5 A), gross close contacts with the binding site (at least one non-hydrogen bonding pair o f heavy atoms with a 30% or greater van der Waals overlap), and a high desolvation penalty. The structures that passed these screens were clustered according to their predicted binding free energies, resulting i n 5,858 unique structures with a favorable binding energy (Figure 5). The desolvation penalty and the predicted binding free energies were computed using a modification and extension (29) o f the L U D I scoring function (30) that includes short-ranged repulsive interactions and a term to describe hydrogen bonding geometry (57). The known inhibitor was ranked i n the top one percent o f all compounds that satisfied the screening criteria, and i n the top one quarter o f one percent o f all compounds i n the initial database. In addition, the correct stereoisomer and binding mode for this compound were selected. A number o f compounds chemically unrelated to the known inhibitor were found, some o f which form favorable hydrogen bonds i n the active site (Figure 6). None o f these compounds have been tested for activity against the enzyme.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
303
25,618 A C D compounds
Structure prediction
Protein structure
21,498 structures with no severe close contacts
48,344 ligand-protein complexes (6,043 attack sites x 2 isomers x 4 predicted structures)
Filter out heavyatom close contacts (30% threshold)
Serine Oyhas moved 0.5 A or less
34,694 structures with reasonable serine Qy position
Binding affinity prediction and desolvation energy screen
16,465 structures without severe desolvation penalties
Extract structure with
5,858 unique compounds with predicted binding energy < 0 kcal/mol
best predicted binding Figure 4. Summary o f covalent inhibitor database search processing and results.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
304
Figure 5. Distribution o f predicted binding affinities for the covalent inhibitors. The relative position o f the tripeptide inhibitor 4est is indicated.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
305
S E R 195
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
(a) GLY193 | ^ N H
^isT^ H
\
VAL216
o
a
— O H
THR41
!
V H
o«
H
o
HIS 57
1.4 S E R 195
GLY193 | ~NH
H
VAL216
—OH THR41
-NH Figure 6. Hydrogen-bond interactions between the active site o f porcine pancreatic elastase and (a) inhibitor 4est, (b) a compound identified from the database search, (c) a second compound identified from the database search. The range o f the hydrogen bonds, indicated by dashed lines, is between 2.5 and 3.4 A.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
306
Figure 6. Hydrogen-bond interactions between the active site o f porcine pancreatic elastase and (a) inhibitor 4est, (b) a compound identified from the database search, (c) a second compound identified from the database search. The range o f the hydrogen bonds, indicated by dashed lines, is between 2.5 and 3.4 A.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
307
Site-Directed Combinatorial Library Design. The protocol was used to design a site-directed combinatorial library for dihydrofolate reductase ( D H F R ) (32), a popular test case for computational methods for structure prediction and binding affinity estimation. Methotrexate is an unusually high-affinity inhibitor o f D H F R and as such is a good inhibitor for validating new methods. While identification o f methotrexate is a necessary condition for a useful computational screening protocol, it does not guarantee success i n all cases, since lead compounds are not always highly potent. Methotrexate contains a pteridine ring whose structure is predicted consistently within 0.5 A o f its crystallographic binding mode. This high consensus suggests that the pteridine is essential for molecular recognition and can be considered an anchor motif for D H F R . Given this anchor fragment, direct alkylation was chosen for the attachments o f substituents to the pteridine ring. The A C D was restricted to compounds with molecular weight less than 250 and which contain an amine group with at least one hydrogen and one heavy atom neighbor that is part o f an aromatic group. F r o m the 7,256 compounds obtained, a virtual library o f 7,677 compounds was generated, all o f which contain the pteridine motif. The number o f compounds i n the library is larger than the number o f compounds retrieved from the database because some compounds had multiple sites where the reaction could take place. A series o f ten docking simulations was performed for each compound, keeping the pteridine ring fixed within the D H F R active site. A l l structures with 2 5 % or greater heavy atom overlaps with the active site atoms were discarded, and the binding affinity o f the remaining molecules was estimated. A l l the compounds with a large predicted desolvation penalty were removed. O f the 7,677 compounds i n the virtual library, only 516 satisfied all the screening criteria, 7% o f the original library. A s anticipated, methotrexate is predicted to have the best binding energy, near -12 kcal/mol (Figure 7). In addition to methotrexate, a number o f other compounds were generated that form good hydrogen bond interactions within the active site (Figure 8). Conclusion Reducing the dimensionality o f the conformational space i n ligand-protein structure prediction reduces the computational complexity o f the problem and can lead to significantly improved prediction. Often, restricted mobility can only be achieved b y making strong approximations regarding the nature o f the binding process. There are however, at least two special cases where the dimensionality is reduced without need for such assumptions. In ligand-protein complexes with a covalent bond, the restriction is imposed by the requirement that the chemical bond be formed between specific ligand and protein atoms. W e have demonstrated that for such systems, the crystal structure o f ligand-protein complexes can be reproduced with high fidelity, and that a known inhibitor o f the porcine pancreatic serine protease elastase can be identified from a database o f random compounds that contain reactive functional groups. Furthermore, methotrexate is identified as a potent inhibitor o f D H F R when seeded i n a database o f amines that can react with the pteridine central core. This method can reduce the size o f combinatorial libraries, and, i n conjunction with more sophisticated selection o f reactant molecules, can aid i n the generation o f site-directed libraries (33).
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
308
Figure 7. The binding affinity distribution for the virtual library o f compounds generated using the pteridine core i n D H F R .
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
309
Figure 8. Hydrogen-bond interactions between the active site o f D H F R and (a) methotrexate, (b) a compound identified from the database search, (c) a second compound identified from the database search. The range o f the hydrogen bonds, indicated by dashed lines, is between 2.5 and 3.4 A.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
310
Acknowledgments W e thank our colleagues S. Arthurs, A . Colson, D r . S. Freer, V . Larson, and L . Schaffer for significant contributions to the software described i n this work, D r . Peter Rose for providing the empirical binding affinity method, and D r . Gennady Verkhivker for useful discussions about molecular recognition.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
Literature Cited
1. Kuntz, I. D.; Meng, E. C; Shoichet, B. K. Accts. Chem. Res. 1994, 27, 117-123. 2. Jones, G.; Willett, P., Glen, R. C.; Leach, A. R.; Taylor, R. J. Mol. Biol. 1997, 267, 727-748. 3. Kuntz, I. D.; Makino, S. J. Comp. Chem, 1997, 18, 1812-1825. 4. Desmet, J.; Wilson, I. A.; Joniau, M . ; DeMaeyer, M . ; Lasters, I. FASEB J. 1997, 11, 164-172. 5. Welch, W.; Ruppert, J.; Jain, A. N . Chem. Biol. 1996, 3, 449-463. 6. Rarey, M . ; Kramer, B.; Lengauer, T.; Klebe, G. J.Mol.Biol.1996, 261, 470-489. 7. Gehlhaar, D. K.; Verkhivker, G. M . ; Rejto, P. A.; Sherman, C. J.; Fogel, D. B.; Fogel, L. J.; Freer, S. T. Chem.Biol.1995, 2, 317-324. 8. Chen, P.; Tsuge, H.; Almassy, R. J.; Gribskov, C. L.; Katoh, S.; Vanderpool, D. L.; Margosiak, S. A.; Pinko, C.; Matthews, D. A.; Kan, C.-C. Cell 1996, 86, 835-843. 9. Love, R. A.; Parge, H. E.; Wickersham, J. A.; Hostomsky, Z.; Habuka, N.; Moomaw, E. W.; Adachi, T.; Hostomska, Z. Cell 1996, 86, 331-342. 10. Kim, J. L. Morgenstern, K . A.; Lin, C.; Fox, T.; Dwyer, M . D.; Landro, J. A.; Chambers, S. P.; Markland, W.; Lepre, C. A.; O'Malley, E. T.; Harbeson, S. l.; Rice, C. M . ; Murcko, M . A.; Caron, P. R.; Thomson, J. A. Cell 1996, 87, 343-355. 11. Babine, R. E.; Bender, S. L. Chem. Rev. 1997, 97, 1359-1472. 12. Wlodawer, A.; Erickson, J. W. Annu. Rev. Biochem. 1993, 62, 543-585. 13. Appelt, K.; Bacquet, R. J.; Bartlett, C. A.; Booth, C. L.; Freer, S. T.; Fuhry, M . A. M . ; Gehring, M . R.; Herrmann, S. M . ; Howland, E. F.; Janson, C. A.; Jones, T. R.; Kan, C.; Kathardekar, V.; Lewis, K. K.; Marzoni, G. P.; Matthews, D. A.; Mohr, C.; Moomaw, E. W.; Morse, C. A.; Oatley, S. J.; Ogden, R. C.; Reddy, M . R.; Reich, S. H.; Schoettlin, W. S.; Smith, W. W.; Varney, M. D.; Villafranca, J. E.; Ward, R. W.; Webber, S.; Webber, S. E.; Welsh, K. M . ; White, J. J. Med. Chem. 1991, 34, 19251934. 14. LaFemina, R. L.; Bakshi, K.; Long, W. J.; Pramanik, B.; Veloski, C. A.; Wolanski, B. S.; Marcy, A.; Hazuda, D. J. J. Virology 1996, 70, 4819-4824. 15. Thompson, L.A.; Ellman, J.A. Chem. Rev. 1996, 96, 555-600. 16. Gordon, E. M . ; Barrett, R. W.; Dower, W. J.; Fodor, S. P. A.; Gollop, M . A . J. Med. Chem. 1994, 37, 1385-1401. 17. Yu, H.; Chen, J. K.; Feng, S.; Dalgarno, D. C.; Brauer, A. W.; Schreiber, S. L. Cell 1994, 76, 933-945. 18. Wiley, R. A.; Rich, D. H. Med. Res. Rev. 1993, 13, 327-384.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.
Downloaded by UNIV OF NORTH CAROLINA on July 18, 2013 | http://pubs.acs.org Publication Date: July 7, 1999 | doi: 10.1021/bk-1999-0719.ch019
311
19. Kick, E. K.; Roe, D. C.; Skillman, A. G.; Liu, G.; Ewing, T. J.; Sun, Y.; Kuntz, I.D.; Ellman, J .A. Chem. Biol. 1997, 4, 297-307. 20. Salemme, F. R.; Spurlino, J.; Bone, R. Structure 1997, 4, 319-324. 21. Mayo, S. L.; Olafson, B. D.; Goddard, W. A. J. Phys. Chem. 1990, 94, 8897-8909. 22. Fogel, D.B. Evolutionary Computation: Towards a New Philosophy of Machine Intelligence. IEEE Press: New York, NY, 1995; 75-155. 23. Hall, S. R.; Allen, F. H.; Brown, I. D. Acta Cryst. 1991, A47, 655-685. 24. Mohamadi, F.; Richards, N . G. J.; Guida, W. C.; Kiskamp, M.; Caufield, C.; Chang, G.; Hendrickson, T.; Still, W. C. J. Comput. Chem. 1990, 11, 440-467. 25. Weiner, S. J.; Kollman, P. A.; Case, D. A.; Singh, U . C.; Ghio, C.; Algona, G.; Profeta, S. Jr.; Weiner, P. J. J. Am. Chem. Soc. 1984, 106, 765-784. 26. Bernstein, F. C.; Koetzle, T. F.; Williams, G. J.; Meyer, E. F. Jr.; Brice, M . D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M . J. Mol. Biol. 1977, 112, 535-542. 27. Appelt, K.; Perspect. Drug Discovery Des. 1993, 1, 23-48. 28. Van Duyne, G. D.; Standaert, R. F.; Karplus, P. A ; Schreiber, S. L.; Clardy, J. J. Mol. Biol. 1993, 229, 105-124. 29. Rose, P. W. In Computer-Assisted Molecular Design Course, UCSF; San Francisco, CA, 1997; Chapter 12. 30. Bohm, H. J. J. Comput. Aided Mol. Des. 1994, 8, 243-256. 31. McDonald, I. K.; Thornton, J. M . J. Mol. Biol. 1994, 238, 777-793. 32. Bolin, J. T.; Filman, D. J.; Matthews, D. A.; Hamlin, R. C.; Kraut, J. J. Biol. Chem. 1982, 257, 13650-13662. 33. Polinsky, A.; Feinstein, R. D.; Kuki, A. In Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery; Chaiken, I. M . , Janda, K . D., Eds.; Conference Proceedings Series; American Chemical Society: Washington, D.C. 1996; Chapter 20, 219-232.
In Rational Drug Design; Parrill, A., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1999.