Three-Dimensional Protein Models: Insights into Structure, Function, and Molecular Interactions

Bioconjugate Chemistry
MAY/JUNE 1994, Volume 5, Number 3
© Copyright 1994 by the American Chemical Society

REVIEWS

Three-Dimensional Protein Models: Insights into Structure, Function, and Molecular Interactions

Jürgen Bajorath*,† and Alejandro Aruffo†,‡

Bristol-Myers Squibb, Pharmaceutical Research Institute, 3005 First Avenue, Seattle, Washington 98121, and Department of Biological Structure, University of Washington, Seattle, Washington 98195. Received October 19, 1993

* To whom correspondence should be addressed.
† Bristol-Myers Squibb.
‡ University of Washington.

Primary structures of proteins are being determined at a significantly higher rate than are tertiary structures. Only a relatively small number of the protein structures of importance to molecular biologists and medicinal chemists are presently available in experimentally determined form. This is one of the reasons for the increasing interest in theoretical modeling of protein structures (1-4). Computer workstations, high-resolution computer graphic displays, and computational methods have become more readily available and are now widely used in biological and chemical research environments. Less than a decade ago, access to such instrumentation and methodology was largely limited to a small number of groups of computational chemists and protein crystallographers. Given the computational resources that are currently available and the accessibility of protein sequence and three-dimensional structure databases, computer modeling of proteins has become a fast and approachable technique for scientists who wish to use protein structure data for experimental design. How reliable is this discipline? How can structural models be assessed, analyzed, and used for the design and analysis of experiments? In this paper, we discuss protein modeling methodology and, as a practical example, review recent work on P-selectin.

PROTEIN MODELING STRATEGIES

Protein modeling approaches can be divided into two classes. These are, first, methods that attempt the de novo prediction of protein structures from sequence data (5, 6) and, second, methods that try to identify and utilize experimentally determined structures as templates for model building (7, 8). This latter structure-based approach is often called homology modeling or comparative modeling.

Methods which attempt to predict the structure of a protein de novo (i.e., without the use of an experimental structure as template) often start from secondary structure predictions based on a single sequence or on multiple sequence alignments of homologous proteins (6, 9, 10), which have a per-residue accuracy of ~70% (11, 12). Accurate predictions of the secondary structure elements in proteins have been reported for the cAMP-dependent protein kinase catalytic subunit (13), the Src homology 3 domain (14), and interleukin-2 (15). The spatial assembly of predicted secondary structure elements (5, 16) is required if a three-dimensional model is to be derived using this approach, and there is no obvious route to do so.
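As an illustration of the per-residue accuracy figure quoted above, the following minimal sketch computes the fraction of residues whose predicted secondary structure state agrees with the observed one. The function name, the three-letter state alphabet (H, E, C), and the example strings are assumptions made for illustration; they are not taken from the cited studies.

```python
def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed state.

    Each string holds one character per residue, e.g. H (helix),
    E (strand), C (coil). The alphabet is an illustrative assumption.
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical 10-residue example in which 7 of 10 states agree.
predicted = "HHHHCCEEEE"
observed = "HHHCCCEEHH"
accuracy = per_residue_accuracy(predicted, observed)
print(f"per-residue accuracy: {accuracy:.0%}")
```

A single number of this kind says nothing about where the errors fall, which is one reason why assembling predicted elements into a three-dimensional model remains difficult.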

An alternative route to de novo tertiary structure prediction involves computer simulation of protein folding using lattice models. This approach does not start from secondary structure predictions but from a random coil representation of the protein on a lattice (17, 18). Theoretical models with protein backbone root mean square (rms) deviations of 2-3 Å relative to crystallographic models have been derived using this methodology (18). In general, three-dimensional models derived using ab initio prediction methods can be regarded as “lower resolution” models. The objective of de novo predictions, however performed, is often to understand the overall folding of a protein with unknown structure. For example, it may be possible to understand that a protein has an all-helical structure and belongs to the four-helix bundle folding type.

In structure-based or comparative modeling, the initial goal is to assign protein sequences to families of structurally related proteins for which at least one structure has been experimentally determined (8, 19). If more than one experimentally known template is available, structurally conserved regions can be identified by structural comparison (19). The structure template to be used for modeling should be an experimental structure or a combination of the structurally conserved regions derived from several family members whose structures are known. Once the conserved protein core, which usually consists of well-ordered secondary structure elements, is defined, nonconserved loop regions are modeled either by incorporating related loops of other crystal structures (20) or by employing theoretical methods such as conformational searching (21). Side chain replacements can be carried out using libraries of experimental side chain conformations (22) and combinatorial techniques (23). Finally, computational refinement of the initially built protein model by molecular mechanics or dynamics calculations is employed to optimize the intramolecular contacts and stereochemistry of the model. “Established” examples of comparative protein models with implications for inhibitor design include renin (24, 25) and HIV protease (26, 27).

Comparative models are in general more accurate than ab initio models, provided that there is significant structural similarity between the template(s) and the protein which is modeled. In some cases, experimental accuracy may be approached (28, 29). In others, it may not even be possible to attempt comparative model building because of insufficient structural similarity between potential templates and the protein to be modeled.
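Accuracy figures such as the 2-3 Å backbone rms deviations mentioned in this section are obtained by comparing equivalent atoms of a model and a reference structure. The sketch below shows only the final arithmetic and assumes that the two coordinate sets are already optimally superimposed and listed in the same atom order; the function name and the coordinates are hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. Å)
    between two equally long lists of (x, y, z) coordinates.

    Assumption: both structures are already superimposed; no fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom backbone trace; every atom is displaced by 2.0 Å.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
rmsd = rms_deviation(model, reference)
print(f"backbone rms deviation: {rmsd:.1f} Å")
```

In practice the superposition step, which is omitted here, strongly influences the value, so reported rms deviations are only comparable when the fitting procedure and the atom selection are stated.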
Most importantly, comparative model building is, in contrast to ab initio methods, unable to predict completely novel protein structures. What are the more practical limitations of comparative protein models? The conformations of loop regions that cannot be assigned to known structural templates, as well as the conformations of nonconservatively replaced side chains, are usually the least reliable parts of a protein model. The overall quality of a model depends on the degree of structural similarity to the template structure and on the template’s crystallographic resolution and degree of structural refinement.

PROTEIN SEQUENCES AND STRUCTURAL SIMILARITY

Protein structure types, characterized by defined spatial arrangements of secondary structure elements, are called protein folds. A common fold is often a characteristic feature of a family of proteins such as the trypsin family of the serine proteases. An example of a recently described and previously unobserved protein fold is the structure of the calcium-dependent lectin domain of the mannose binding protein (MBP), which is shown in Figure 1 (top). How many protein folds are known? How many are unknown? According to a recent estimate, the maximum number of protein families may be 1000 (30). Others have estimated that 500-700 protein folds with distinct topology may exist (31). Comparison of structures deposited in the Brookhaven Protein Data Bank (32, 33) through April 1992 has revealed the availability of approximately 150 nonhomologous protein folds (34). On the basis of these numbers, it may follow that 10-20% of the three-dimensional protein structure spectrum is presently known.


However, protein structures represent a continuum (34) much more than a discrete spectrum, and the classification of structures into structure types or folds remains critically dependent on the criteria being applied. In any case, new folding motifs such as the recently described β-helix of pectate lyase C (35) are continuously being elucidated. It is evident that the identification and assessment of structural similarities between proteins play a key role in protein modeling attempts that aim at generating detailed protein models.

How can three-dimensional structural similarity be detected? The identification of structures similar to the protein to be modeled is straightforward if significant sequence similarity (40% or more) between the protein with unknown structure and the protein with known structure can be detected in sequence similarity searches and alignments. Significant sequence similarity directly correlates with structural similarity: The higher the sequence similarity, the more similar the three-dimensional structures (36). Template structures can then be selected based on sequence similarity. Early examples of protein families which have been the subject of comparative model building, such as antibodies or serine proteases (19), usually show significant sequence similarity.

It is generally true that three-dimensional protein structure is significantly more conserved than sequence (36). In other words, sequences that fold into defined folding motifs, such as the immunoglobulin fold, may show great variability in their amino acid sequences. This means that sequence similarities of 20% or less may still indicate significant structural similarity. This has, for example, been shown for the heat shock protein fragment HSC70 and actin, whose structures are very similar in spite of having less than 15% sequence identity (37). The three-dimensional structures of these proteins are also similar to that of hexokinase (38).
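The percentages quoted in this section are derived from pairwise alignments. As a minimal sketch, the function below computes percent identity over the aligned, gap-free columns of two pre-aligned sequences. Several conventions exist for the denominator (aligned columns, shorter sequence, full alignment length); the choice made here, like the function name and the example sequences, is an assumption for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap.

    Both inputs must come from the same alignment and therefore have
    equal length. Columns containing a gap are skipped (an assumed
    convention, not the only one in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 9 gap-free columns, 6 identical.
seq_a = "ACDEFGHIKL"
seq_b = "ACDQF-HLKV"
identity = percent_identity(seq_a, seq_b)
print(f"sequence identity: {identity:.1f}%")
```

Because the result depends on the alignment and on the denominator convention, identity values near the 20-25% range discussed here should be compared only when computed in the same way.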
At the same time, sequence similarities of approximately 20% may also be found in rather distantly related structures having major differences in their topologies. The presence of such similarities may therefore not be sufficient for the selection of structural templates and detailed comparative modeling. The tenuous relationship between low sequence similarity and related tertiary structure requires that more detailed studies be performed to evaluate the degree of structural similarity.

In the case of moderate to low sequence similarity of approximately 25% or less, alignments of multiple sequences based on available three-dimensional structure(s) (39), often called structure-based sequence alignments or sequence-structure alignments, are considerably more informative than sequence alignments alone. Three-dimensional constraints can now be taken into account, and the conservation of key residues important for a particular protein fold may be detected or excluded. Tolerated sequence variance at given spatial positions can be assessed. Using this approach, it was found that 60% of the residues in two variant surface glycoproteins of Trypanosoma brucei are structurally equivalent despite the fact that there is only 16% sequence identity between these two molecules (40). Sequence-structure alignments have also allowed the generation of a detailed model of the CD40 ligand based on the structure of tumor necrosis factor (41).

Figure 1. (Top) Stereo representation of the fold of the lectin domain of the mannose binding protein (MBP) from rat (78). MBP was the first calcium-dependent (C-type) mammalian lectin domain whose structure was solved experimentally. The structure revealed a previously unobserved protein fold, which is shown here in solid ribbon representation. The α helices in MBP are colored in white, and the β strands are colored in green. The view is along α helix 2 (lower left corner). As can be seen, one of the striking features of this protein fold is its unusually high content of secondary structure that is neither α helix nor β strand. Prominent loop regions can be seen, for example, in the upper part of the structure, colored in red. The fold of this C-type lectin domain represents a structural prototype for the protein family of mammalian C-type lectins, which includes the ligand binding domains of the selectins. The N- and C-termini of the lectin domain are in close proximity (at the bottom of the picture). This spatial arrangement of the termini explains why C-type lectin domains can, as independent folding units or modules, be part of many multidomain cell surface proteins. (Bottom) Superposition of the MBP crystal structure (red) and the model structure of the P-selectin ligand binding domain (blue). The view of the stereo comparison is the same as in Figure 1 (top). The position of a calcium ion whose coordination sphere is conserved in MBP and in the selectins is depicted as a purple ball, and the calcium binding site that is specific to MBP and predicted to be absent in the selectins is shown in lavender. Major structural differences between MBP and P-selectin occur in loop regions. In contrast, the core regions are conserved in these proteins. In a way, this represents a result typical of comparative model building. Alignments of the sequences of the selectins and MBP based on the MBP crystal structure (sequence-structure alignments) suggested that P-selectin displays, despite the relatively low sequence similarity of approximately 25%, the same overall and previously unobserved fold as MBP. Figure 1 (top and bottom) may be viewed with stereo glasses to obtain the three-dimensional effect.

In cases of very low or virtually no sequence similarity, structural similarity sufficient for the meaningful selection of template structures may still be detected using the so-called “inverse folding” methodology. This method evaluates the compatibility of protein sequences with a given three-dimensional fold (42) by analyzing the residue environments in a three-dimensional structure (43, 44) or by threading sequences onto folds (45-48), followed by analysis of pairwise residue interaction energies (49). This technique does not depend on the initial presence of any sequence similarity. It is possible to screen protein sequences against a database of three-dimensional structures, and vice versa, to detect, for example, the structural similarities between HSC70, actin, and hexokinase, which, as mentioned above, are structurally related.
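The idea of scoring a sequence against the residue environments of a fold can be sketched with a deliberately simplified example. The code below is not one of the published methods cited above: it assumes only two environment classes (buried, exposed), uses an invented scoring scheme rather than statistically derived profiles or pair potentials, and allows no gaps. All names and numbers are hypothetical.

```python
# Toy residue classes (an illustrative assumption, not a published scale).
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")


def residue_score(residue: str, environment: str) -> float:
    """Score one residue in one environment: 'B' (buried) or 'E' (exposed).

    The values are invented for illustration; real methods derive
    them from statistics over known structures.
    """
    if environment == "B":
        if residue in HYDROPHOBIC:
            return 1.0
        if residue in CHARGED:
            return -2.0  # a buried charge is strongly penalized
        return -0.5
    if environment == "E":
        return -1.0 if residue in HYDROPHOBIC else 0.5
    raise ValueError(f"unknown environment: {environment!r}")


def compatibility(sequence: str, environments: str) -> float:
    """Sum of per-residue scores for a sequence placed on a fold."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and environment string must match in length")
    return sum(residue_score(r, e) for r, e in zip(sequence, environments))


def best_gapless_threading(sequence: str, environments: str):
    """Slide the fold along the sequence; return (offset, score) of the best fit."""
    width = len(environments)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the fold")
    best = None
    for offset in range(len(sequence) - width + 1):
        score = compatibility(sequence[offset:offset + width], environments)
        if best is None or score > best[1]:
            best = (offset, score)
    return best


# Hypothetical six-position fold and eight-residue sequence.
fold = "BBEEBB"
sequence = "KDLVKDIL"
best = best_gapless_threading(sequence, fold)
print(f"best offset {best[0]} with score {best[1]}")
```

The sketch captures only the principle that a sequence can be ranked against a fold without any sequence similarity to the template; gaps, realistic environment classes, and pairwise interaction energies are all omitted.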

No matter how structural similarities between proteins with known and unknown structure are identified, the assessment of the degree of similarity and the identification of dissimilar regions in protein structures represent a very important stage of the model-building process. A critical analysis of these aspects largely determines whether a meaningful model can be generated and where the limitations in the use of the model lie. Using inverse folding techniques, the compatibility of each residue with its environment in a given three-dimensional model can be calculated (50). The global incompatibility of sequence and structure and some local inconsistencies, such as a buried charged residue, can be detected using these methods. This analysis also provides a further means to assess the confidence level of a structural model.

PROTEIN MODELS TO AID DRUG DESIGN

Advances in macromolecular structure determination by crystallography, NMR, and computational chemistry have made structure-based drug design one of the current focal points of pharmaceutical research (51, 52). A number of three-dimensional structures are now available which are of interest as targets for drug design (53). However, successes in structure-based drug design such as in the design and refinement of inhibitors of HIV protease (54), thymidylate synthase (55, 56), and influenza virus sialidase (57) are still the exception rather than the rule. Although computational methods for the de novo design of inhibitors based on structural templates have been developed (58, 59), it remains a difficult task to create a chemical lead based on a three-dimensional structure. If no structures of complexes with ligands are available, potential ligand binding or catalytic sites in the protein may have to be localized and characterized by chemical residue modification or mutagenesis. Many parameters such as the desolvation free energy of potential inhibitors critically influence protein-ligand interactions, and the structural details of these interactions are hard to rationalize in the absence of experimentally determined complexes. Once an initial lead is discovered, iterative cycles of crystallographic analysis of complexes and lead compound modification may be required (55) to design a potent in vitro inhibitor.

Protein models are generally of lower accuracy than their crystallographic templates. If model structures are to be useful in the drug design process, a high degree of accuracy is essential, as illustrated in studies on the design of renin inhibitors (25).
Approximate models may still allow the outline of potential ligand binding sites in a protein but are unlikely to meaningfully aid computational ligand docking studies (60), where steric and chemical complementarity between binding sites in proteins and ligands are crucial criteria for the evaluation of test compounds. A recent study by Cohen and colleagues (61) has shown that comparative protein models of parasitic serine and cysteine proteases can successfully be used in combination with a computational docking and database searching technique (62) to identify inhibitory compounds which bind to the target protein in the micromolar range. The active site geometries of serine and cysteine proteases are known and well described. Therefore, carefully built model structures of the catalytic domains of these enzymes should have a relatively high accuracy. As demonstrated by Cohen and colleagues, such models can, in the absence of crystallographic data, be useful in the design of inhibitors.

For the application of structural models in drug design, two aspects seem to be critical. These are the integration of structural studies into an experimental drug discovery effort (51, 52) and the development of an understanding of which questions may be answered based on the analysis of structural models and which ones may not. Such decisions have to be made on a case-by-case basis since established procedures are not yet available.

In the following text we discuss a specific example which may illustrate the critical stages, opportunities, and limitations of protein modeling and its implications for drug design. This example is the molecular model of the amino terminal extracellular domain of P-selectin and its application. The selectin family of cell adhesion molecules includes P-selectin (63, 64). This lectin is in part responsible for the initial attachment of leukocytes to activated vascular endothelium (65), which is one of the early events in an inflammatory response. The exact molecular nature of the physiological ligand of P-selectin is presently unknown. In vitro studies have shown that P-selectin is able to bind to Leˣ (66), sialylated Leˣ (67-69), sulfatide (70), sulfoglucuronyl glycosphingolipids (71), and a 250-kDa glycoprotein expressed by leukocytes (72). Considerable interest has surrounded studies of E- and P-selectin as targets to inhibit early events in the inflammatory response such as the attachment of leukocytes to the vascular endothelium. Experimental structures (X-ray and/or NMR) of the selectin ligand binding domains have not yet been reported. We have focused our structure-function studies on P-selectin (73, 74). Experiments by others on E- (75) and P-selectin (76) have provided essentially the same conclusions regarding the location of the ligand binding site in the selectins and the identification of residues within the binding site which are critical for binding.

Identification of a Structural Template. Insight into the structural features of the lectin-like domain in P-selectin could not be obtained for a considerable time, since it was not possible to relate P-selectin to any known three-dimensional folds including plant lectins (77). This changed when the crystal structure of the lectin domain of a rat MBP, the first experimentally determined structure of a C-type lectin, became available (78). The sequence identity between the rat MBP and the selectin lectin domains is 25%.
Alignments of the selectin and MBP sequences relative to the crystallographic structure (74, 78) showed that residues in the hydrophobic core regions, the disulfide bonds, and the residues of at least one of the calcium binding sites in MBP are conserved in the selectins (78). This analysis strongly suggested the close similarity of the MBP and selectin structures. Figure 1 (top) outlines the previously unobserved fold of MBP, which would have been hard, if not impossible, to predict from its amino acid sequence.

Model Building. The starting point for modeling the P-selectin ligand binding domain was the MBP structure solved at 2.5-Å resolution. The atomic coordinates for this structure were obtained from the prerelease section of the Brookhaven Protein Data Bank (ref 33, entry “1MSB”). The conserved core region in MBP and P-selectin, including the disulfide bonds, and one fully conserved calcium coordination sphere provided the basis for the modeling of P-selectin. The conformations of loop regions in P-selectin which could not be modeled from known crystallographic structures were approximated by conformational search calculations. Amino acid replacements were carried out via computer graphics, placing each new side chain in a conformation as similar as possible to the original one or, alternatively, in a low-energy rotamer conformation (22). The initial model was refined using energy minimization calculations with harmonic constraints applied to the protein backbone and with the conserved calcium coordination sphere held fixed in space. Figure 1 (bottom) shows a superposition of the P-selectin model on the MBP structure. This figure shows that amino acid insertions and deletions occur in surface loops but not in the core regions of the proteins, which display a conserved spatial arrangement.

Model Assessment. Three-dimensional-profile analysis of the P-selectin model and its sequence relative to the MBP structure and its sequence showed that the sequence-structure compatibility of the P-selectin model is comparable to the sequence-structure compatibility of MBP (73). The global stability of the P-selectin model was further confirmed by extensive molecular dynamics calculations where, after equilibration, correlated structural motions were observed equivalent to those found in simulations of MBP (79). The overall sequence-structure compatibility and stability of the P-selectin model are consistent with the proposed structural similarity between MBP and the selectins.

Figure 2. Ligand binding site in P-selectin. The P-selectin model structure is colored in silver and shown in the same orientation as in Figure 1. Residues important for ligand binding are shown in space-filling representation and color-coded according to their importance for binding to the cellular ligand of P-selectin on myeloid cells and for binding to sulfatide. The representation shows that the cellular ligand and sulfatide bind to an overlapping but not identical set of residues in P-selectin. The ligand binding site in P-selectin is located proximal to the conserved calcium position (shown as a red ball), which is functionally important and also thought to be vital for the structural integrity of the ligand binding domain. Residue asparagine 105 (N105), one of the residues of importance for the binding of both ligands, is part of the conserved calcium coordination sphere. Lysine 113 (K113) is an important residue for the binding to both ligands. The side chain of this residue is approximately 10 Å from the conserved calcium. The ligand binding site in P-selectin was identified based on the analysis of the P-selectin model. This analysis led to a hypothesis regarding the location of the binding site region. On the basis of these studies, site-specific mutagenesis experiments revealed residues that are critical for binding of P-selectin to its cellular ligand (73). Subsequently, it was shown that the P-selectin binding sites for the cellular ligand and for sulfatide are overlapping (74).

Identification of the Ligand Binding Site. Inspection of the P-selectin model suggested the presence of a shallow groove proximal to the conserved calcium binding site.
This region of P-selectin was considered as a possible ligand binding site and subjected to site-specific mutagenesis experiments (73). The binding of wild-type P-selectin and of P-selectin mutants to the cellular ligand on HL-60 cells was determined and compared. Mutation of amino acids within the putative ligand binding site identified a number of residues whose replacement significantly reduced or completely abolished binding of P-selectin to HL-60 cells.

In contrast, mutation of other residues on the surface of P-selectin, distal to the putative binding site, had no effect on the binding of P-selectin to HL-60 cells. The residues which were found to participate in the formation of the P-selectin binding site for its cellular ligand are depicted in Figure 2. Residues Tyr 48, Tyr 94, and Lys 113 were found to be especially important for binding. Any mutation of these residues, for example, the change of Tyr 48 or Tyr 94 to Phe, which removes a single hydrogen bonding donor/acceptor moiety, abolished binding completely.

In the next step, binding of sulfatide to P-selectin was examined using the panel of P-selectin mutants which mapped the cellular ligand binding site of P-selectin. The results of these experiments are shown in Figure 2. Sulfatide was found to bind to the same site in P-selectin as the cellular ligand, utilizing an overlapping but not identical set of residues (74). Lys 113 is critical for binding of P-selectin both to the cellular ligand and to sulfatide. In contrast, Tyr 48 and Tyr 94, crucial for binding to myeloid cells, are not critical for binding to sulfatide. Lys 111, which was not critical for binding of P-selectin to the cellular ligand, is as important as Lys 113 for binding to sulfatide. As expected, the binding of both ligands was found to be strictly calcium-dependent (74). In an independent study, the same region in E- and P-selectin was found to be responsible for binding of sLeˣ glycolipid (75).

Recently, it has been reported that sLeˣ significantly reduced lung injury in a rat model of acute inflammation (80), presumably by blocking selectin function. Taken together, these results suggest that sLeˣ or related derivatives may provide a starting point for the design of specific selectin inhibitors which could be used in a clinical setting to block early events in inflammation.

Figure 3. Schematic model of sialylated Lewis x (sLeˣ) binding to the P-selectin ligand binding site. P-selectin is colored in blue and shown in a closeup side view. This orientation is obtained from the previous orientation by a rotation of approximately 90° around the y-axis. The carbohydrate ligand is shown with small spheres on its atoms and in standard atom coloring (carbon, green; oxygen, red; nitrogen, blue). The proposed “molecular anchors” in P-selectin are shown in space-filling representation and are colored in purple. These are the conserved calcium (on the left side of the picture) and residue lysine 113 (on the right side). These anchors are spaced approximately 10 Å apart. We propose that the calcium interacts with the fucose moiety in sLeˣ and that lysine 113 interacts with the negatively charged sialic acid moiety in sLeˣ. The interaction between the calcium in P-selectin and the fucose was inferred from crystallographic data of MBP in complex with oligomannose (82). The interaction between the negatively charged sialic acid moiety and the positively charged lysine 113 was proposed based on mutagenesis experiments and on the spatial separation of these potential molecular recognition sites. The use and the limitations of such schematic models are described in the text.

A Model of sLeˣ Binding to P-Selectin. Given the information regarding the location of the ligand binding site in P-selectin, can a model of the sLeˣ-P-selectin interaction be developed? The accuracy of current model building methodology is insufficient to predict protein-carbohydrate complexes at the atomic level of detail (81), even if crystallographic structures of carbohydrate-binding proteins are available as starting points. For example, it is hard, if not impossible, to predict the role of water molecules, which play an important role in mediating protein-carbohydrate interactions (81, 82). The great conformational flexibility of many oligosaccharides makes it very difficult to approximate their binding conformations. Therefore, predicted protein-carbohydrate complexes are usually approximate at best.

In the case of P-selectin, further information has become available which allows a simple model of P-selectin-sLeˣ interactions to be constructed. Weis et al. (83) have determined the crystal structure of the C-type lectin domain of the rat MBP complexed with oligomannose. This structure revealed a previously unobserved mode of carbohydrate-protein interaction and explained the calcium dependence of carbohydrate binding to C-type mammalian lectins. It shows that a mannose residue coordinates the calcium ion directly through two equatorial hydroxyl groups, replacing a water molecule in the calcium coordination sphere (83). The fact that this calcium binding site is rigorously conserved in the selectins suggests how a fucose residue, part of the sLeˣ ligand, can bind to the calcium in the selectins. An approximate conformation of sLeˣ can be modeled based on an NMR structure of Leˣ (84). In the modeled conformation, which uses the suggested calcium-fucose interaction as a molecular anchor point, sLeˣ can be readily docked into the ligand binding site of the P-selectin model. The intermolecular interactions are optimized by constrained molecular mechanics calculations. This schematic model is shown in Figure 3. The calcium-fucose interaction limits the possible orientations of the ligand in the binding site. A striking feature of this model complex is that the negatively charged group of the sialic acid moiety in sLeˣ is in a suitable position for ionic interactions with residue Lys 113 in P-selectin which, as discussed earlier, has been found to be essential for the binding of the selectin ligands.

This schematic model allows us to draw several conclusions about the interaction of P-selectin with one of its carbohydrate ligands. The mode of carbohydrate binding by P-selectin is different from previously reported carbohydrate-protein interactions (83, 84). The complex must be stabilized at the surface of P-selectin rather than in a cavity or groove, and likely important molecular recognition sites such as the conserved calcium and Lys 113 are ~10 Å apart. These predictions are consistent with results obtained in studies of the binding of E- and P-selectin to sLeˣ and its derivatives (75, 76, 85). Interestingly, only functional groups on one side of the ligand are involved in important contacts with the protein, while the other side of the ligand remains completely exposed to solvent.

In the absence of more detailed structural information, this limited model of selectin-carbohydrate binding can be applied to support inhibitor design. To take advantage of this model for the design of novel selectin ligands, it should be understood that the focus is on the functional groups most likely to interact with the P-selectin binding site. The spatial separation of residues important for binding and their proximity to functional groups in the ligand should be taken into account in the design of selectin inhibitors.
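The spatial-separation criterion described above can be expressed as a simple distance check. The following sketch tests whether two functional groups of a candidate compound are separated by roughly the ~10 Å spacing proposed for the calcium and Lys 113 anchor points. The candidate names, coordinates, and the 1.5 Å tolerance are hypothetical values chosen for illustration; they are not derived from the P-selectin model.

```python
import math

ANCHOR_SPACING = 10.0  # Å, approximate calcium to Lys 113 separation quoted above
TOLERANCE = 1.5        # Å, hypothetical acceptance window


def matches_anchor_spacing(group_a, group_b,
                           target=ANCHOR_SPACING, tolerance=TOLERANCE):
    """True if two (x, y, z) positions are separated by target ± tolerance."""
    return abs(math.dist(group_a, group_b) - target) <= tolerance


# Hypothetical candidates: positions (in Å) of a calcium-binding hydroxyl
# group and of an acidic group in one modeled conformation of each compound.
candidates = {
    "candidate_1": ((0.0, 0.0, 0.0), (9.5, 1.0, 0.0)),  # about 9.6 Å apart
    "candidate_2": ((0.0, 0.0, 0.0), (4.0, 3.0, 0.0)),  # 5.0 Å apart
}

selected = [
    name
    for name, (hydroxyl, acid) in candidates.items()
    if matches_anchor_spacing(hydroxyl, acid)
]
print("compounds matching the anchor spacing:", selected)
```

A filter of this kind considers a single conformation per compound and ignores orientation and solvent effects, so it can at most narrow down candidates for experimental binding studies.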
In addition, compounds which may specifically interact with additional residues proximal to the binding site, taking advantage of the chemical nature of the residue side chain, should show enhanced binding affinity. Binding studies with compounds generated using these criteria may lead to the preparation of new inhibitory compounds with predictable binding characteristics.

CONCLUSIONS

There is an ever-increasing interest in the prediction of three-dimensional protein structures and in the application of such protein models for the design of novel biologically active ligands. Fast computers and sophisticated computer graphic tools do not per se increase the quality of models. Currently, the most reliable models are being generated by comparative rather than ab initio modeling techniques. Comparative model building requires extensive analyses of structural relations and similarities to known structures. Methods which allow structural comparison of proteins with moderate to low sequence similarities significantly enhance the ability to identify structural templates for comparative model building. Methods which assess the sequence-structure compatibility of structural models are essential to assign a global confidence level to these models. The meaningful use of protein models for protein engineering and, even more so, drug design studies requires a high degree of model accuracy. Although some successful studies have now been reported, the use of model structures for drug design applications requires many approximations and is still in its infancy. Studies on the ligand binding domain of P-selectin have shown that a protein model can successfully be used for the design of mutagenesis experiments and that ideas for inhibitor design can be developed by rather simple and approximate model building of receptor-ligand complexes. This example also demonstrates how crucial the identification of a meaningful structural template can be for the success of a protein modeling and engineering project.

ACKNOWLEDGMENT

The authors are grateful to Peter Senter for critical review of the manuscript and many helpful suggestions and to Debby Baxter for help in the preparation of this manuscript.

NOTE ADDED IN PROOF

Recently, two publications have appeared which should be mentioned. The crystallographic structure of E-selectin (Graves et al. (1994) Nature 367, 532-538) has confirmed the proposed structural similarity of the C-type lectin domain of the mannose binding protein and the selectins. The results of the mutagenesis experiments on E-selectin by Graves et al. are consistent with the P-selectin binding site analysis. Furthermore, an instructive example of structure-based drug design has been reported by Lam et al. ((1994) Science 263, 380-384). These researchers have designed novel inhibitors of HIV protease which include a mimic of a structural water molecule found in previously reported crystal structures of HIV protease-inhibitor complexes.

LITERATURE CITED

(1) Fetrow, J. S., and Bryant, S. H. (1993) New programs for protein structure prediction. Biotechnology 11, 479-484.
(2) Thornton, J. M. (1990) Tackling a loopy problem. Nature 343, 411-412.
(3) Thornton, J. M., Flores, T. P., Jones, D. T., and Swindells, M. B. (1991) Prediction of progress at last. Nature 354, 105-106.
(4) Bajorath, J., Stenkamp, R., and Aruffo, A. (1993) Knowledge-based model building of proteins: Concepts and Examples. Protein Science 2, 1798-1810.
(5) Cohen, F. E., Richmond, T. J., and Richards, F. M. (1979) Protein folding: Evaluation of simple rules for the assembly of helices into tertiary structure with myoglobin as an example. J. Mol. Biol. 132, 275-288.
(6) Benner, S. A. (1992) Predicting de novo the folded structure of proteins. Curr. Opin. Struct. Biol. 2, 402-412.
(7) Blundell, T. L., Sibanda, B. L., Sternberg, M. J., and Thornton, J. M. (1987) Knowledge-based prediction of protein structures and the design of novel molecules. Nature 326, 347-352.
(8) Greer, J. (1990) Comparative modeling methods: applications to the family of the mammalian serine proteases. Proteins 7, 317-334.
(9) Garnier, J., Osguthorpe, D. J., and Robson, B. (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97-120.
(10) Bazan, J. F. (1990) Structural design and molecular evolution of a cytokine receptor family. Proc. Natl. Acad. Sci. U.S.A. 87, 6934-6938.
(11) Rost, B., Schneider, R., and Sander, C. (1993) Progress in protein structure prediction? Trends Biochem. Sci. 18, 120-123.
(12) Rost, B., and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584-599.
(13) Benner, S. A., and Gerloff, D. (1991) Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: the catalytic domain of protein kinases. Adv. Enz. Regul. 31, 121-181.
(14) Benner, S. A., Cohen, M. A., and Gerloff, D. (1993) Predicted secondary structure for the Src homology 3 domain. J. Mol. Biol. 229, 295-305.
(15) Bazan, J. F. (1992) Unraveling the structure of IL-2. Science 257, 410-412.


(16) Cohen, F. E., Sternberg, M. J. E., and Taylor, W. R. (1982) The analysis and prediction of the tertiary structure of globular proteins involving the packing of alpha helices against beta sheets: A combinatorial approach. J. Mol. Biol. 156, 821-862.
(17) Skolnick, J., and Kolinski, A. (1989) Computer simulations of globular protein folding and tertiary structure. Ann. Rev. Phys. Chem. 40, 207-235.
(18) Skolnick, J., Kolinski, A., Brooks, C. L., III, Godzik, A., and Rey, A. (1993) A method for predicting protein structure from sequence. Curr. Biol. 3, 414-423.
(19) Greer, J. (1991) Comparative modeling of homologous proteins. Methods Enzymol. 202, 239-252.
(20) Jones, T. A., and Thirup, S. (1986) Using known substructures in protein model building and crystallography. EMBO J. 5, 819-822.
(21) Bruccoleri, R. E., and Karplus, M. (1987) Prediction of the folding of short polypeptide segments by uniform conformational sampling. Biopolymers 26, 137-168.
(22) Ponder, J. W., and Richards, F. M. (1987) Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol. 193, 775-791.
(23) Bajorath, J., and Fine, R. M. (1992) On the use of minimization from many randomly generated loop structures in modeling antibody combining sites. ImmunoMethods 1, 137-146.
(24) Blundell, T. L., Sibanda, B. L., and Pearl, L. (1983) Three-dimensional structure, specificity and catalytic mechanism of renin. Nature 304, 273-275.
(25) Hutchins, C., and Greer, J. (1991) Comparative modeling of proteins in the design of novel renin inhibitors. Crit. Rev. Biochem. Mol. Biol. 26, 77-127.
(26) Pearl, L. H., and Taylor, W. R. (1987) A structural model for the retroviral proteases. Nature 329, 351-354.
(27) Weber, I. T., Miller, M., Jaskolski, M., Leis, J., Skalka, A. M., and Wlodawer, A. (1989) Molecular modeling of the HIV-1 protease and its substrate binding site. Science 243, 928-931.
(28) Weber, I. T. (1990) Evaluation of homology modeling of HIV protease. Proteins 7, 172-184.
(29) Chothia, C., Lesk, A. M., Tramontano, A., Levitt, M., Smith-Gill, S. J., Air, G., Sheriff, S., Padlan, E. A., Davies, D., Tulip, W. R., et al. (1989) Conformations of immunoglobulin hypervariable regions. Nature 342, 877-883.
(30) Chothia, C. (1992) One thousand families for the molecular biologist. Nature 357, 543-544.
(31) Blundell, T. L., and Johnson, M. S. (1993) Catching a common fold. Protein Science 2, 877-883.
(32) Orengo, C. A., Brown, N. P., and Taylor, W. R. (1992) Fast structure alignment for protein databank searching. Proteins 14, 139-167.
(33) Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977) Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542.
(34) Orengo, C. A., Flores, T. P., Taylor, W. R., and Thornton, J. M. (1993) Identification and classification of protein fold families. Protein Eng. 6, 485-500.
(35) Yoder, M. D., Keen, N. T., and Jurnak, F. (1993) New domain motif: The structure of pectate lyase C, a secreted plant virulence factor. Science 260, 1503-1506.
(36) Chothia, C., and Lesk, A. M. (1986) The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823-826.
(37) Flaherty, K. M., McKay, D. B., Kabsch, W., and Holmes, K. C. (1991) Similarity of the three-dimensional structures of actin and the ATPase fragment of a 70-kDa heat shock cognate protein. Proc. Natl. Acad. Sci. U.S.A. 88, 5041-5045.
(38) Brändén, C.-I. (1990) Founding fathers and families. Nature 346, 607-608.
(39) Cygler, M., Schrag, J. D., Sussman, J. L., Harel, M., Silman, I., Gentry, A. K., and Doctor, B. P. (1993) Relationship between sequence conservation and three-dimensional structure in a large family of esterases, lipases, and related proteins. Protein Sci. 2, 366-382.
(40) Blum, M. L., Down, J. A., Gurnett, A. M., Carrington, M., Turner, M. J., and Wiley, D. C. (1993) A structural motif in the variant surface glycoproteins of Trypanosoma brucei. Nature 362, 603-609.
(41) Aruffo, A., Farrington, M., Hollenbaugh, D., Li, X., Milatovich, A., Nonoyama, S., Bajorath, J., Grosmaire, L. S., Stenkamp, R., Neubauer, M., et al. (1993) The CD40 ligand, gp39, is defective in activated T cells from patients with X-linked hyper-IgM syndrome. Cell 72, 1-20.
(42) Wodak, S. J., and Rooman, M. J. (1993) Generating and testing protein folds. Curr. Opin. Struct. Biol. 3, 247-259.
(43) Bowie, J. U., and Eisenberg, D. (1993) Inverted protein structure prediction. Curr. Opin. Struct. Biol. 3, 437-444.
(44) Overington, J., Donnelly, D., Johnson, M. S., Sali, A., and Blundell, T. (1992) Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci. 1, 216-226.
(45) Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) A new approach to protein fold recognition. Nature 358, 86-89.
(46) Godzik, A., and Skolnick, J. (1992) Sequence-structure matching in globular proteins: application to supersecondary and tertiary structure determination. Proc. Natl. Acad. Sci. U.S.A. 89, 12098-12102.
(47) Bryant, S. H., and Lawrence, C. E. (1993) An empirical energy function for threading protein sequence through the folding motif. Proteins 16, 92-112.
(48) Sippl, M. J., and Weitckus, S. (1992) Detection of native-like models for amino acid sequences of unknown three-dimensional structure in a database of known protein conformations. Proteins 13, 258-271.
(49) Sippl, M. J. (1990) Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 216, 859-883.
(50) Luthy, R., Bowie, J. U., and Eisenberg, D. (1992) Assessment of protein models with three-dimensional profiles. Nature 356, 83-85.
(51) Kuntz, I. D. (1992) Structure-based strategies for drug design and discovery. Science 257, 1078-1082.
(52) Navia, M. A., and Murcko, M. A. (1992) Use of structural information in drug design. Curr. Opin. Struct. Biol. 2, 202-210.
(53) Walkinshaw, M. D. (1992) Protein targets for structure-based drug design. Med. Res. Rev. 12, 317-372.
(54) Erickson, J., Neidhart, D. J., VanDrie, J., Kempf, D. J., Wang, X. C., Norbeck, D. W., Plattner, J. J., Rittenhouse, J. W., Turon, M., Wideburg, N., et al. (1990) Design, activity, and 2.8 Å crystal structure of a C2 symmetric inhibitor complexed to HIV-1 protease. Science 249, 527-532.
(55) Appelt, K., Bacquet, R. J., Bartlett, C. A., Booth, C. L., Freer, S. T., Fuhry, M. A., Gehring, M. R., Herrman, S. M., Howland, E. F., Janson, C. A., et al. (1991) Design of enzyme inhibitors using iterative protein crystallographic analysis. J. Med. Chem. 34, 1925-1934.
(56) Shoichet, B. K., Stroud, R. M., Santi, D. V., Kuntz, I. D., and Perry, K. M. (1993) Structure-based discovery of inhibitors of thymidylate synthase. Science 259, 1445-1450.
(57) von Itzstein, M., Wu, W.-Y., Kok, G. B., Pegg, M. S., Dyason, J. C., Jin, B., Phan, T. V., Smythe, M. L., White, H. F., Oliver, S. W., et al. (1993) Rational design of potent sialidase-based inhibitors of influenza virus replication. Nature 363, 418-423.
(58) Bohm, H. J. (1992) The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput.-Aided Mol. Des. 6, 61-78.
(59) Rotstein, S. H., and Murcko, M. A. (1993) GroupBuild: A fragment-based method for de novo drug design. J. Med. Chem. 36, 1700-1710.
(60) Burt, S. K., Hutchins, C. W., and Greer, J. (1991) Predicting receptor-ligand interactions. Curr. Opin. Struct. Biol. 1, 213-218.
(61) Ring, C. S., Sun, E., McKerrow, J. H., Lee, G. K., Rosenthal, P. J., Kuntz, I. D., and Cohen, F. E. (1993) Structure-based inhibitor design by using protein models for the development of antiparasitic agents. Proc. Natl. Acad. Sci. U.S.A. 90, 3583-3587.


(62) Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R., and Ferrin, T. E. (1982) A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161, 269-288.
(63) Springer, T. A. (1990) Adhesion receptors of the immune system. Nature 346, 425-434.
(64) Lasky, L. A. (1992) Selectins: interpreters of cell-specific carbohydrate information during inflammation. Science 258, 964-969.
(65) Lawrence, M. B., and Springer, T. A. (1991) Leukocytes roll on a selectin at physiologic flow rates: distinction from and prerequisite for adhesion through integrins. Cell 65, 859-873.
(66) Larsen, E., Palabrica, T., Sajer, S., Gilbert, G. E., Wagner, D. D., Furie, B. C., and Furie, B. (1990) PADGEM-dependent adhesion of platelets to monocytes and neutrophils is mediated by a lineage-specific carbohydrate, LNF III (CD15). Cell 63, 467-474.
(67) Foxall, C., Watson, S. R., Dowbenko, D., Fennie, C., Lasky, L. A., Kiso, M., Hasegawa, A., Asa, D., and Brandley, B. K. (1992) The three members of the selectin receptor family recognize a common carbohydrate epitope, the sialyl Lewis x oligosaccharide. J. Cell Biol. 117, 895-902.
(68) Polley, M. J., Phillips, M. L., Wayner, E., Nudelman, E., Singhal, A. K., Hakomori, S.-I., and Paulson, J. C. (1991) CD62 and endothelial cell-leukocyte adhesion molecule 1 (ELAM-1) recognize the same carbohydrate ligand, sialyl-Lewis x. Proc. Natl. Acad. Sci. U.S.A. 88, 6224-6228.
(69) Zhou, Q., Moore, K. L., Smith, D. F., Varki, A., McEver, R. P., and Cummings, R. D. (1991) The selectin GMP-140 binds to sialylated, fucosylated lactosaminoglycans on both myeloid and nonmyeloid cells. J. Cell Biol. 115, 557-564.
(70) Aruffo, A., Kolanus, W., Walz, G., Fredman, P., and Seed, B. (1991) CD62/P-selectin recognition of myeloid and tumor cell sulfatides. Cell 67, 35-44.
(71) Needham, L. K., and Schnaar, R. L. (1993) The HNK-1 reactive sulfoglucuronyl glycolipids are ligands for L-selectin and P-selectin but not E-selectin. Proc. Natl. Acad. Sci. U.S.A. 90, 1359-1363.
(72) Moore, K. L., Varki, A., and McEver, R. P. (1991) GMP-140 binds to a glycoprotein receptor on human neutrophils: evidence for a lectin-like interaction. J. Cell Biol. 112, 491-499.
(73) Hollenbaugh, D., Bajorath, J., Stenkamp, R., and Aruffo, A. (1993) Interaction of P-selectin (CD62) and its cellular ligand: analysis of critical residues. Biochemistry 32, 2960-2966.


(74) Bajorath, J., Hollenbaugh, D., King, G., Harte, W., Jr., Eustice, D. C., Darveau, R. P., and Aruffo, A. (1994) The CD62/P-selectin binding sites for myeloid cells and sulfatides are overlapping. Biochemistry 33, 1332-1339.
(75) Erbe, D. V., Wolitzky, B. A., Presta, L. G., Norton, C. R., Ramos, R. J., Burns, D. K., Rumberger, J. M., Rao, B. N. N., Foxall, C., Brandley, B. K., and Lasky, L. A. (1992) Identification of an E-selectin region critical for carbohydrate recognition and cell adhesion. J. Cell Biol. 119, 215-227.
(76) Erbe, D. V., Watson, S. R., Presta, L. G., Wolitzky, B. A., Foxall, C., Brandley, B. K., and Lasky, L. A. (1993) P- and E-selectin use common sites for carbohydrate recognition and cell adhesion. J. Cell Biol. 120, 1227-1235.
(77) Sharon, N. (1993) Lectin-carbohydrate complexes of plants and animals: an atomic view. Trends in Biol. Sci. 18, 221-226.
(78) Weis, W. I., Kahn, R., Fourme, R., Drickamer, K., and Hendrickson, W. A. (1991) Structure of the calcium-dependent lectin domain from a rat mannose-binding protein determined by MAD phasing. Science 254, 1608-1615.
(79) Harte, W., Jr., and Bajorath, J. (1994) Synergism of calcium and carbohydrate binding to mammalian lectin suggested by a dynamic model. Submitted.
(80) Mulligan, M. S., Paulson, J. C., De Frees, S., Zheng, Z.-L., Lowe, J. B., and Ward, P. A. (1993) Protective effects of oligosaccharides in P-selectin-dependent lung injury. Nature 364, 149-151.
(81) Bundle, D. R., and Young, N. M. (1992) Carbohydrate-protein interactions in antibodies and lectins. Curr. Opin. Struct. Biol. 2, 666-673.
(82) Vyas, N. K. (1991) Atomic features of protein-carbohydrate interactions. Curr. Opin. Struct. Biol. 1, 732-740.
(83) Weis, W. I., Drickamer, K., and Hendrickson, W. A. (1992) Structure of a C-type mannose-binding protein complexed with an oligosaccharide. Nature 360, 127-134.
(84) Miller, K. E., Mukhopadhyay, C., Cagas, P., and Bush, A. C. (1992) Solution structure of the Lewis x oligosaccharide determined by NMR spectroscopy and molecular dynamics simulations. Biochemistry 31, 6703-6709.
(85) Tyrrell, D., James, P., Rao, N., Foxall, C., Abbas, S., Dasgupta, F., Nashed, M., Hasegawa, A., Kiso, M., Asa, D., et al. (1991) Structural requirements for the carbohydrate ligand of E-selectin. Proc. Natl. Acad. Sci. U.S.A. 88, 10372-10376.