

BIOINFORMATICS

Homology modeling, model and software evaluation: three related resources




Abstract

Motivation: Homology modeling is rapidly becoming the method of choice for obtaining three-dimensional coordinates for proteins because genome projects produce sequences at a much higher rate than NMR and X-ray laboratories can solve the three-dimensional structures. The quality of a protein model is not immediately clear to novices, who seem to need support with its evaluation. Expert users are sometimes interested in evaluating the quality of modeling programs rather than the quality of the models themselves.

Results: Three servers have been made available to the scientific community: a homology modeling server, a model quality evaluation server and a server that evaluates models built of proteins for which the structure is already known, thereby implicitly evaluating the quality of the modeling program.

Availability: The modeling-related servers and several structure analysis servers are freely available at http://swift.embl-heidelberg.de/servers/

Contact: [email protected]

© Oxford University Press

Introduction

The routine sequencing of entire genomes is providing an overwhelming flood of sequence information. Every day ∼2000 nucleotide sequences are deposited in publicly accessible databases (Stoesser et al., 1997). In marked contrast, only about four protein structures are solved and deposited per day on average. Thus, the structure gap (the difference between the number of known sequences and the number of experimentally determined three-dimensional structures) is widening rapidly. Consequently, homology modeling is becoming the technique of choice for routine structure ‘determinations’.

The CASP bi-annual modeling competition (Dunbrack et al., 1997; Mosimann et al., 1995) has made it clear that model building by homology can provide remarkably good results if the sequence identity is high between the protein to be modeled and the template protein (>75%), but at low sequence

identities (70 different classes of checks, and this program has been used to search for violations (the PDBREPORT database: http://swift.embl-heidelberg.de/pdbreport/) in all files in the PDB (Bernstein et al., 1977). The second server presented here executes those checks of the WHAT_CHECK program that are relevant for the validation of models built by homology.

The CASP bi-annual modeling competition (Dunbrack et al., 1997; Mosimann et al., 1995) has created the realization that homology modeling is a viable science, the results of which are not black box magic, but verifiable observations. CASP has spurred many researchers into improving the quality of model-building-by-homology software. The judges of this competition so far have used several quality criteria, but the all-atom RMS deviation between the model and the corresponding real structure still is their major determinant of model quality. It is, however, becoming generally accepted that this RMS deviation is a poor measure of model quality. Abagyan and Totrov (1997) suggested a local contact-based score, called the CAD score, as a better determinant of model quality. Although the CAD score captures the intuitive feeling about model quality better than the RMS deviation, it also describes only one aspect of the model quality. The third server presented here provides a long list of different types of comparisons when a model and the corresponding real structure are submitted.

Modeling

Since the WHAT IF program (Vriend, 1990) performed well for the high-sequence-identity test cases in both CASP competitions, we decided to make this modeling program available as a WWW-based server (see Figure 1) that can be used for high-sequence-identity modeling tasks. The server is not as automatic as, for example, the SwissModel server (Peitsch, 1996); it does not search for the optimal template and it does not yet model insertions. The search for an optimal template will perhaps be implemented in the near future, but ab initio modeling of loops has, in the two CASP competitions, been proven to be too difficult for today’s techniques (Dunbrack et al., 1997; Mosimann et al., 1995) (especially when limited CPU time is available, as in a WWW-based server set-up) and will only be implemented in the server after a major technical or scientific breakthrough. However, we are presently working on the incorporation of a module that will allow for the insertion of very short loops (one or two residues), and hope to make this module available soon.

R.Rodriguez et al.

Fig. 1. WWW layout of the modeling server.

The algorithms implemented in the WHAT IF modeling option have been described before (DeFilippis et al., 1994; Chinea et al., 1995; Rodriguez and Vriend, 1997) and will only be summarized very briefly here. The strength of the WHAT IF modeling module comes from the fact that residues that are conserved between the model and the template are modified as little as possible, and from the use of position-specific rotamers for residues that have to be exchanged going from the template to the model. The study by DeFilippis et al. (1994) indicated that one has the best chance to model conserved residues correctly by not modifying them at all. The study by Chinea et al. (1995) showed that the use of position-specific rotamers (Jones and Thirup, 1986) is a very good solution for the problem of selecting the best rotamer from among the many different possibilities that exist at each position in the model.

A rotamer distribution for a certain residue type at a certain position, called a position-specific rotamer distribution, is determined by extracting from a database of non-redundant protein structures (Hobohm et al., 1992) all suitable fragments of five or seven residues (seven in helix and strand; five in the case of irregular local backbone). Suitable fragments are those that have a local backbone conformation similar to the one around the evaluated position, and have the same residue type at the central position. In the server implementation, the RMS deviation of the backbone alpha carbons is allowed to be at most 0.5 Å.

A list of models built by homology, for which the real structure is known but not used in the modeling process, is available (Rodriguez and Vriend, 1997) to give the user an


impression of the quality of this model building by homology server. The user can, of course, also get an impression of the quality of this modeling server by building a couple of models of known proteins and submitting these models to the WWW-based model precision determination server. The casual user is probably better served with the SwissModel server, but the user who wants to improve the model quality by iteratively changing the alignment or even the template structure will probably want to use the server described here. The process of merging/combining our server and the SwissModel server is in progress.
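The fragment-selection step described above for position-specific rotamers can be sketched as follows. This is a minimal illustration and not the WHAT IF implementation: the function names, the dictionary layout of the fragment database and the rotamer labels are all hypothetical, and the backbone comparison uses a superposition-free distance-matrix RMS as a stand-in for a superposed Cα RMS deviation. Only the fragment lengths (seven in helix and strand, five otherwise), the same-residue-type requirement and the 0.5 Å cutoff are taken from the text.

```python
import math
from collections import Counter


def fragment_length(secondary_structure):
    """Seven residues in helix ('H') or strand ('E'), five otherwise (from the text)."""
    return 7 if secondary_structure in ("H", "E") else 5


def distance_rms(ca_a, ca_b):
    """RMS difference over all pairwise CA-CA distances of two equal-length fragments.

    Assumption: used here as a superposition-free stand-in for the
    superposed CA RMS deviation mentioned in the text.
    """
    squared = []
    for i in range(len(ca_a)):
        for j in range(i + 1, len(ca_a)):
            diff = math.dist(ca_a[i], ca_a[j]) - math.dist(ca_b[i], ca_b[j])
            squared.append(diff * diff)
    return math.sqrt(sum(squared) / len(squared))


def rotamer_distribution(target_ca, residue_type, database, cutoff=0.5):
    """Count rotamer labels of database fragments that suit the target position."""
    counts = Counter()
    for fragment in database:
        if len(fragment["ca"]) != len(target_ca):
            continue  # wrong fragment length for this secondary structure
        if fragment["central_residue"] != residue_type:
            continue  # central residue type must match
        if distance_rms(target_ca, fragment["ca"]) <= cutoff:
            counts[fragment["rotamer"]] += 1
    return counts
```

Note that a distance-matrix comparison cannot distinguish a fragment from its mirror image, so a real implementation would probably superpose the fragments instead; the simplification is made here only to keep the sketch free of external libraries.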

Model validation

Most types of violations that WHAT_CHECK can detect have already been described elsewhere (WHAT_CHECK help: http://swift.embl-heidelberg.de/whatcheck/; Vriend and Sander, 1993; Hooft et al., 1996a,b, 1997). The WHAT_CHECK options that are aimed at solving typically crystallographic problems, such as crystal cell dimension deviations, B-factor distributions, or inconsistencies in the administrative crystallographic records (Hooft et al., 1994), are not useful for the validation of models built by homology and are not implemented in the model validation server. Since models are often, for a large part, kept the same as the template structure, we also have to check elementary geometric aspects of a model (bond lengths, bond angles, planarities, chiral centers). The techniques used to determine misthreading in X-ray structures can equally well be used to determine alignment errors that underlie errors in models.

The Ramachandran plot is probably the most powerful determinant of the quality of protein coordinates (Laskowski et al., 1993; Hooft et al., 1997). When the ‘quality’ of the Ramachandran plot of the model is significantly worse than that of the template, then it seems likely that there are significant differences between the backbone of the template and the structure to be modeled.

WHAT_CHECK spends a large fraction of its CPU time determining if any Asn, His or Gln side chains need to be rotated by 180° about their χ2, χ2 or χ3 angle, respectively. The modeling process will sometimes change the side chain torsion angles that are required for optimal hydrogen bonding. However, as it is not yet possible to model reliably more than a few water molecules per protein, the hydrogen bond checking facilities in the model validation server cannot be as precise as in the structure validation server. A detailed description of all the differences between model validation and structure validation was given by Hooft et al. (WHAT_CHECK help: http://swift.embl-heidelberg.de/whatcheck/).
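Both the Ramachandran plot and the 180° side chain flips discussed above rest on torsion angles defined by four consecutive atoms. The following is a minimal sketch of how such an angle can be computed and flipped; it is not taken from WHAT_CHECK, and the function names `dihedral` and `flip_180` are hypothetical. The sign convention assumed is the usual one in which cis is 0° and trans is 180°.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (cis = 0, trans = 180).

    For the backbone, phi uses C(i-1), N, CA, C and psi uses N, CA, C, N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def flip_180(angle):
    """Rotate a torsion angle by 180 degrees and wrap it into [-180, 180)."""
    return (angle + 360.0) % 360.0 - 180.0
```

A validation program would evaluate the hydrogen bonding of a side chain at both `angle` and `flip_180(angle)` and keep the better of the two; that scoring step is outside the scope of this sketch.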

Modeling program validation

We have added to the WHAT IF (Vriend, 1990) software a module that compares a model with an experimentally determined structure and provides a wide spectrum of comparison values. As a service to the modeling community, and to allow interested scientists to check for themselves the quality of their favorite modeling program, we have made this model validation server, called WHAT_MODQ, available via the WWW. We will use the terms ‘template’, ‘model’ and ‘real structure’, respectively, for the protein used to model from, the model that was built by homology, and the experimentally determined structure the modeling of which was attempted.

In contrast to experimental structures that should always be as good as possible all over the structure, the biological questions that initiated the modeling determine what is important in a model and what therefore needs to be modeled with the greatest precision. For example, if the model is supposed to suggest a new lead for drug design, it is important that the active site is modeled with great precision, but further away from the active site this precision becomes of less interest to the drug designer.

A problem that can be encountered is domain displacements. If one domain rotates with respect to another, a very poor RMS score will result, but the model can at every position be of sufficient local precision to answer all biological questions correctly. It is, therefore, important to augment the global RMS values with RMS scores that are based on local comparisons only.

Crystallographic symmetry contacts are a problem that has not yet been taken into account when judging the CASP entries. Nevertheless, it is not rare for more than a quarter of all side chain conformations in a small protein to be influenced by crystal contacts. Crystal contacts in the template can be determined, and a good modeling program will try to compensate for them. Crystal contacts in the model, however, cannot be predicted and thus any residue that makes a symmetry contact in the real structure should not be taken into account if the quality of the model is judged.

To cope with the aforementioned problems, the WHAT_MODQ server provides a large series of comparisons between model and real structure. These tests are designed to dissect as clearly as possible the nature of the modeling errors and to help the modeler find the algorithmic origin of those problems. WHAT_MODQ provides RMS deviations, but also the linear (or robust) errors because these numbers are less influenced by one or two outliers. These numbers are provided for all helical, strand, loop, buried and accessible residues. In several categories, symmetry-contacting and not-symmetry-contacting residues are separated. Several local comparison parameters, such as one similar to


Fig. 2. Model comparison server output. The small table on the top is repeated for all residue classes listed below it. Series of dots indicate that some output has been removed in this figure for brevity.

Abagyan’s CAD score (Abagyan and Totrov, 1997) and a moving average short fragment RMS deviation, are provided. The 10 poorest modeled residues are listed, and plots of the superposition of model and real structure are provided for all secondary structure elements. Of course the WHAT_MODQ server deals correctly with a large series of problems such as missing or misnamed atoms or, for example, a tyrosine that roughly has a 180° rotation about the Cβ–Cγ bond in either the model or the real structure. Figure 2 shows some examples of the output. The quality of a model built by homology is partly a subjective measure because the relative importance of different errors is partly determined by the biological questions that


the model is supposed to answer. The WHAT_MODQ server tries to accommodate the user by providing the widest possible range of comparisons between the model and the real structure. We suggest that both the template and the real structure are submitted to the BIOTECH structure validation server (http://biotech.embl-heidelberg.de:8400/) or that the quality of these two real structures is looked up in the PDBREPORT database (Hooft et al., 1996a; http://swift.embl-heidelberg.de/pdbreport/). It would be unfair to compare a model with a residue in the real structure that sits ‘the wrong way around’ or is otherwise not correct. The three servers described above are part of a larger series of structure and model analysis servers (see Figure 3).
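The difference between the RMS deviation and a linear (robust) error, and the exclusion of symmetry-contacting residues, can be illustrated with a short sketch. It is not the WHAT_MODQ code: the function names are hypothetical, the ‘linear error’ is assumed here to be the mean absolute deviation, and the two coordinate sets are assumed to be superposed already, with one representative atom per residue.

```python
import math


def per_residue_deviation(model, real):
    """Distance between equivalent atoms; assumes the structures are already superposed."""
    return [math.dist(m, r) for m, r in zip(model, real)]


def summarize(deviations, exclude=frozenset()):
    """RMS and linear (mean absolute) error, skipping excluded residue indices.

    `exclude` could hold, for example, residues that make a symmetry
    contact in the real structure.
    """
    kept = [d for i, d in enumerate(deviations) if i not in exclude]
    rms = math.sqrt(sum(d * d for d in kept) / len(kept))
    linear = sum(kept) / len(kept)
    return {"rms": rms, "linear": linear}


def poorest(deviations, n=10):
    """Indices of the n residues with the largest deviation, worst first."""
    order = sorted(range(len(deviations)), key=lambda i: deviations[i], reverse=True)
    return order[:n]
```

With nine residues that deviate by 0.5 Å and a single residue that deviates by 10 Å, the RMS deviation is about 3.2 Å whereas the linear error is 1.45 Å, which shows how strongly one outlier dominates the RMS value. Excluding that one residue brings both numbers back to 0.5 Å.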


Fig. 3. The WHAT IF-based modeling related servers.

Acknowledgements

We thank G.Padron, B.Bywater, C.Sander and A.Tramontano for stimulating discussions. We thank B.Altenberg and K.Krmoian for technical assistance, and J.Weare for technical assistance and for critically reading the manuscript. This work was partly funded by EC grants CT960166 and CT960189, and by the BMBF RELIWE grant.

References

Abagyan,R.A. and Totrov,M.M. (1997) Contact area difference (CAD): a robust measure to evaluate accuracy of protein models. J. Mol. Biol., 268, 678–685.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol., 112, 535–542.
Chinea,G., Padron,G., Hooft,R.W.W., Sander,C. and Vriend,G. (1995) The use of position specific rotamers in model building by homology. Proteins, 23, 415–421.

DeFilippis,V., Sander,C. and Vriend,G. (1994) Predicting local structural changes that result from point mutations. Protein Eng., 7, 1203–1208.
Dunbrack,R.L., Gerloff,D.L., Bower,M., Chen,X., Lichtarge,O. and Cohen,F.E. (1997) The second meeting of the critical assessment of techniques for protein structure prediction (CASP2). Fold. Des., 2, R27–R41.
Hobohm,U., Scharf,M., Schneider,R. and Sander,C. (1992) Selection of representative protein data sets. Protein Sci., 1, 409–417.
Hooft,R.W.W., Sander,C. and Vriend,G. (1994) Reconstruction of symmetry-related molecules from protein data bank (PDB) files. J. Appl. Crystallogr., 27, 1006–1009.
Hooft,R.W.W., Sander,C., Vriend,G. and Abola,E. (1996a) Errors in protein structures. Nature, 381, 272.
Hooft,R.W.W., Sander,C. and Vriend,G. (1996b) Verification of protein structures: Side-chain planarity. J. Appl. Crystallogr., 29, 714–716.
Hooft,R.W.W., Sander,C. and Vriend,G. (1997) Objectively judging the quality of a protein structure from a Ramachandran plot. Comput. Applic. Biosci.
Jones,T.A. and Thirup,S. (1986) Using known substructures in protein model building and crystallography. EMBO J., 5, 819–823.


Laskowski,R.A., MacArthur,M.W., Moss,D.S. and Thornton,J.M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr., 26, 283–291.
Mosimann,S., Meleshko,R. and James,M.N.G. (1995) A critical assessment of comparative molecular modelling of tertiary structures of proteins. Proteins, 23, 301–317.
Peitsch,M.C. (1996) ProMod and Swiss-Model: Internet based tools for automated comparative protein modelling. Biochem. Soc. Trans., 24, 274–279.
Pontius,J., Richelle,J. and Wodak,S. (1996) Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol., 264, 121–136.


Rodriguez,R. and Vriend,G. (1997) Professional gambling. In Vergoten,G. and Theophanides,T. (eds), Biomolecular Structure and Dynamics. Kluwer Academic, Dordrecht.
Stoesser,G., Sterk,P., Tuli,M.A., Stoehr,P.J. and Cameron,G.N. (1997) The EMBL nucleotide sequence database. Nucleic Acids Res., 25, 7–13.
Vriend,G. (1990) WHAT IF: A molecular modelling and drug design program. J. Mol. Graph., 8, 52–56.
Vriend,G. and Sander,C. (1993) Quality control of protein models: Directional atomic contact analysis. J. Appl. Crystallogr., 26, 47–60.