Article pubs.acs.org/JPCB

From Ramachandran Maps to Tertiary Structures of Proteins

Debarati DasGupta,†,‡ Rahul Kaushik,‡,§ and B. Jayaram*,†,‡,§

†Department of Chemistry, ‡Supercomputing Facility for Bioinformatics & Computational Biology, and §Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India

ABSTRACT: Sequence to structure of proteins is an unsolved problem. A possible coarse-grained resolution to this entails specification of all the torsional (Φ, Ψ) angles along the backbone of the polypeptide chain. The Ramachandran map quite elegantly depicts the allowed conformational (Φ, Ψ) space of proteins, which is still very large for the purposes of accurate structure generation. We have divided the allowed (Φ, Ψ) space in Ramachandran maps into 27 distinct conformations sufficient to regenerate a structure to within 5 Å from the native, at least for small proteins, thus reducing the structure prediction problem to a specification of an alphanumeric string, i.e., the amino acid sequence together with one of the 27 conformations preferred by each amino acid residue. This still theoretically results in 27^n conformations for a protein comprising “n” amino acids. We then investigated the spatial correlations at the two-residue (dipeptide) and three-residue (tripeptide) levels in what may be described as higher order Ramachandran maps, with the premise that the allowed conformational space starts to shrink as we introduce neighborhood effects. We found, for instance, for a tripeptide which potentially can exist in any of the 27^3 “allowed” conformations, three-fourths of these conformations are redundant to the 95% confidence level, suggesting sequence context dependent preferred conformations. We then created a look-up table of preferred conformations at the tripeptide level and correlated them with energetically favorable conformations.
We found in particular that Boltzmann probabilities calculated from van der Waals energies for each conformation of tripeptides correlate well with the observed populations in the structural database (the average correlation coefficient is ∼0.8). An alphanumeric string and hence the tertiary structure can be generated for any sequence from the look-up table within minutes on a single processor, and to a higher level of accuracy if the secondary structure can be specified. We tested the methodology on 100 small proteins, and in 90% of the cases, a structure within 5 Å of the native is recovered. We thus believe that the method presented here provides the missing link between Ramachandran maps and tertiary structures of proteins. A Web server to convert a tertiary structure to an alphanumeric string and to predict the tertiary structure from the sequence of a protein using the above methodology has been created and made freely accessible at http://www.scfbio-iitd.res.in/software/proteomics/rm2ts.jsp.
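The comparison between Boltzmann probabilities and observed populations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the temperature of 300 K, the kcal/mol energy units, and the toy energies and counts are all assumptions made for illustration, and the Pearson coefficient is used here as one plausible reading of "correlation coefficient".

```python
import math

# Assumed constant: Boltzmann constant in kcal/(mol*K). The energy units and
# the default temperature are illustrative choices, not values from the article.
K_B = 0.0019872041


def boltzmann_probabilities(energies, temperature=300.0):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    kt = K_B * temperature
    e_min = min(energies)  # shift by the minimum to avoid overflow in exp()
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy data for four conformations of one tripeptide:
# van der Waals energies (kcal/mol) and counts seen in a structural database.
energies = [-3.0, -2.5, -1.0, 0.5]
observed_counts = [120, 60, 15, 5]

probabilities = boltzmann_probabilities(energies)
populations = [c / sum(observed_counts) for c in observed_counts]
correlation = pearson(probabilities, populations)
print(f"correlation = {correlation:.3f}")
```

In practice such a comparison would be repeated for every tripeptide and the coefficients averaged, which is how an average value such as the reported ∼0.8 could arise.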



INTRODUCTION

The protein folding problem deals with how the amino acid sequence of a protein determines its three-dimensional structure. Since the times of Pauling,1,2 Kauzmann,3−5 Ramachandran,6−8 and Anfinsen,9 considerable progress has been made in multiple directions10−34 in developing a deeper understanding of the problem, aided especially by the growing database35 of X-ray and NMR structures, molecular dynamics simulations,36−43 and quantum calculations.44−58 Predicting protein tertiary structure from its sequence, a problem of great importance to chemistry and biology, however, remains unsolved. The Critical Assessment of protein Structure Prediction (CASP) experiments,59 held since the 1990s, have done yeoman’s service to the evolving field of structure prediction.60−62 Methods based on homology modeling, which rest on the axiom that similar sequences adopt similar structures, and methods originating in physics based potentials or knowledge based (data driven) statistical potentials with a suitable conformational sampling algorithm (categorized as “ab initio” or “de novo” methods in the field) are employed extensively, the most successful techniques for structure prediction being clever combinations of de novo and homology methods. The current success rate,59 however, for predicting the tertiary structure to within 5 Å root-mean-square deviation (RMSD), i.e., to “medium resolution”, from the native (experimental) structure hovers around 40% and drops to 15% if a “higher resolution” structure (≤2 Å from the native) is required, necessitating newer approaches. Apart from contributing to plausible solutions to the grand challenge of protein structure prediction, structure prediction methods can also help accelerate structure based drug discovery. The work presented here is a step in this direction. Conformations of simple saturated acyclic organic molecules are specified as syn/eclipsed, gauche, anti (e, g+, g−, a), etc.,

© XXXX American Chemical Society

Special Issue: Biman Bagchi Festschrift
Received: March 29, 2015
Revised: May 14, 2015


DOI: 10.1021/acs.jpcb.5b02999 J. Phys. Chem. B XXXX, XXX, XXX−XXX


The Journal of Physical Chemistry B

allowed space in “experimental” Ramachandran maps.63 These 27 classes can be broadly divided into three basins, viz., H, E, and C. Classes C1−C7 are in the helix (H) basin, C8−C15 in the sheet (E) basin, and C16−C27 represent loop (C)-like torsions. The definition of the conformational classes C1−C27 and the percentage occurrence of amino acid residues in each class are given in Table 1. The observed φ, ψ values for each residue in experimental structures of proteins can be mapped onto one of the conformational classes. Figures 3 and 4 describe the steps involved in the 3D → 1D mapping protocol. A utility to convert a tertiary structure to an alphanumeric string and vice versa is provided at http://www.scfbio-iitd.res.in/software/proteomics/rm2ts.jsp. One could have introduced more classes in Table 1, but the margin of return diminishes. One could have also decreased the number of classes by choosing +60 or +90° as in the case of simple organic acyclic molecules, but the ability to regenerate a structure close to the native (