Single-Coordinate-Driving Method Coupled with Simulated Annealing

A randomized kinematics-based approach to pharmacophore-constrained conformational search and database screening. Steven M. LaValle , Paul W. Finn ...
1 downloads 0 Views 170KB Size
J. Phys. Chem. B 1997, 101, 7863-7868

7863

Single-Coordinate-Driving Method Coupled with Simulated Annealing. An Efficient Tool To Search Conformational Space Eva Fadrna´ and Jaroslav Kocˇ a* Department of Organic Chemistry and Laboratory of Biomolecular Structure and Dynamics, Faculty of Science, Masaryk UniVersity, 611 37 Brno, Czech Republic ReceiVed: March 25, 1997; In Final Form: June 11, 1997X

A single-strand AUG RNA trimer has been studied by computational tools using a single internal coordinate driving method incorporated in the CICADA program interfaced with AMBER molecular mechanics. CICADA, previously appearing a good tool for the global description of the small and middle-sized flexible organic molecules conformational space, failed in this particular case, giving the best minimum of 10 kcal/ mol higher than molecular dynamics. We then combined the single-coordinate-driving (SCD) method with simulated annealing (SA), obtaining a global minimum substantially better than that from both CICADA and molecular dynamics. In addition, an interconversion path between both minima has been revealed.The new global minimum exhibits violation of stacking at the G terminal as reported in the experimental data. In comparison with the data obtained by other methods and reported in the literature, the new method yields better results. Additionally, the new method is able to find pathways of conformational interconversions connecting distant parts of the conformational space. The method is now implemented in the updated version of the CICADA program. The results have confirmed that even the PES of such a relatively small molecule exhibits a large number of local minima and that a clustering of the minima into conformational families appears necessary.

Introduction

Motivation

Conformational behavior of oligonucleotides and nucleic acids has been the focus of attention in bioorganic chemistry, biophysics, and molecular biology for a long time. The knowledge of conformational behavior of these structures provides new insight into protein recognition and gene expression control. Such data can be obtained by experimental methods, especially X-ray1 and NMR.2 However, reliable interpretation of the data can be achieved only by theoretical analysis. For a realistic picture of the distribution of conformers, analysis of the conformational space is needed. Moreover, it has recently been shown that such a picture provides the necessary background for efficient techniques of free energy simulation.3 In fact, there is no adequate computational tool for a complete solution of the conformational problem including free energy. Molecular dynamics (MD) and Monte Carlo (MC) methods are particularly used for this purpose, neither of them yielding completely satisfactory results. MD samples a part of the conformational space where conformers are not separated by high energy barriers. MC methods produce a set of conformers covering various parts of the conformational space, but no information about their mutual relationships (interconvertibility) is given. Additionally, MC methods usually become inefficient when the low-energy parts of potential energy (hyper)surface (PES) is required. The present paper describes a method that gives a detailed picture of the low-energy parts of the entire conformational space. The method, tested on short oligonucleotides, has yielded better results in comparison with the reported data obtained by other methods.

For several years now we have been working in conformational analysis using computational tools. Our effort has been focused on global description of the conformational space, including search for conformers (energy minima) and pathways of conformational interconversions. For this purpose, we have used the method of “forced” driving of a single internal coordinate (dihedral angle) with simultaneous relaxation of the rest of the molecule. We have incorporated the method into two computer programs, DAISY4 and CICADA,5 which we have recently developed and interfaced with molecular mechanics MMX,6 PMMX,7 MM2/MM3,8,9 and AMBER.10 The methodology has successfully been applied to conformational studies of several dozen small and middle-sized organic molecules including pheromones,4,11 peptides,7,12,13 carbohydrates,14-16 oligonucleotides,17 and hydrocarbons. We have also studied the AUG single strand RNA trimer, the starting codon in biosynthesis of proteins. The unusual structure in AUG may be significant within that process. It has been reported that stacking is shifted or violated especially on the G part of the molecule.18 Molecule labeling is shown in Scheme 1 and Table 1. In the following, A stands for adenine, U for uracil, and G for guanine, respectively. This trimer was studied by NMR that found either BU (bulged)19 or HX (helical) family with G shifted.18 These results imply that the molecule is probably flexible and conformationally variable. CICADA interfaced with AMBER had found 1617 conformers for this molecule before the calculation was stopped. The majority of them fell into one single large low-energy conformational family, the rest of the conformers showing substantially higher energies. The lowest energy conformer found (I) kept a good stacking geometry between A and U, but the guanosine was shifted. The structure is fixed by four sugar-backbone H-bonds and is similar to helical A-RNA (HX).

* E-mail: [email protected] X Abstract published in AdVance ACS Abstracts, August 15, 1997.

S1089-5647(97)01069-9 CCC: $14.00

© 1997 American Chemical Society

7864 J. Phys. Chem. B, Vol. 101, No. 39, 1997

Fadrna´ and Kocˇa

SCHEME 1: General Formula of the Molecule Studieda

structures I and II. Only dihedral angles, which were different for both structures, were driven. The assumption was that CICADA will locate a pathway connecting I and II. However, no such pathway was found by the program. When driving the guanosine glycosidic bond, the energy continuously increased. The reason was that the rest of the molecule was not able to relax sufficiently during the minimization procedure. Hence, we decided to incorporate simulated annealing to help relaxation. Methodology

a The rings are marked A-C for unambiguous description of torsions. The Greek letters represent labeling of dihedral angles (see Table 1).

TABLE 1: Notation of Torsion Angles in Nucleotidesa torsion angle R β γ δ  ζ χ a

atoms involved (n-1)O3′-P-O5′-C5′

P-O5′-C5′-C4′ O5′-C5′-C4′-C3′ C5′-C4′-C3′-O3′ C4′-C3′-O3′-P C3′-O3′-P-O5′(n+1) O1′-C1′-N1-C2 (pyrimidines) O1′-C1′-N9-C4 (purines)

Atoms denoted (n - 1) and (n + 1) belong to adjacent units.21

To confirm the CICADA results, we ran molecular dynamics at 300 K, starting with one of the geometries not too far from the structure I but higher in energy by about 10 kcal/mol. After a 14 ps trajectory the geometry dropped into a very deep well. Minimization resulted in conformer II, whose energy is about 10 kcal/mol below the energy of the original lowest energy structure I. The structure II is stabilized by hydrogen bonding. The guanosine base is not stacked and the molecule is folded on the G terminal. This structure is called half-stacked20 (abbreviated as HS(G), with only A and U stacked and G unstacked). It is clear from the results that CICADA, which worked well for small and flexible molecules, missed at least one important conformational family containing the conformer II. We want to know the reason of this failure. Therefore we started a new run. The starting set of conformers was composed of the

Nomenclature. Dihedral angles are denoted by the Saenger convention21 (Table 1). The base orientation is described by the expressions syn ) -90°< χ < 90° and anti ) 90° < χ < 270° according to ref 21. Conformational Families. There are four possible conformational families reported in the literature:20 helical (HX); close to canonical A-RNA, half-stacked (HS(G); for explanation see above, reverse-stacked (RS(A)); base 1 is sandwiched between bases 2 and 3, and bulged (BU); the middle base is unstacked, while bases 1 and 3 are stacked. In the case of AUG two additional families are possible: HS with A folded and U with G stacked (HS(A) and, analogously, RS with base 3 sandwiched between bases 1 and 2 (RS(G)). It is difficult to classify all the conformers found by CICADA in the above families on the basis of single geometrical parameters (torsional angles, atom distances, etc.). Hence, we have performed conformational clustering based on atomic distances (N9(A)-N1(U), N1(U)-N9(G), N9(A)-N9(G)) and angles between base planes (A and U planes, U and G planes, A and G planes). Clustering Technique. The clustering of single conformations into families was performed by the program FAMILLE.22 It is based on similarity of selected geometrical parameters. The program starts with the global minimum and creates a cluster of all conformers within a predefined energy window, which are closer to each other than a predefined distance measured in selected dihedral angles subspace. Once no new conformer is found to fit the family, new global minimum conformer is taken which does not fall into the first family, and a new family is created. The process stops when no unclustered conformer is found. Search Method: Combination of Single-Coordinate Driving (SCD) with Simulated Annealing (SA). The procedure consists of the following steps: 1. Take the lowest energy conformer (current global minimum). 2. Rotate a selected torsion angle by a predefined step. Constrain the torsion angle at this value. 3. Heat the structure slowly from 10 to 50 K in 1000 1fs molecular dynamics (MD) steps. Then run 8 ps MD trajectory at 50 K. Cool the structure to 10 K in 2000 1fs MD steps. Minimize the structure and record the energy. 4. Repeat steps 2 and 3. If the energy profile obtained from the previous steps shows a minimum then go to step 5, otherwise go to step 2.

The Single-Coordinate-Driving Method 5. Apply the annealing procedure described in step 3 but without any constrains. Store the structure as a new conformer if not previously stored. Store the highest energy structure on the pathway obtained in steps 2-3. Store its energy as the transition barrier of the conformational interconversion. 6. Repeat steps 2-5 for all predefined torsions and in both directions of rotation (clockwise and counterclockwise). 7. If any interesting conformer remains, mark it as current global minimum and continue from step 1. Otherwise stop the procedure. Note that the driving step as well as MD characteristics in steps 3 and 5 are variable parameters. We have incorporated the above procedure into the existing CICADA program and interfaced it with the AMBER program system. CICADA and AMBER Parameters. The calculation started with a single conformer generated by AMBER and the resulting number of points describing PES was 2007, of which 603 were conformers (energy minima). The lowest energy conformer found by CICADA was significantly lower in energy compared to the starting one (energy jump is about 14 kcal/mol). Ten backbone torsions (A, ζA, RB, βB, γB, B, ζB, RC, βC, γC) and three base rotation angles (χA, χB, χC) were considered as active (driven). Two conformers were considered different if at least one of the selected dihedral angles differed by more than 30° or 15° for exo- or endocyclic torsion, respectively. For AMBER parameters see ref 23. The calculations were performed on a Silicon Graphics Power Challenge R8000 computer. One run of AUG required approximately 1 week. Results and Discussion Interconversion Pathways. We have applied the new method to the AUG trimer for which the original CICADA failed. The starting set of conformers again consisted of conformers I and II. After a few pathways were calculated, the system jumped into new global minimum, conformation III.

J. Phys. Chem. B, Vol. 101, No. 39, 1997 7865

Figure 1. Key structures on the best interconversion pathway found between the HX and HS via transition state TS. The energy barrier is 6.3 kcal/mol for HX f TS and 11.1 kcal/mol for HS f TS, respectively. The interconversion pathway consist of several steps.

TABLE 2: List of the Basic Families (the Best Conformers of These Families) Found by CICADA for the AUG Trimer with Respect to Base Stacking Clusteringa relative energy of the lowest plane angles atom distances energy conformer (deg) (Å) family in the family index (kcal/mol) A-U U-G A-G A-U U-G A-G HX BU HS(G) HS(A) RS(A) RS(G)

0.00 3.10 4.51 4.77 6.77 10.59

4.0 49.0 34.0 43.0 6.0 15.0

22.0 39.0 89.0 24.0 8.0 9.0

25.0 18.0 82.0 67.0 3.0 10.0

4.50 6.34 4.43 9.16 4.46 7.61

4.88 7.12 7.68 4.98 7.03 4.25

7.70 4.32 7.32 7.75 4.99 5.19

a For denotations of the families see Scheme 2 and the text. Atom distances are defined by: N9 in A, N1 in U, and N9 in G (see Scheme 1).

Structures III and I exhibit a very similar base-stacking geometry. But the backbone conformations are completely different. The backbone is bent and fixed by H-bonds in structure III. Our original aimsto connect two distinct minima (HX and HS)shas also been reached. Key structures on the best interconversion pathway found are pictured in Figure 1. A similar problem has been reported in the literaturesto connect RS and HX minima of the AAA trimer by MC methods at a high temperature.24 The energy barrier found in this case was substantially higher (about 30 kcal/mol) than in our results (11.1 kcal/mol). It means that our methodology found a substantially better transition region. The corresponding pathway we have found leads through several local minima as shown in Figure 2. Conformational Families. Our new methodology has found representatives of all known families (see Table 2) and several additional ones.

SCHEME 2: Simple Scheme of Six Conformer Families (for Example, the HS Family Has Two Possibilities of Terminal Folding, Analogically in the RS Family both A1 and G3 Bases Can Be Sandwiched between the Remaining Ones) with the Abbreviations Used in the Texta

a

For a more detailed description see the text.

Table 2 and Scheme 2 show that there are three families with parallel bases (HX, RS(A), RS(G)). The base angles approach zero value, although not perfectly parallel. The three families differ substantially in the geometry of the sugar-phosphate

7866 J. Phys. Chem. B, Vol. 101, No. 39, 1997

Fadrna´ and Kocˇa

Figure 2. Schematic picture of the conformational space. There are three low-energy clusters (families) (HX, BU, and HS(G)) separated by 9.5 and 11.1 kcal/mol energy barriers, respectively. For example, when the barrier of 9.5 kcal/mol is exceeded, the clusters 1 and 2 merge. The energy difference between HX and the transition state T1 is the interconversion barrier. The fundamental problem of this analysis is to locate the cluster border.

Figure 3. Stereoview of the lowest energy conformers of the six above-mentioned families. The drawings were produced by the INSIGHT II software (BIOSYM/Molecular Simulations).25

backbone which is clearly shown by atomic distances (Table 2). For example, in the case of HX the largest distance was found between A and G bases, with distances A-U and U-G approximately equal. In the RS(A) structure, the range of distances is different from HX family. The largest distance can be found between U and G, which means that the A base is sandwiched between them. The situation in the remaining families is not so transparent. BU, HS(A), and HS(G) families represent geometries where two bases are parallel and the third one is either perpendicular (HS(G)) or transversal (BU, HS(A)) to them. Table 2 shows

that the parallel bases have the shortest atom distances. For example: the HS(G) structure shows the lowest plane angle between A-U and also the distance between appropriate atoms is the lowest. The G base is perpendicular both to A and to U. The important feature of the CICADA method is also the ability to provide an information about energy needed to merge single conformational families.The situation for AUG is pictured in Figure 4. The above clustering is based on atomic distances and angles between base planes. We have also performed clustering by torsional angles χA,B,C describing rotation of bases. Eight basic

The Single-Coordinate-Driving Method

J. Phys. Chem. B, Vol. 101, No. 39, 1997 7867

Figure 4. Dendrogram showing the grouping of conformational families into larger clusters. When the energy exceeds 9.5 kcal/mol the HX and BU families merge and form one cluster, etc. All the families will together create a single cluster when the energy exceeds 21 kcal/mol. In that case, any conformer will be convertible to any other.

conformational families show combinations of syn and anti orientations.26 In the 3D representation these families form a cube. For more detailed information, including interconversion pathways, see ref 23.

now implemented in the updated version of the CICADA program and tested on a number of oligonucleotides and peptides, including the AAA single-strand RNA trimer. PDB files of all the structures mentioned in the paper are available via e-mail from the authors upon request.

Conclusions We have combined two well-known methods to search conformational space, the SCD method and SA. The former method incorporated in the CICADA program has been shown to be a systematic tool that gives a complete picture of the conformational space of small-to-medium-sized molecules including information about conformational interconversions. The results of this method may be as good as grid search data.15 The method is not suitable for molecules with high steric hindrance as we have shown in this paper for the AUG RNA trimer. The reason is that a standard minimization routine is not able to relax the molecule sufficiently. Simulated annealing is not a systematic method, but it is able to relax the molecule. It has a tendency to converge to the global minimum. Therefore, it is not suitable to explore the entire conformational space. It gives no information about conformational interconversions. The SCD-SA combination is a systematic method that gives a good description of the entire conformational space. It can be used for molecules with steric hindrance and also for large systems. The test on the AUG RNA trimer showed that the new method found more conformational families than described in the literature so far. The basic disadvantage of the new method is computational intensity. So, for smaller molecules SCD provides the same results as SCD-SA combination, but in a much shorter time. Therefore, the new method should be used only for such molecules where SCD fails.The new method is

Acknowledgment. The research has partially been supported by the Grant Agency of the Czech Republic, Grant 203/96/1513. The authors would like to thank the Czech Academic Supercomputer Center in Brno for providing them access to computer facilities. References and Notes (1) Suck, D.; Manor, D. A.; Saenger, W. Acta Crystallogr. 1976, B32, 1727. (2) Gorenstein, D. G. Chem. ReV. 1994, 94, 1315. (3) Senderowitz, H.; Guarnieri, F.; Still, W. C. J. Am. Chem. Soc. 1995, 117, 8211. (4) Kocˇa, J.; Carlsen, P. H. J. J. Mol. Struct. (THEOCHEM) 1992, 257, 105. (5) Kocˇa, J. J. Mol. Struct. (THEOCHEM) 1994, 308, 13. (6) MMX is an extended and improved version of Allinger’s MM2 developed by J. J. Gajewski and K. E. Gilbert. The program is available from Serena Software, Box 3076, Bloomington, IN 47402-3076. (7) Kocˇa, J.; Carlsen, P. H. J. J. Mol. Struct. 1993, 291, 271. (8) Sprague, J. T.; Tai, J. C.; Allinger, N. L. J. Comput. Chem. 1987, 8, 581. (9) MM3(92), MM3(94), QCPE, Creative Arts Building 181, Indiana University, Bloomington, IN 47405. (10) Pearlman, D. A.; Case, D. A.; Cadwell, J. C.; Seibel, G. L.; Singh, U. Ch.; Weiner, P.; Kollman, P. AMBER Version 4.0; University of California, San Francisco, 1991. (11) Kocˇa, J.; Carlsen, P. H. J. J. Mol. Struct. 1992, 268, 263. (12) Kocˇa, J.; Krˇ´ızˇ, Z.; Carlsen, P. H. J. J. Mol. Struct. 1994, 306, 157. (13) Kocˇa, J.; Carlsen, P. H. J. J. Mol. Struct. (THEOCHEM) 1995, 337, 17. (14) Kocˇa, J.; Pe´rez, S.; Imberty, A. J. Comput. Chem. 1995, 16, 296.

7868 J. Phys. Chem. B, Vol. 101, No. 39, 1997 (15) Engelsen, S.; Pe´rez, S.; Imberty, A.; Braccini, I.; Herve´ du Penhoat, C.; Kocˇa, J. Carbohydr. Res. 1995, 276, 1. (16) Imberty, A.; Mikros, E.; Kocˇa, J.; Mollicone, R.; Oriol, R.; Pe´rez, S. Glycoconjugate J. 1995, 12, 331. (17) Fadrna´, E.; Kocˇa, J. J. Biomol. Struct. Dynam. 1996, 14, 137. (18) Stone, M. P.; Winkle, S. A.; Borer, P. N. J. Biomol. Struct. Dynam. 1986, 3, 767. (19) Lee, Ch.-H.; Tinoco, Jr., I. Biophys. Chem. 1980, 11, 283. (20) Bouchemal-Chibani, N.; Lebrun, A.; Herve´ du Penhoat, C.; Ghomi, M.; Laigle, A.; Derouet, C.; Turpin, P. Y. J. Biomol. Struct. Dynam. 1994, 12, 695. (21) Saenger, W. Principles of Nucleic Acid Structure; SpringerVerlag: New York, 1984. (22) Imberty, A.; Pe´rez, S. Glycobiology 1994, 4, 351.

Fadrna´ and Kocˇa (23) The AMBER 4.0 package, especially the minmd and sander modules, have been used. Simulations ran with SHAKE on hydrogens, a 1 fs time step, temperature was increased continuously from 10 to 50 K, a 9 Å cutoff, nonbonded pair list was updated every 25 steps. As for minimization, the steepest descent method was switched to conjugate gradient after 50 cycles of minimization, the maximum of the minimization cycles was set to 2200. Calculations were performed for the distancedependent dielectric constant which mimics the presence of a high dielectric solvent (water). (24) Ghomi, M.; Victor, J. M.; Henriet, Ch. J. Comput. Chem. 1994, 15, 433. (25) INSIGHT II User Guide; Biosym/MSI: San Diego, 1995. (26) Fadrna´, E.; Kocˇa, J., J. Mol. Struct. (THEOCHEM), in press.