Chemically Accurate Relative Folding Stability of RNA Hairpins from

Oct 30, 2018 - Center for RNA Biology, University of Rochester, Rochester , New York 14642 , United States. J. Chem. Theory Comput. , Article ASAP...
0 downloads 0 Views 4MB Size
Subscriber access provided by The University of Texas at El Paso (UTEP)

Biomolecular Systems

Chemically Accurate Relative Folding Stability of RNA Hairpins from Molecular Simulations Louis G. Smith, Zhen Tan, Aleksandar Spasic, Debapratim Dutta, Leslie A. Salas-Estrada, Alan Grossfield, and David H Mathews J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00633 • Publication Date (Web): 30 Oct 2018 Downloaded from http://pubs.acs.org on October 30, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Chemically Accurate Relative Folding Stability of RNA Hairpins from Molecular Simulations Louis G. Smith,†,¶ Zhen Tan,†,¶ Aleksandar Spasic,†,¶ Debapratim Dutta,†,¶ Leslie A. Salas-Estrada,† Alan Grossfield,† and David H. Mathews∗,†,‡,¶ †Department of Biochemistry & Biophysics ‡Department of Biostatistics and Computational Biology ¶Center for RNA Biology E-mail: [email protected] Phone: (585) 275-1734. Fax: (585) 275-6007

Abstract To benchmark RNA force fields, we compared the folding stabilities of three 12-nucleotide hairpin stem loops estimated by simulation to stabilities determined by experiment. We used umbrella sampling and a reaction coordinate of end to end (50 to 30 hydroxyl oxygen) distance to estimate the free energy change of the transition from the native conformation to a fully extended conformation with no hydrogen bonds between non-neighboring bases. Each simulation was performed four times using the AMBER FF99+bsc0+χOL3 force field and each window, spaced at 1 Å intervals, was sampled for 1 μs, for a total of 552 μs of simulation. We compared differences in the simulated free energy changes to analogous differences in free energies from optical melting experiments using thermodynamic cycles where the free energy change between stretched and random coil sequences is assumed to be sequence independent. The differences between experimental and simulated ∆∆G◦ are on average 0.98 ± 0.66 kcal/mol, which is chemically accurate and suggests analogous simulations could be used predictively. We also report a novel method to identify where replica free energies diverge along a reaction coordinate, thus indicating where additional sampling would most improve convergence. We conclude by discussing methods to more economically perform these simulations.

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Keywords Molecular Dynamics, Umbrella Sampling, RNA Hexaloop, Folding Stability, Single Molecule Pulling, Optical Melting, Melting, Mechanical Denaturation, Force Field Benchmark

2

ACS Paragon Plus Environment

Page 2 of 47

Page 3 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Introduction RNA is central to life. 1 It has fundamental roles in the expression of genetic information as proteins, and also functions directly in other processes as non-coding RNA. 2–14 Molecular interactions are the first principles of biology, in that they provide mechanistic understandings of biology. Hence, all-atom simulations of RNA provide a framework for developing hypotheses and interpreting experiments . 15–19 All-atom simulations provide detailed information about the conformations sampled by molecules, and provide useful insights when simulations are carefully interpreted. 20 Molecular dynamics (MD) uses classical mechanics to model the motions of molecules. Snapshots of the simulated system recorded periodically during the simulation approximate the thermodynamic conformational ensemble that, with thorough sampling, can be compared to experimental measurements. Making this comparison, however, requires knowing the relationship between the coordinates in the snapshots and the experimental quantity measured. 21–26 MD simulations require an energy model known as a force field, which estimates the potential energy from the molecular conformation. The force field functional form and parameters are approximations of the quantum mechanics of the electronic structure underlying the molecular interactions. 20,27,28 There are a number of known limitations to the RNA force fields that lead to significant artifacts in the simulations when sampling is thorough. In simulations with the commonly used AMBER force field, FF99+bsc0+χOL3, 29–31 tetraloops do not retain a folded structure, 32,33 tetramers often lack contacts observed in proton NMR spectra and exhibit close contacts not observed in the spectra, 25,32 internal loops do not produce ratios of major and minor conformers similar to solution structures, 34,35 and some RNA-protein complexes are not stable for more than a microsecond. 36 It is widely believed that no available force field cheap enough to permit adequate sampling is accurate enough to fold structured RNA de novo. 27,28,37 The CHARMM 36 force field 38 and a revised Amber force field 39 neither correctly stabilize the native tetraloop structure nor accurately model tetramers. 32 Our recent work re-parameterizing backbone dihedrals improves

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

simulation accuracy for tetramers and internal loops, but not tetraloops. 40 Another revision of the AMBER force field and a new AMOEBA force field for nucleic acids were recently published, but remain to be tested by the field more broadly. 41,42 Here we provide a different framework for testing force fields and apply it to AMBER’s FF99+bsc0+χOL3 force field. One strategy for force field benchmarking is to compare thermodynamic quantities estimated by simulation to reference data. One such quantity with extensive reference data is the unfolding free energy change of small structured RNAs. 34,43 Benchmarking against free energy changes tests whether the simulated volume of the native conformational basin is accurate and whether the force field correctly represents the favorability of native contacts relative to those in the unfolded state. This contrasts to comparisons between simulations and native structures determined by NMR or crystallography, where it is difficult to assess whether the observed conformational flexibility is consistent with the structure unless the native conformation is catastrophically lost. Simulated free energy changes have been previously compared to in vitro reference data for base stacking stability, 44–47 helix stability, 48–51 and small hairpin stability. 38,52–55 This study compares the relative free energy changes of stretching RNA hairpin stem loops estimated by simulation to free energy changes determined by optical melting experiments using thermodynamic cycles. 53 Simulating the optical melting transition with good statistics is challenging because the denatured state has a huge volume in phase space. 56,57 Thus to focus our sampling, we simulate a transition along a distance from folded to stretched, in imitation of force denaturation by optical tweezers, hereafter referred to as pulling experiments. 53,58,59 To simulate this stretching transition we performed umbrella sampling along the 50 to 30 hydroxyl oxygen distance (the end to end distance). We performed these simulations in quadruplicate for three 12-nucleotide RNAs with known free energies of unfolding and with representative NMR structures (Fig. 1). We compared the simulated change in free energy of stretching to experimentally-determined free energy changes of melting for pairs of sequences with thermodynamic cycles. 53 The mean absolute difference between the reference and simulated ∆∆G◦ is 0.98 kcal/mol . Our analysis suggests that the residuals for each sequence, which on average have

4

ACS Paragon Plus Environment

Page 4 of 47

Page 5 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

an error of 0.66 kcal/mol (approximately kB T at 37◦ C), are accurate and precise enough to have predictive value, and set a new standard for precision in such simulations.

Methods Starting Structures A

B

C

Figure 1: The starting structures for the pulling denaturation simulations. Panel A depicts sequence 1, with loop sequence CAGUGC. Panel B depicts sequence 2, with loop sequence GUAAUA. Panel C depicts sequence 3, with loop sequence UUAAUU. Structures are shown as sticks with carbons in cyan, nitrogens in blue, hydrogens in white, oxygens in red, and phosphoruses in orange. The oxygens used to measure the end to end distance are shown as spheres, and their native distances are indicated with dashed lines and by the boldface number in Å. We used three hexamer RNA hairpin loops for which NMR structures and optical melting data were both available. For each NMR structure, we used the first conformer provided in the PDB. Below we provide the sequence, where underlining is used to indicate the residues in the loop, as well as references to the deposited coordinates and to the reported stability of each loop. 1. CGA CAGUGC UCG, corresponding to the loop structure with PDB ID of 1AQO (Fig. 1a). 60 We excised the hairpin loop and closing base pair from the larger structure (CAA CAGUGC UUG, residues 10-21). To stabilize the stem we changed the identity of the nucleotides in the closing helix to two GC pairs by manually changing atom types. This hexaloop caps the iron responsive element that regulates translation of iron associated tran5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

scripts by changing their availability to the translation machinery. The experimental free energy change of unfolding of the adjusted structure at 37◦ C is -0.21 ± 0.39 kcal/mol. 61–63 2. GGC GUAAUA GCC. We used the first conformer from a solution NMR structure reported by Fountain et al. (Fig. 1B). 64 The occurrences of this loop in rRNA tend to have a sugarHoogstein GA pair between the first and last residues of the loop, 65 however in the NMR structure this non-canonical interaction is only partially present. 64 The experimental free energy change of unfolding at 37◦ C is -3.46 ± 0.24 kcal/mol. 66 3. GCG UUAAUU CGC, corresponding to the solution NMR structure deposited in the PDB as 1HS3. 67 The experimental free energy change of unfolding at 37◦ C is -2.36 ± 0.32 kcal/mol. 68 We refer to these chemical systems by the sequence of their loops (for example, sequence 1 will be called CAGUGC). There are two approaches to estimate experimental uncertainty in folding free energy from optical melting experiments: 1. The average uncertainties observed for the ΔH◦ (12%) and ΔS◦ (13.5%) for the database of melting experiments following equation 11 of Xia et al., 69 which accounts for the correlation in ΔH◦ and ΔS◦ . 63,69 2. The uncertainty in the free energy change provided with the melt. To be conservative, we calculated both uncertainties for each sequence and took whichever was greater. We used the statistical method for CAGUGC and UUAAUU, and the uncertainty provided with the melt for GUAAUA. Sequence 1 was edited in the closing stem helix as compared to the sequence studied by NMR and optical melting experiments. Nearest neighbor rules were used to estimate the free energy of unfolding for the revised sequence: 69 the nearest neighbor stacking terms for the removed base pairs were subtracted from the experimentally-determined folding stability and the stacking terms 6

ACS Paragon Plus Environment

Page 6 of 47

Page 7 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

for the new base pairs were added. Because these modifications were made in an A-form helix, where the nearest neighbor rules are most accurate, the estimated error from the nearest neighbor parameter terms was 0.12 kcal/mol, which resulted in a 0.02 kcal/mol increase in the propagated error estimate of the optical melting experiment. 68 We used the error estimates, including parameter covariance, to estimate the overall error. 70

Simulation Protocol We chose to monitor and bias folding along the distance between the 50 and 30 hydroxyl oxygens of the terminal residues of each 12-mer. This simulates the effect of applying force to the hairpin through tethers attached to its 50 and 30 ends, as has been done in single-molecule experiments using optical tweezers. 59 Here we simulate this transition with umbrella sampling, whereby the molecule is restrained to occupy specific positions along the reaction coordinate in independent equilibrium simulations called windows. 27,71,72 The biased distributions of reaction coordinate positions are then unbiased to determine the Free Energy Curve (FEC) using the Weighted Histogram Analysis Method (WHAM) to iteratively minimize the statistical errors in overlapping regions from each simulation. 73 Simulations were run using the GPU code 74 from the AMBER 14 and AMBER 16 packages. 75,76 We simulated four replicas of each window for each molecule. Equilibrium distances were spaced in 1 Å increments from 15 to 60 Å, and a harmonic restraint with a force constant of 3 kcal mol−1 Å−2 was used to bias sampling at each distance. The native distance occupied by the NMR structures chosen was approximately 17 Å for each structure (Fig. 1). The molecular coordinates of the first windows, i.e. the 15 Å equilibrium distance window, were taken directly from the solution structures. Subsequent windows were generated by solvating the the coordinates from the last frame of a 2 ns solvated MD simulation on the previous window. This procedure was repeated independently for each replicate, so that the starting structures are different. Each window was built with truncated octahedral periodic boundary conditions with a 10 Å minimum distance between the solute and the periodic boundary. Each window was solvated with 7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

TIP3P water with neutralizing sodium and with an additional 1 M NaCl to simulate the conditions of the melting experiments. Each system was maintained at 310 K using Langevin Dynamics with a timestep of 2 fs and with bonds involving hydrogen constrained by SHAKE. 77,78 The collision frequency of the thermostat was 2 ps-1 and pressure was maintained using a Monte Carlo barostat with a 100 step attempt frequency. Each simulation was run for 1 μs. Because there were 46 windows per replica of each pulling simulation, and there were four replicas per molecule, this amounts to 4 × 46 × 3 = 552µs of aggregate simulation time.

Hydrogen Bond Averaging To compute the average number of base-base hydrogen bonds we used cpptraj from the AmberTools Suite. 75 We then post-processed the results using awk to exclude backbone and neighboring base interactions. We averaged this quantity across the four replicas for each window, and computed the standard error of its mean (SEM).

Free Energy Analysis For each window, the 30 to 50 distance was recorded every 10,000 steps (20 ps), leading to 200,000 sampled distances per trajectory. The first 200 ns of each simulation was discarded as equilibration. Umbrella sampling simulations were merged into one free energy curve along the reaction coordinate for each system using WHAM 73,79,80 with the following parameters: a minimum distance of 14.95 Å, a maximum distance of 60.05 Å, with 451 bins and a convergence criterion of 1.0×10−6 kcal/mol for maximum successive difference for the free energy change. 80 The reaction coordinate was broken into two states: slack and stretched divided at 45 Å. The slack state includes the native-like windows near 17 Å. We chose the distance for this division of the two states based on the average number of non-neigboring base-base hydrogen bonds (see the section titled Hydrogen Bonding and Reaction Coordinate Partitioning and Fig. 4).

8

ACS Paragon Plus Environment

Page 8 of 47

Page 9 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

We defined the free energy change of stretching a sequence A as: ! P i≥c Pi ln P (1) i