Residual Conformational Entropies on the Sugar ... - ACS Publications

Jul 28, 2015 - Levana L. Sani,. † and Amber N. Villa. †. †. Department of Chemistry and. ‡. Center of Applied Mathematical Sciences, Universit...
0 downloads 0 Views 6MB Size
Article pubs.acs.org/JPCB

Residual Conformational Entropies on the Sugar−Phosphate Backbone of Nucleic Acids: An Analysis of the Nucleosome Core DNA and the Ribosome Chi H. Mak,*,†,‡ Levana L. Sani,† and Amber N. Villa† †

Department of Chemistry and ‡Center of Applied Mathematical Sciences, University of Southern California, Los Angeles, California 90089, United States ABSTRACT: A nucleic acid folds according to its free energy, but persistent residual conformational fluctuations remain along its sugar−phosphate backbone even after secondary and tertiary structures have been assembled, and these residual conformational entropies provide a rigorous lower bound for the folding free energy. We extend a recently reported algorithm to calculate the residual backbone entropy along a RNA or DNA given configuration of its bases and apply it to the crystallographic structures of the 23S ribosomal subunit and DNAs in the nucleosome core particle. In the 23S rRNAs, higher entropic strains are concentrated in helices and certain tertiary interaction platforms while residues with high surface accessibility and those not involved in base pairing generally have lower strains. Upon folding, residual backbone entropy in the 23S subunit accounts for an average free energy penalty of +0.47 (kcal/mol)/nt (nt = nucleotide) at 310 K. In nucleosomal DNAs, backbone entropies show periodic oscillations with sequence position correlating with the superhelical twist and shifts in the base-pair-step geometries, and nucleosome positioning on the bound DNA exerts strong influence over where entropic strains are located. In contrast to rRNAs, residual backbone entropies account for a free energy penalty of only +0.09 (kcal/mol)/nt in duplex relative to single-stranded DNAs.



INTRODUCTION Base pairing and stacking are known to be major determinants of secondary and tertiary structures in nucleic acids.1 For RNAs, counterions such as Mg2+ also play key roles as they exert stabilizing influence on the fold.2 But much less is known about the role of entropy associated with the conformational fluctuations of the sugar−phosphate backbone. While the free energies corresponding to base−base interactions and counterions are accessible to experiments, there is no direct measurement for the backbone entropy, and quantitative information about the conformational entropy of the sugar− phosphate backbone remains scarce. Theory and numerical calculations can help fill this gap. Backbone conformational fluctuations in nucleic acids are associated with structure on several levels. We illustrate them in Figure 1 using a small RNA as an example. Because nucleic acids necessarily lose conformational freedom upon folding, their folding entropy, ΔSfold, from the open to the folded state is always negative. On the coarsest level, I, conformational freedom is lost whenever tertiary contacts are established. The consequence of the conformational constraints associated with tertiary contacts are depicted in Figure 1I, which shows the kinds of conformational motions typically freed up by the removal of these tertiary constraints. These large-scale shape fluctuations are the ones most commonly thought of as “conformational motions” in nucleic acids.1,2 But even after the tertiary structure of a nucleic acid is fully assembled, backbone © XXXX American Chemical Society

conformational motions continue to persist. On this intermediate level, II, conformational flexibilities remain, concentrated primarily within loop and junction regions, and these are illustrated in Figure 1II. Distorting the backbone to assemble secondary structural elements such as loops and junctions costs entropy even though the loop regions themselves may remain relatively flexible in the folded structure, and the formation of base pairs to construct helices making stem-loop structures costs entropy.3,4 These represent entropic losses on level II. Finally, on the third and finest level, III, residual backbone conformational fluctuations are still active. Figure 1III illustrates this type of residual backbone conformational motions. Even after the relative positions of all of the bases are fixed in the final folded structure, a large number of backbone conformational states still persist. While one may be accustomed to seeing a single backbone conformation between every pair of bases in crystallographic data, there are actually many conformations that are consistent with every configuration of the base positions inside a nucleic acid, and this multitude of backbone conformations is thermally populated at equilibrium. In this work, we refer to this set of conformational states of the sugar−phosphate backbone between any two consecutive bases on the sequence as the Received: May 20, 2015 Revised: July 27, 2015

A

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 1. Illustration of the three levels of nucleic acid backbone conformational motions and their associated entropies. (I) Large-scale shapeshifting motions are freed up by removal of tertiary contacts. (II) Conformational fluctuations primarily persist in loop and junction regions at the secondary structural level, but entropic penalties are associated with the formation of stems, loops, and junctions. (III) Residual backbone conformational motions remain even after the spatial configurations of the bases are fixed.

“residual” backbone conformations, and the corresponding entropy associated with these as the “residual backbone entropy” for each nucleotide segment. Defined in this manner, SIII entropies result from the remaining degrees of freedom on the backbone after secondary and tertiary structures have been fully established, and they are therefore a function of fixed base configurations. Our goal is to develop accurate computational methods to evaluate entropies on all three levels: SI, SII, and SIII. This work focuses on level III residual backbone entropies, SIII. At the outset, one may anticipate the difference in the residual backbone entropy between the unfolded and the folded states of any nucleic acid to be small. The canonical A-form helices in RNA structures and B-form helices in hybridized DNAs are stable, and they do not appear to induce excessive stress along the backbone. The base-to-base distance in folded RNAs is shorter on the average than in unfolded extended chains,5 and this compression may lead to a small entropy cost. There is also evidence that hydrogen bonding between different bases requires different backbone distortions,6 but this differential is also expected to be small. Despite this, there are good reasons for studying SIII residual backbone entropy. First, even though the change in residual backbone entropy upon folding may be small, a theoretical estimate for it would still be useful because their experimental measurements are not readily available. Moreover, small contributions accumulating from many nucleotide segments may render the total residual entropy change for the entire nucleotide sequence significant. Indeed, as we will show below, the residual entropy difference, ΔSIII, between an open RNA and its folded structure could be as large as ∼0.5 (kcal/mol)/nt (nt = nucleotide), which is not insignificant for large RNAs. Finally, since the entropy of folding ΔSfold is the sum of the entropy differences on levels I + II + III, the residual ΔSIII will provide a rigorous lower bound for the magnitude of the total ΔSfold. Nucleic acids derive almost all of their entropies from backbone torsions. While certain bond angles may be more pliable than others, they are of weaker significance than torsions. There is also little or no flexibility in the bond lengths. In each nucleotide, the six backbone torsion angles, α through ζ, represent six degrees of freedom (DOFs). One of the angles, δ, is strongly correlated with the ribose geometry through ν2.7

Including the rotation around the N-glycosidic bond and the slight bond angle flexibilities, the actual number of DOFs is close to 8/nt because some DOFs are only semiflexible. The total conformational volume associated with these DOFs given the configuration of the leading and trailing bases bounding them is a complex function of the relative positions and orientations of the two bases, and the variations in the backbone conformational entropy associated with these DOFs can be surprisingly diverse. In this work, we present a detailed numerical analysis of the backbone entropy in two representative nucleic acid systems: (1) rRNAs in the 23S ribosomal subunit of Haloarcula marismortui (H. marismortui)8,9 and (2) the 147-bp DNA duplex in the nucleosome core particle.10 At 2.20 Å resolution, the crystallographic structure of the 23S ribosomal subunit consists of close to 2900 nucleotides and provides a statistically significant sample in a single molecule for thorough population analysis. On the other hand, the DNA duplex in the nucleosome core particle with 1.67 turns of a helix wrapped around the histone octomer is available at 1.90 Å resolution and it is one of the largest duplex DNA structures that have been solved to date. While there are no direct experimental measurements of residual backbone entropies, we will show that the computed residual backbone entropy values can be indirectly validated through their close correlations to unique conformational features in these two high-resolution X-ray structures.



METHODS The closure algorithm recently reported in ref 11 was used to calculate the per-nucleotide (per-nt) residual conformational entropy along the sugar−phosphate backbone. Using the configurations of all of the bases (their relative positions and orientations) in a RNA or DNA as input, this numerical algorithm evaluates the conformational entropy for every backbone segment between each pair of bases along the sequence. Formally, the closure algorithm is equivalent to a complete direct enumeration of all backbone conformations, subject to bond length and bond angle constraints. The basic method in ref 11 has been extended to account for the segment-to-segment correlation in order to derive a more B

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 2. Geometry of the backbone closure problem for a RNA. (A) Two sequence-neighbor nucleobases fixed in space. Their C1′ and N (N9 for purines or N1 for pyrimidines) atoms are highlighted in blue. The closure algorithm is solved for all main-chain conformations consistent with the given positions of the two bases, while preserving all backbone bond lengths and bond angles along the main-chain segment between these two bases. The closure solution is carried out in two stages: (1) the ribose closure considers the C3′−C4′ of each ribose as a distance constraint (dashed lines) and solves for all torsion angles along the ring consistent with it, and (2) the main-chain closure considers the O3′···O5′ distance as a second constraint (dashed lines) and solves for all torsion angles along the backbone consistent with it. (B) Definition of the RETO variables, {r,η,θ,ω}, between two sequence-neighbor C1′−N units. (C) Schematic of the closure algorithm, showing the four input rotations angles: ψj, ψj+1 for the closure of the main chain and νj, νj+1 for the closure of the two ribose rings. In the main-chain closure, the O3′···O5′ distance (dashed line) is used as the constraint in the RSSR solution. All bond lengths and bond angles along the main chain are preserved by the closure algorithm, except for the P− O5′−C5′ angle, which is indicated by the small dashed arc. (D) Schematic of the ribose closure algorithm, showing for each ribose ring the input angle, ν1, and the output angle, τ. The C4′−C3′ distance (dashed line) is used as the constraint in the RSSR solution. All bond lengths and bond angles are preserved, except for the internal bond angles C3′−C4′−O4′ and C2′−C3′−C4′ indicated by the small dashed arcs. (E) Ribose closure in the context of the three-dimensional geometry of the backbone.

angles between the two C1′−N bonds can be obtained if the virtual torsion angles ψj ≡ t(O4′,C1′,Nj,Nj+1) and ψj+1 ≡ t(O4′,C1′,Nj,Nj+1) (see Figure 2A,C) as well as the torsion angle ν1 = t(O4′,C,C2′,C3′) of each of the two ribose rings (see Figure 2D,E) are specified. In other words, given each RETOj between two adjacent C1′−Nj···Nj+1−C1′, as well as the four torsion angles {ψj,νj,ψj+1,νj+1} (where νj ≡ ν1 for base j), the number of ways (or more precisely, the measure, since space is continuous) the main chain can remain intact while maintaining all chemical bond lengths and bond angles constraints along the sugar−phosphate backbone is given by the analytical closure solution C(ψj,νj;ψj+1,νj+1|RETOj), which is the number of distinct solutions to the closure problem. For each value of ν1 for each ribose there are actually two distinct ring closure solutions too,7 so including this index, f j = 1 or 2, of the ribose closure solution for each sugar j, the closure solution for any segment of the backbone between bases j and j + 1 is given by the function C(ψj,f j,νj;ψj+1,f j+1,νj+1|RETOj). Using the analytical solution for C, which will be described in greater detail later, we can compute the total volume, Ω, of the conformational space available to the entire sugar−phosphate backbone from the 5′ end of the molecule through its 3′ end, given the positions of all the nucleobases’ C1′−N bonds specified through their RETO variables {RETO1, RETO2, ...}, by

accurate formula for the entropy of the backbone. In the following we summarize the essential details, giving emphasis particularly to the extensions developed for the current work. The closure problem involving the main-chain atoms between two nucleobases is illustrated in Figure 2. For each pair of adjacent nucleobases, the relative positions of their C1′− N···N−C1′ atoms are specified by six internal coordinates (N = N1 for pyrimidines or N9 for purines). These C1′ and N atoms are highlighted in blue in Figure 2A. Of the six internal coordinates relating these four atoms, the two C1′−N bond lengths are largely rigid, but the remaining four internal coordinates are variable: the N···N virtual bond length, the (5′)C1′−N···N virtual bond angle, the N···N−C1′(3′) virtual bond angle, and the C1′−N···N−C1′ virtual torsion angle. These four variables are illustrated in Figure 2B for two sequenceneighbor C1′−N pairs. We denote these four variables r, η, θ, and ω, and we will refer to them collectively as the “RETO coordinates”. For notational simplicity, we define b(A,B) to be the bond length (real or virtual) between atoms A and B, a(A,B,C) the bond angle among atoms A, B, and C, and t(A,B,C,D) the torsion angle among A, B, C, and D. With these, RETOj ≡ {rj = b(Nj,Nj+1), ηj = a(C1′j,Nj,Nj+1), θj = a(Nj,Nj+1,C1′j+1), ωj = t(C1′j,Nj,Nj+1,C1′j+1)}. Given configurations of two sequence-neighbor nucleobases j and j + 1, their C1′ and N atom positions are fixed, defining the four RETOj variables between them. Employing native bond lengths and bond angles for the two ribose and the connecting PO4 group, unique solutions for all of the main-chain torsion

N−1

Ω=

∫ dX1 ∫ dX2... ∫ dXN ∏ C(ψj′, fj , νj; ψj+1, fj+1 , νj+1|RETOj) j=1

(1) C

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

ψj+1,f j+1,νj+1|RETOj) grid that represented geometries with steric collision(s) involving any pair of atoms between the two fragments were eliminated from C. This calculation was repeated for every backbone segment j between bases j and j + 1, for j from 1 to N − 1. The total conformational volume, Ω, available to the backbone of the entire chain containing N nucleotides was then computed by a discrete approximation of the integral in eq 1, evaluated as N − 2 consecutive matrix multiplications involving two (m × 2n) × (m × 2n) matrices at a time. With this matrix multiplication approach, the total number of backbone conformations which have been enumerated and screened by Ω is equivalent to (m × 2n)N−1. The conformational entropy over the entire chain is then given by SIII/R = ln Ω. The per-nucleotide entropy is defined to be the incremental SIII/R when one additional nucleotide is added to the end of the chain. In the original algorithm,11 the base-to-base coupling inherent in the C functions had been left out. Instead of performing the full integral in eq 1, the per-nucleotide entropy was computed for each backbone segment between the nucleobases j and j + 1 by the approximation Sj/R ≈ ln ∫ dXj ∫ dXj+1C(ψj′,f j,νj;ψj+1,f j+1,νj+1|RETOj) which made the computations simpler. Due to this, the entropy values reported with the original algorithm11 were different from the ones reported in this work because they had been computed against a different reference volume. The entropy values reported in this work include the correlations between sequence-neighbor nucleotides represented by the integrals in eq 1. Backbone Closure. Given the C1′−N···N−C1′ RETO variables {r,η,θ,ω} between two adjacent nucleobases on the sequence, the backbone entropy calculation begins with the mathematical solution of the chain closure problem. There are variations in how this problem is handled,7,11,13−15 but all of them essentially employ the mathematics of inverse kinematics to solve for the backbone torsion angles capable of maintaining bond connectivities along the backbone given the native bond lengths and bond angles in the intervening molecular structure. Our approach is based on a two-step procedure (see Figure 2), as follows. Step a: Ribose Closure. For each of the sugars, we can completely define its ring geometry given one input parameter, the ν1 torsion angle shown in Figure 2E. The closure solution for each sugar is illustrated schematically in Figure 2D. With the plane containing the O4′, C1′, and C2′ atoms as the reference, ν1 specifies the out-of-plane dihedral angle between the C3′ atom and the reference plane and thus defines the position of C3′ uniquely. The ring is closed by solving for the position(s) of C4′, defined by the dihedral angle t(C2′,C1′,O4′,C4′) denoted τ in Figure 2D. The proper solutions for τ are those that produce the correct b(C4′,C3′) bond distance, indicated by the dashed line in Figure 2D. This kind of kinematic closure is referred to as a RSSR problem in robotics, and its solution is wellknown16,17 (see details later). Notice that in this step all native bond lengths and bond angles are preserved, except for the two internal bond angles a(O4′,C4′,C3′) and a(C4′,C4′,C2′), indicated by the dashed arcs in Figure 2D. The general solution of this part of the closure problem has at most two discrete roots for τ given each ν1. The feasibility domain for ν1 is bounded by the two values ±ν*1 , where the two solutions for τ collapse into a double root. The same procedure is carried out for every ribose separately. The solutions to the ribose ring closure problem correspond physically to different puckering geometries.

where N is the total number of bases along the sequence and the integral over each differential volume element ∫ dXj is

+ν1* defined as ∑f2j=1 ∫ 2π 0 dψj ∫ −ν1* dνj, with [−ν1*,+ν1*] being the feasibility domain for the closure of each ribose.7,11 The rotation angles ψ′j and ψj+1 in eq 1 are referenced to their respective RETOj frame for each C. Because the two consecutive frames RETOj−1 and RETOj share nucleobase j, the rotation angle ψ′j referenced to RETOj is different from but related to the angle ψj referenced to the previous RETOj−1 via a simple offset ψj = ψ′j + ξj, where ξj ≡ t(N j−1,Nj,C1′j,Nj+1). Also, since the transformation from atomic coordinates of the mainchain atoms to backbone torsion angle space with fixed bond length and bond angle constraints involves an approximately constant Jacobian (see later), the integral Ω in eq 1 is identical to the conformational volume available to the sugar−phosphate backbone in Cartesian space to within a constant factor. Numerically, the integral in eq 1 can be computed by approximating each C(ψ′j,f j,νj;ψj+1,f j+1,νj+1|RETOj) on a grid. Because the C function involves interactions between nearestneighbor pairs only, the discretized integrals can be computed as a sequence of consecutive matrix multiplications. Since the nucleobases along the sequence have different configurations, the RETO variables {RETO1, RETO2, ...} which specify the relative positions and orientations between nearest-neighbor bases are in general distinct. As a result, the closure problem must be resolved for every main-chain segment between two bases j and j + 1 given its specific RETOj, yielding C as a function of the variables (ψ′j,f j,νj;ψj+1,f j+1,νj+1). Ω in eq 1 gives the total backbone conformational volume available to the sugar−phosphate backbone while chain connectivity is maintained subject to fixed bond length and bond angle constraints. But not all of these conformations are allowed because steric clashes may occur between backbone atoms. Within a backbone segment between bases j and j + 1, closure solutions involving any combination of (ψ′j,f j,νj;ψj+1,f j+1, νj+1) that produces collision(s) between any atom on the ribose attached to base j with any atom on the ribose attached to base j + 1 must be excluded from C. An analytical procedure for identify these sterically disallowed combinations will be described later. These offending combinations were first removed from the closure solution by setting C = 0 for them before eq 1 is employed to compute the conformational volume. To numerically evaluate the integral for Ω, all possible ribose geometries were first selected by the ribose closure solution7 and precomputed on a grid of m = 16 ν1 angles spanning its feasibility domain [−ν1*,+ν1*] and stored. This is possible because ribose closure solutions are independent of the base configurations (i.e., they are not functions of RETO). Standard nucleotide geometries12 were used for both the ribose closure and the closure of the remainder of the main chain. Whereas ribose closure can be precomputed, main-chain closure must be performed explicitly between every pair of nearest-neighbor bases j and j + 1 since the domain for RETOj is large and the functional form for C on it is complex. To do this, each of the two rotation angles ψ′j and ψj+1 was discretized on a grid with n = 128 points on its domain [0,2π], and the closure solution C(ψ′j,f j,νj;ψj+1,f j+1,νj+1|RETOj) was computed on the sixdimensional grid (ψ′j,f j,νj;ψj+1,f j+1,νj+1) point-by-point producing a (m × 2n) × (m × 2n) matrix. Using a collision distance of 3.2 Å for every pair of non-hydrogen atoms between the two atomic fragments on the two bases, points on the C(ψj,f j,νj;

D

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B Step b: Main-Chain Closure. With the ribose closure solutions defined in step a for the two sugars, νj and νj+1 in Figure 2C are specified and the ribose rings can be affixed onto the four-atom C1′−N···N−C1′ frame with its given RETO geometry. The second part of the closure problem is to append the O−P−O group onto the backbone in order to bridge the O3′ on the left sugar with the C5′ on the right sugar. To find the solution, the t(O4′,C1′,Nj,Nj+1) virtual torsion on the left (denoted ψj in Figure 2C) and the t(Nj,Nj+1,C1′,O4′) virtual torsion angle on the right (ψj+1 in Figure 2C) must first be specified. With these and the ribose solutions defined previously, the positions of all atoms shown in Figure 2C on the 5′ side up to the O3′ are fixed and all atoms on the 3′ side from the C5′ onward are also fixed. The variable γ torsion angle on the right in Figure 2C then defines the position of the O5′ atom on the 5′ side. Given fixed bond lengths and bond angle in the O−P−O fragment, the distance between the two oxygens O3′ and O5′, indicated by the thick dashed line in Figure 2C, is a constant. Solving for the values of γ that produce the correct b(O3′,O5′) distance is another RSSR problem, giving rise to a maximum of two roots. Once the proper solution for γ is found, the a(C3′,O3′,P) bond angle constraint is finally met by rotating the P atom around the O3′···O5′ axis to produce the correct angle, yielding again a maximum of two possible solutions. Combining these there are a maximum of four solutions for this part of the closure problem. Notice that all bond lengths and bond angles along the backbone starting from C5′ on the 5′ side through O3′ on the 3′ side, including C4′, C3′, O3′, P, O5′, C5′, C4′, and C3′, are preserved, except for the a(P,O5′,C5′) bond angle (indicated by the dashed arc in Figure 2C). To summarize, given the RETO variables {r,ηj,θj,ωj} between two adjacent nucleobases and after specifying the angles (ψj,νj,ψj+1,νj+1) as inputs, the procedure described previously produces up to 16 possible discrete solutions for the closure of the backbone, preserving all bond lengths and bond angles except for the three angles indicated by the dashed arcs in Figure 2C,D. The two nonconserved internal bond angles on the ribose fall largely within their expected range of values because the number of puckering solutions of the ring is actually rather limited. The nonconserved P−O5′−C5′ angle is also largely consistent with expected values because excluded volume between the two main-chain fragments strongly constrains the range of backbone closure solutions. RSSR Problem and Its General Solution. The analytical solutions for the closure of the ribose as well as the backbone were formulated as RSSR problems. Below we will show that the RSSR problem can also be used to solve for the conformational volumes excluded by main-chain atom steric collisions. Given its centrality, we summarize here some of the key features of the RSSR problem and its solution.16,17 In robotics, the RSSR mechanism describes a large class of kinematic assemblies formed by two independent rigid members each mounted onto a fixed frame via a revolute joint, but these two rotating elements are also connected by a fixed-length bar. An example of a RSSR assembly is shown in Figure 3, where each of the units on the left and right is internally rigid but is free to rotate about an axis dictated by its revolute (R) joint. The left and right elements are linked by a rigid bar of length L, connected to each via a spherical (S) joint. Because of the constraint L, the rotation angles χL and χR are coupled. The possible combinations {χL,χR} subject to this

Figure 3. Illustration of a RSSR problem. The left element is internally rigid but free to revolve about the rotation axis (dotted line) defined by its revolute (R) joint. The same is true for the right element. The relative positions and orientations of the two (R) joints are input parameters defining the RSSR geometry. The end points of the left and right elements are connected by a rigid rod via spherical (S) couplers. L, the length of the rigid rod, is the constraint. χL may be considered as the input variable, and χR, the output. For every input χL, there are at most two solutions for χR. The domain in χL within which real solutions for χR exist is called the “feasibility domain”. The boundaries of the χL feasibility domain are given by a quartic equation. (See text for details.)

constraint are given by the roots of an equation with the following general form: 0 = [A cos(χL ) + B sin(χL ) + C ] cos(χR ) + [D cos(χL ) + E sin(χL ) + F ] sin(χR ) + G cos(χL ) + H sin(χL ) + I

(2)

where the coefficients A through I are functions of the geometry of the problem and the constraint L. If we treat one angle, say χL, as the input, the solutions for the output angle χR are given by the following equation: k 2 cos(χR ) + k1 sin(χR ) + k 0 = 0

(3)

Using the half-angle formula, we can make the substitutions cos(χR) = (1 − sR2)/(1 + sR2) and sin(χR) = 2sR/(1 + sR2), with sR = tan(χR/2), to rewrite eq 3 as a quadratic equation in sR. The solution of this quadratic equation yields two roots for sR as long as its discriminant Δ is positive definite. It is easy to show that Δ = k22 + k12 − k02. The domain in χL within which any real solution for χR exists, the so-called “feasibility domain”, is defined by the condition Δ > 0. Again, using the half-angle substitutions cos(χL) = (1 − sL2)/(1 + sL2) and sin(χL) = 2sL/(1 + sL2) for χL in Δ = k22 + k12 − k02, Δ itself can be written as a function of sL. The boundaries s*L of the feasibility domain can then be located from the condition Δ(sL*) = 0, which is a quartic equation in sL*. In this way, the entire feasibility domain in χL can be determined without having to first solve for χR. For every point within the feasibility domain of χL, there are two real solutions for χR, given by sR = (−k1 ± Δ)/(k0 − k2). Corresponding to each solution, there is a Jacobian involved in the transformation from Cartesian coordinates to the coordinate system that is defined by the internal degrees of freedom and the constraints involved in the RSSR solution. The derivation of this Jacobian is rather tedious, but the result turns out to be just Δ−1/2. Over the entire feasibility domain in χL, the total Jacobian is therefore given by J = ∫ *0 2π dχL2/(k22 + k12 − k02)1/2, where * signifies that the integral is taken only over the interior of the feasibility domain in χL. When carried out over sL, this integral takes on the general form of an elliptic ∞ integral18 J = ∫ *−∞ dsL [1/(a4sL4 + a3sL3 + a2sL2 + a1sL + a0)1/2]. The coefficients a0 through a4 are functions of the coefficients A through I and constraint L, and they are defined solely by the E

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 4. Comparison between 200,000 statistically uncorrelated RETO samples from sequence-neighbor C1′−N···N−C1′ collected from an all-atom computer simulations of a short RNA strand against their predicted probabilities according to the entropies computed by the closure algorithm represented by contours, shown on various two-dimensional projections in (r,η,θ,ω) space.

the 3′ residue, we imagine they constitute the left and right elements in a RSSR mechanism such as the one depicted in Figure 3 and look for solutions to this RSSR problem with the constraint L set equal to the minimum-approach collision distance. For every ψj+1 there are at most two real RSSR solutions for ψj, and these mark the boundaries of the excluded region in (ψj,ψj+1) space. If no real RSSR solution exists, then either the entire domains for both ψj and ψj+1 are allowed or they are entirely disallowed. Repeating this for every pair of atoms AL and AR, the sterically forbidden domain for the mainchain segment between bases j and j + 1 is given by the union of the excluded regions for all of the (AL,AR) atom pairs. This sterically excluded domain is then removed from C(ψj,f j,νj; ψj+1,f j+1,νj+1|RETOj). Within this algorithm, the only difference between a RNA and a DNA is the absence of the O2′ hydroxyl in deoxyribose nucleic acids. The results below show that this distinction nonetheless gives rise to significantly different backbone entropies between DNAs and RNAs. The collision distance was chosen to be 3.2 Å between all non-hydrogen atoms. This value was selected to match the nonbonded van der Waals radius assumed in most force fields employed for typical biomolecular simulations, e.g., Amber19 or CHARMM.20 In the original algorithm,11 the collision distance was taken to be 2.5 Å. The more aggressive steric cutoff of 3.2 Å used here produced better agreement with results from allatom computer simulations, which will be discussed later. Using this larger collision distance also yields lower overall entropy values compared to the original algorithm.11 Validation of Entropy Algorithm against All-Atom Simulations. To validate our entropy algorithm, we compared them against results from an independent all-atom simulation using short single-stranded RNAs. We have carried out simulations on a number of two-nucleotide constructs flanked by two AA residues on both the 5′ and 3′ ends (6 nts in total),

geometry of the RSSR assembly. Again, the integral is carried out over the interior of the feasibility domain, where only real roots of the quartic equation 0 = a4sL4 + a3sL3 + a2sL2 + a1sL + a0 exist. In our computations, we have compared the entropies calculated using the closure solutions with and without the integrated Jacobian J. The difference between them is typically small. This is because the conformational volume is strongly dominated by regions within the feasibility domain where the Jacobian varies slowly. Close to the feasibility domain boundaries, the Jacobian begins to vary rapidly and eventually diverges, but the contribution of these boundary regions to the integral is of measure zero. Excluded Volume in Backbone Conformation Space due to Main-Chain Steric Collisions. The backbone closure solution described previously for finding C(ψj,f j,νj;ψj+1,f j+1,νj+1| RETOj) as a function of (ψj,νj;ψj+1,νj+1) given the RETOj frame for each main-chain segment does not account for steric collisions. Clashes between main-chain atoms from the two sugars attached to bases j and j + 1 will create excluded volume in conformational space. These disallowed regions must be eliminated from the function C(ψj,f j,νj;ψj+1,f j+1,νj+1|RETOj) for every backbone segment before C can be used to compute the total conformational volume, Ω, in eq 1. The procedure for identifying sterically disallowed regions in (ψj,νj;ψj+1,νj+1) space is based on the same solution to the RSSR problem described earlier. Collisions may arise from steric contacts between any atom (C5′ to O3′) on the 5′ residue shown on the left in Figure 2C with any atom (C5′ to O3′) on the 3′ residue on the right. For every choice of νj and νj+1, the positions of these atoms relative to their own rotation axis are fixed. To check for possible collision between an atom AL ∈ {C5′,C4′,O4′,C1′,C2′,O2′,C3′,O3′}j from the 5′ residue and another atom AR ∈ {C5′,C4′,O4′,C1′,C2′,O2′,C3′,O3′}j+1 from F

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B using the Loops all-atom simulation program21,22 with a modified Amber ff94 force field19 where all base−base interactions have been removed and main-chain LennardJones interactions replaced by its purely repulsive part. This modification to the Lennard-Jones potential was done in order to eliminate contaminations from other energy terms in the force field such as base stacking and pairing, electrostatics, and solvation and try to isolate the intrinsic effects of steric interactions on the backbone conformational entropy. Equilibrium ensembles of these constructs were generated by exhaustive sampling. Since the Loops algorithm is able to strictly preserve bond lengths and bond angles while carrying out large-scale loop reconfigurations,21,22 the ensemble of equilibrium structures obtained from these simulations therefore reflects the steric interference of the main chain with itself in its influence on the intrinsic backbone conformational entropy of these strands. Details of these calibration simulations can be found in ref 11. 200,000 statistically uncorrelated sequence-neighbor C1′− N···N−C1′ RETO samples were collected from a number of long simulations. The statistical frequency of occurrences of different combinations of RETO variables yielded estimates for the backbone entropic volume. Estimating entropy directly from simulation data requires histogramming the samples. But since the entropy is a complex function of the four RETO variables, obtaining its value for specific combinations of {r,η,θ,ω} is numerically difficult. Instead of directly comparing the calculated entropies to estimates from the simulation, we instead examined the statistical occurrence of RETO geometries from the simulations and compared them against their predicted probabilities according to the entropies calculated from the closure algorithm, P ∝ exp(SIII/R). Figure 4 shows distributions of the RETO geometries collected from the simulations, projected onto various planes in (r,η,θ,ω) space, with their predicted probabilities, P, shown as contours. Comparing simulation samples from Cartesian space to their distribution functions in RETO space requires a Jacobian J = r2 sin(η) sin(θ), which has been accounted for in the comparisons presented in Figure 4. Monte Carlo Simulations Based on Backbone Entropies. The backbone entropy formula described previously can also serve as a basis for the computer simulations of RNAs and DNAs. This approach was used to establish an estimate for the per-nucleotide entropy values for unfolded RNAs shown in Figures 6 and 7, and also for unhybridized single-stranded DNAs in Figure 10. In these Monte Carlo simulations, the nucleic acid backbone was described by a free energy function F0 = −RT ln Ω, where Ω is the total conformational volume of the polymer backbone from eq 1. The positions of the N nucleobases specify the N − 1 RETOj values in eq 1, as well as the N − 2 ξj variables. After performing the integrals in eq 1 by the closure algorithm, F0 becomes a function of the nucleobase configuration, μ ≡ {RETO1,ξ2,RETO2,ξ3,...,ξN−1,RETON−1}. The probability of observing a particular nucleobase configuration μ in the canonical ensemble is then given by the Boltzmann weight exp[−F0(μ)/RT]. To describe interactions among the bases, an additional energy term Ebb was added to F0. Since the objective of our simulations was to sample unfolded chain configurations with no base stacking or base pairing, only steric interactions among the bases were included in Ebb. Representing the bases by explicit atoms, only the repulsive branch in the van der Waals

(vdW) interactions was retained by using the Week− Chandler−Andersen (WCA) modification,23 with their atomic parameters taken from Amber ff94.19 Each base j, assumed to have standard geometries12 and no internal motions, branches off from the main-chain configuration μ via its (C1′−N)j bond with a free rotation angle, ϕj. The entire configuration of all of the nucleobases, including these base rotations, is then described by the augmented state vector μ′ = {RETO1, ξ2,RETO2,ξ3,...,ξN−1,RETON−1;ϕ1,ϕ2,...,ϕN}. Since only the repulsive branch of the nonbonded atomic interactions between bases had been included in these simulations, the results were relatively insensitive to the precise vdW parameters selected. We have verified using vdW parameters from different force fields that the results represented in Figure 7 are in fact robust. Main-chain−main-chain interactions were included via a second energy term Emm. Whereas the steric collisions between main-chain atoms within a segment between sequenceneighbor nucleobases have been accounted for in the backbone entropy algorithm, the possible collisions between distal segments on the chain must be added to the free energy function explicitly. To do this, we assigned a collision distance of r0 = 3.2 Å to every pair of non-hydrogen atoms (Aj,Ak), where Aj ∈ {C5′,C4′,O4′,C1′,C2′,O2′,C3′,O3′}j and |j − k| > 1, using the WCA potential EWCA(r) = 1.0 kcal/mol [(r0/r)12 − (r0/r)6 + 1] where r is the distance between Aj and Ak in angstroms. Since the positions of the main-chain atoms have been integrated over according to the closure solution to produce the backbone entropies, backbone atoms (except for the C1′−N atoms whose positions are defined by the {RETOj} variables) are no longer explicit (see illustration of SIII conformational fluctuations in Figure 1). Mean positions of the main-chain atoms derived from the backbone entropy were used as their averaged atomic coordinates for the computation of Emm. Similarly, the collisions between base atoms and the main chain were accounted for by a third energy term, Ebm, again using the WCA potential EWCA for each pair of atoms (Aj,Bk), where Aj ∈ {C5′,C4′,O4′,C1′,C2′,O2′,C3′,O3′}j are the main-chain atoms and Bk are the explicit atoms on base k. For the Monte Carlo moves, unbiased translations and reorientations were used for the (C1′−N)j bonds and random rotations employed for the base rotation angles ϕj. These were used to generate trial states μ′ = {RETO1,ξ2,RETO2,ξ3,..., ξN−1,RETON−1;ϕ1,ϕ2,...,ϕN} which were accepted using the standard Metropolis criterion based on the Boltzmann weight exp(−F/RT) with the free energy function F = F0 + Ebb + Emm + Ebm. All simulations were carried out at 310 K. The nucleobases were modeled as explicit atoms whereas the mainchain atoms were described by the closure algorithm according to the entropy formulation discussed earlier. Starting from a hairpin configuration, the unfolding time for the 24-nt RNA reported in Figure 7 was roughly 100 MC passes. After this, the system equilibrated rapidly. The ensemble of chain conformations derived from this Monte Carlo simulation was compared against the calibration data from all-atom simulations shown in Figure 4, and the distributions from the two different simulations were consistent with each other. The much higher ergodicity of the entropy-based MC simulations allowed us to generate a much more reliable entropy distribution for unfolded RNAs, which is shown in Figure 6 as the red dashed curves and used in our analysis of the rRNA entropy data described later. This Monte Carlo simulation was also used to generate the unhybridized singlestranded DNA entropy distribution shown in Figure 10. G

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 5. Per-nucleotide residual backbone conformational entropy, SIII, plotted in units of R, the gas constant, as a function of sequence position on the 23S rRNAs. The white dashed line indicates sample mean −1.75. The red dotted line indicates the open-chain value −0.98.

Figure 6. Distributions of per-nucleotide residual entropy values in the 23S rRNAs and various subpopulations. (A) All 23S residues as the black dashed line (shaded) and distributions for unfolded RNAs as red dashes in all four panels. (B) Watson−Crick-paired nucleotides (solid black line, shaded), non-Watson−Crick-paired nucleotides (blue dashes), and unpaired nucleotides (purple solid line). (C) Residues with surface accessibility > 250 Å2 (dotted dashed line, shaded) and those with surface accessibility < 100 Å2 (green solid line). (D) Nucleotides participating in A-minor motifs − A-minor forming A residues (dashed line, shaded) and the B/C nucleotides (orange solid line).



RESULTS AND DISCUSSION Residue Backbone Entropies in the 23S Ribosomal Subunit. Contribution of Residual Backbone Entropy to the Folding Free Energy of RNAs. The large ribosomal subunit in the X-ray structure of H. marismortui9 (PDB ID 1VQO) consists of close to 2900 nucleotides (nts). The per-nt backbone entropies are shown in Figure 5 in units of R = 1.987 cal/mol K, along the sequence. In these units, two nucleotide segments with an entropy difference of ΔSIII/R = 1/ nt have a 2.7-times difference in their accessible conformational volumes. The horizontal white dashed line in Figure 5 represents the mean per-nt value of S/R = −1.75/nt over the entire 23S subunit. A histogram of the entropy values (bin size = 0.25) is shown in Figure 6A as the dashed line (shaded). The standard deviation of this distribution is σ = 1.15/nt. The distribution is not Gaussian and is asymmetric especially on the

low-entropy tail, but there are almost as many samples on the negative side of the mean compared to the positive side. Analysis shows no strong correlations in the per-nt entropy values along the sequence, indicating that any observed entropic strain on a residue is primarily localized to that nucleotide segment. The rest of the panels in Figure 6 show distributions for various subpopulations, and they will be discussed shortly. First, we want to derive an estimate for the cost in residual backbone entropy required when a typical RNA folds. Given the distribution of the per-nt residual entropy values SIII in the 23S subunit, we can evaluate the difference in the residual entropy, ΔSIII, between an unfolded chain and the folded 23S structure. Since there are no X-ray data for unfolded chains, we generated an equilibrium ensemble of unfolded structures numerically using a Monte Carlo simulation (see Methods). H

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

quite large. Including entropic contributions SI and SII would increase ΔS further. Based on ΔSIII alone, if a RNA is to fold, the stabilization coming from enthalpic contributions to the free energy must be at least −0.47 kcal/mol or larger per nucleotide on the average to counteract the residual backbone entropic losses. Figure 6A compares the distribution of per-nt SIII values in the 23S subunit (black dashed line, shaded) against unfolded chains (red dashed line). The distribution of the folded 23S structure is bimodal. There is a high density of SIII/R values around the mean of −1.75/nt, and a small shoulder on the high-entropy side at ∼−0.2/nt. The unfolded chains, on the other hand, have a smooth distribution. Its low-entropy tail extends quite far to the left below −4/nt, approximately two standard deviations from the mean. Noticeably, the low-entropy tail of the 23S population is quite similar to the unfolded chains. This suggests that extremely low-residual-entropy nucleotide segments occur in the folded structure at roughly the same frequency as in the unfolded structure, and their presence in folded RNAs does not necessarily have any physical significance for the fold stability. Correlating Residue Backbone Entropy Values to Specific Structural Features in the Large Subunit. To examine the 23S SIII distribution more closely, we divided the population into subclasses corresponding to structural features. The three different classifications we considered were base pairing, surface accessibility, and tertiary interactions. The majority of residues in the 23S subunit are located within helices. We can use this subclass to estimate the residual backbone entropic costs associated with base pairing. Using the base pairing annotations from the NDB database24,25 for the H. marismortui 23S subunit to identify the Watson−Crick (WC) and non-Watson−Crick (nonWC) pairs, we have analyzed the entropy of the backbone segments associated with these pairs. Figure 6B shows the histogram of the entropies of the cis or trans Watson−Crick-paired bases as the black line (shaded). Not surprisingly, the distribution of SIII values for these WC bases is similar to the distribution over the entire sample (Figure 6A, shaded area). The WC distribution, however, is definitively narrower and more peaked at the sample mean of −1.75/nt, while the high-entropy shoulder around −0.2 has largely disappeared in the WC subpopulation. To within statistical uncertainty, the mean SIII/R value for the WC subpopulation is similar to the mean over all residues at ∼−1.75/nt. Therefore, the cost in residual backbone entropy SIII/R going from an unfolded chain into a helix with canonical base pairs is about 0.7/nt. Comparing the WC distribution (shaded area, Figure 6B) to the unfolded chains (red dashed curve, Figure 6B), the lowentropy tails of both distributions are almost identical. This suggests that nucleotides involved in canonical base-pair interactions have an intrinsic low-entropy population that arises naturally from thermal fluctuations, and this low-entropy population does not seem to contribute materially to helix stability. The SIII/R distribution for nonWC paired bases (dashed blue line, Figure 6B) shows a depletion in the peak around the sample mean of −1.75, and a concomitant enhancement in the high-entropy shoulder around −0.2. Comparing the WC and nonWC bases to those that are not involved in any base pairing at all (nonBP, solid purple line, Figure 6B), the non-base-paired residues have a much more prominent high-entropy shoulder. The high-entropy leading edge of the nonBP residues (purple

Simulating the entire 23S sequence is computationally too expensive, but one can obtain an accurate distribution of the per-nt SIII values using essentially any sequence, provided the chain is long enough to eliminate end effects. In Figure 7, we

Figure 7. Per-nucleotide residual backbone entropy in a 24-nt RNA stem-loop structure during a Monte Carlo simulation. (A) Entropy value for a single nucleotide, showing rapid fluctuations in its SIII/R value. (B) Average per-nt entropy over the sequence in the same simulation as a function of Monte Carlo passes. Snapshots show unfolding of the helix with an initial average SIII/R value of ∼−2/nt, to the final equilibrium value ∼−1/nt.

show typical Monte Carlo results for a 24-nt RNA sequence AGUCU CAGGG GAAAC UUUGA GAUA. Figure 7A shows representative SIII values for one particular nucleotide during the simulation, while Figure 7B shows the average per-nt SIII over the entire sequence. At the beginning of the simulation shown in Figure 7, the initial conformation of the chain started with a stem-loop structure. All interactions other than backbone entropy and excluded volume were turned off, and the chain unfolded during the simulation (Figure 7B shows snapshots at different points). Figure 7B shows that the average per-nt SIII value increased during the simulation, from the initial value of SIII/R ∼ −2/nt to −1/nt as the chain unfolded. Using a long Monte Carlo trajectory, a large equilibrium ensemble of 160,000 unfolded structures was obtained. Analysis of this ensemble then yielded a mean per-nt value of SIII/R = −0.98/nt (uncertainty < 0.1%) for the unfolded chains. This open-chain average is indicated on Figure 5 by a horizontal red dotted line. This value is slightly higher than the average SIII/R of −1.75/nt in the folded rRNA, indicated by the white dashed line in Figure 5. The corresponding histogram of the unfolded per-nt equilibrium SIII values is displayed in Figure 6A as the red dashed line. Comparing the mean value of SIII/R for the unfolding chain (−0.98/nt, red dotted line, Figure 5) and the 23S subunit (−1.75/nt, white dashed line, Figure 5), we arrive at an estimate ΔSIII/R ∼ −0.77/nt when the RNA folds from an open structure into its tertiary fold. At T = 310 K, this entropy difference corresponds to a free energy penalty −TΔSIII of approximately +0.47 (kcal/mol)/nt. Since the 23S subunit consists of thousands of nucleotides, the total entropic penalty ΔSIII due to the residual backbone entropy alone can indeed be I

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 8. Per-nt residual entropies in the nucleosomal DNAs. (A) Entropies for the two strands across each complementary pair and (B) the weak correlations between them. (C) Entropies from A averaged over every 5 points showing oscillatory components, and (D) power spectrum showing the most prominent repeats highlighted in cyan. (E) Entropy for one strand plotted with the other strand reflected across the dyadic axis (white triangles), and (F) the strong correlations between them. (G) Base-pair-step shifts calculated in ref 36 and their correlations with chain J entropies in panel H and chain I in panel I.

highest surface exposure. Figure 6C shows the per-nt SIII values in the subpopulation with 250 Å2/nt in the shaded area under the dotted−dashed line. Those nucleotides that are highly exposed clearly have much higher entropies also, and they significantly enrich the peak near −0.2. This provides strong evidence that the residual backbone entropic strains on peripheral residues in a folded RNA are significantly milder than residues in the core. However, we should point out that the 23S subunit is associated with a number of ribosomal proteins inside the 50S structure, but we have applied the surface accessibility calculations only to the RNA subunits, ignoring the influence of the proteins. Despite this, the correlation between high surface accessibility and low entropic strain is clearly quite strong. It was suggested recently31 that a pseudo-free-energy term could be deduced from SHAPE assays to guide RNA structural determination, and a quantitative comparison between SIII backbone entropies and SHAPE reactivities may prove valuable for these experimental studies, and this study is currently under way. Complex Relationships between Tertiary Interactions and Residual Backbone Entropies in Profile of A-Minor Motifs. Previously, we have made the conjecture that entropic strains on the backbone may be associated with tertiary structure formation; i.e., in assembling the requisite tertiary interactions

solid line, Figure 6B) is almost identical to the leading edge of the unfolded chain (red dashed line, Figure 6B), indicating that the residual backbone conformations corresponding to the high-entropy population of the nonBP residues are largely similar to those that arise naturally from thermal fluctuations in unhindered, open chains. Strong Correlation of Residual Backbone Entropy and Surface Accessibility in the 23S Subunit. While there are thermodynamic experiments for studying melting and loop formations in RNAs,1,3,4,26−28 there is as yet no direct experimental method available that can measure the backbone entropy associated with individual nucleotides. The conformational motions associated with loop formation belong to level II entropies, SII. But recently SHAPE assay29−31 has emerged as a powerful structural tool for determining the surface accessibility of individual nucleotides within a folded structure. It has been suggested that the reactivity of the SHAPE reagent with the nucleic acid backbone may possibly be related to the increased entropic freedom at those residues with the highest surface accessibility.31 If this is true, SHAPE reactivities may provide a semiqualitative measure of the per-nt backbone entropy. To ascertain whether this conjecture is correct, we have computed the surface accessibility of every residue in the 23S subunit using the Naccess program32 and analyzed the entropy distribution of those subpopulations of nucleotides with the J

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

The per-nt backbone entropies along the two DNA strands, chains I and J, are shown in Figure 8A as the red line (open circles) and blue line (closed diamonds), respectively. Notice that the SIII/R values in Figure 8A are plotted for chain I (red) in the 5′ to 3′ direction from left to right (indicated by the arrow on the bottom of the diagram), while they are plotted for chain J (blue) in the 5′ to 3′ direction backward from right to left (indicated by the top arrow). Hence, Figure 8A shows SIII/ R values for the two strands according to their base-paring complementarity. Figure 8A indicates that the overall residual backbone entropy values of both strands seem to show a sawtooth pattern, with very distinct alternating high and low values, but the alternations are also interrupted at specific positions, where the alternation changes phase. But noticeably, the entropy values of the two strands do not correlate with each other. While in some neighborhoods the alternating patterns of the two curves are in phase, their values in some other regions are out of phase, yet in others the alternating pattern of both are muted (e.g., between positions −54 and −38). Figure 8B shows a scatter plot of the SIII/R values in chain I (x-axis, values read from 5′ to 3′) against chain J (y-axis, value read from 3′ to 5′). Plotted this way, the SIII/R values should show a positive correlation of +1 between the strands if a nucleotide segment on chain I bears entropic strain identical to its complementary nucleotide segment on chain J. Figure 8B shows no such correlation at all. Instead, there appears to be a very weak negative correlation between the two strands (indicated by the dotted line), with a correlation coefficient of −0.12. This implies that residual entropic stresses on the DNA backbone are not generally shared between the two complementary strands per base-pair step. On the contrary, there appears to be a weak anticorrelation between them. As we will show below, this is consistent with a detailed geometric analysis of the nucleosomal DNA,36 which reveals that the superhelical twisting of the duplex is uneven along its contour. Other than the alternating patterns of entropies evident in Figure 8A, there are more subtle longer-range oscillations in the residual entropy values along the backbone. This is made more clear in Figure 8C when the entropy values from Figure 8A are replotted using a running average over five nucleotides at a time. To obtain a frequency spectrum of the SIII/R values, we have performed a Fourier transform of the per-nt entropy values in Figure 8A. Figure 8D shows the resulting power spectrum. The spectrum reveals a number of prominent repeat frequencies (highlighted in cyan, Figure 8D) along the sequence, with periodicity of 5, 11, 16, and 21 nucleotide units. A periodicity of 11 nts is consistent with the standard repeats found in dsDNAs37 and also observed in the twist of the superhelical curvature of the duplex around the histone core identified by Richmond and Davey.36 16-nt repeats are associated with histone-DNA contact points, because the interactions between the H3/H4 histones with the dsDNA spans approximately 30 nts from positions 1 to 30, whereas the duplex is exposed to histones H2A/H2B between positions 33 and 62. From Figure 8A we see that the amplitude of oscillations in the SIII/R values between positions 1 and 30 where the DNA contacts H3/H4 are stronger than in the region where H2A/H2B contact occurs. Because there is an approximate C2 symmetry axis about the NCP dyad, the entropy values of one strand read along the 5′ to 3′ direction should be roughly the same as those read from the other strand in its 5′ to 3′ direction. The position of the

that are needed to stabilize the tertiary fold, the backbone may have to sacrifice entropic freedom in order to capitalize on the free energy advantages procured through the formation of key tertiary contacts.11 To ascertain whether this is true in the 23S subunit, we have examined the A-minor interactions in particular. A-minor platforms are one of the most prevalent tertiary motifs in the 23S subunit.33 In an A-minor interaction, an adenine base inserts itself into the minor groove of another base pair to form a tertiary interface with them. If our conjecture is correct, there should be entropic signatures in the nucleotides involved in A-minor motifs. Using the NASSAM structural annotation program,34 we have identified 144 Type I and II A-minor interactions in the 23S subunit (a slightly different but similar set was found by the authors of ref 35). Figure 6D shows a histogram of the entropies of the backbone segments associated with the Aminor “A” (adenine) residues as the dashed line (over the shaded area), while the solid orange line shows the composite SIII distribution of the other two (“B” and “C”) residues participating in A-minor interactions. From the data in Figure 6D, it is clear that the B and C residues in the A-minor platforms display entropic signatures almost identical to those of the WC bases shown in Figure 6B (shaded area under solid black line, Figure 6B). This is not at all unexpected, indicating that the B/C bases have typical WC base pair characters. But Figure 6D also reveals that the entropy distribution of the Aminor-forming A-residues shifts to lower values, implying that they are under excess entropic stress. The mean entropy value of the backbone segments involving A-minor A bases (dashed red line) is SIII/R ≈ −1.88/nt, which is lower than the mean in the entire 23S subunit by 0.13/nt. However, due to the small sample size (144 identified A-minor motifs), the statistical accuracy of this estimate is insufficient to assign a definitive value to the SIII entropic penalty incurred in the formation of these interactions. We are in the process of analyzing all the high-resolution X-ray structures of known RNA folds in order to construct a database of backbone entropy values to try to more accurately assess the residue backbone entropic costs associated with different kinds of tertiary interactions. While there are identifiable correlations between the computed per-nt backbone entropies in the 23S subunit with structural features on the secondary and tertiary levels as well as surface accessibilities, there is no clear relationship between residual backbone entropies and the B factors in the X-ray structure. A product-moment analysis shows that the Pearson correlation coefficient between the computed per-nt entropies and the B factors is −0.02. Backbone Entropies in the DNA Duplex in the Nucleosome Core Particle. Correlation of Backbone Entropies with Known Structural Details of the HistoneBound Nucleosomal DNA. The double-stranded (ds) DNA cocrystallized with the histone proteins in the nucleosome core particle (NCP)10 (PDB ID 1KX5) is one of the largest DNA duplexes that have been solved to date. The remarkable 1.9 Å resolution of this structure provides exquisite atomic details, and it allows us to study how the bases and the histone exert residual entropic stress on the sugar−phosphate backbone of the DNA. The DNA in the NCP consists of 147 base pairs (bp) wound around the histone core in 1.67 turns. The DNA sequence is palindromic, and there is an approximate C2 symmetry axis through the dyad. Following ref 10, we number the bases on the sequence with the nucleotide positioned at the dyad as 0. K

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B NCP dyad is depicted in Figure 8A,C,E,G by the two white triangles with a vertical dotted line connecting them, indicating that if the numerical data are reflected across this line, there should be an approximate dyadic symmetry in the entropy values. In Figure 8E, the same data from Figure 8A have been replotted with chain I in the same direction (red line, open circles), but the SIII/R values on chain J are now plotted in reverse as the orange line (closed diamonds, Figure 8E) with the 5′ end on the left and 3′ end on the right. Plotted in this way, the SIII/R values from the two strands coincide almost exactly. This is further confirmed in Figure 8F, where a scatter plot of the SIII/R values from Figure 8E reveals a significant positive correlation of the two chains when both are read in the 5′ to 3′ directions, with a correlation coefficient of +0.71. This strong correspondence between the entropy values of the two chains is consistent with the expected dyadic symmetry. It also demonstrates that the computed residual backbone entropy values are sufficiently accurate to quantitate atomic-level details along the sugar−phosphate backbone. Close Association of Backbone Entropy Profile with BasePair-Step Geometries. In their detailed analysis of the NCP structure, Richmond and Davey36 have computed a number of base-pair-step geometric parameters of the nucleosomal DNAs (rolls, tilts, shifts, slides, and twists). The oscillations in the entropy values we have identified in Figure 8E are consistent with the pattern in the roll and tilt of the base-pair-steps they have found, with the characteristic 11-nt repeats. But most remarkably, the residual backbone entropy values we have computed show a very strong correlation with one particular geometric parameter reported by Richmond and Davey,36 namely, the base-pair-step shifts. Their calculated shift parameters are reproduced in Figure 8G for half of the duplex (due to dyadic symmetry, the other half can be deduced by reflecting the data across the dyadic axis). Large shifts in the base-pair-step impart strong distortions to the backbone introducing extra entropic stress on the backbone conformational freedom. These stress points are responsible for a significant portion of the low-entropy strains stored in the NCP DNA sugar−phosphate backbone. It is not clear at this point whether the base-pair-shifts are the cause or the effect of the entropic stresses. To definitively answer this question, one will need to examine different NCP structures with alternative nucleosome positioning. The strong correlation between residual backbone entropies along chain J with the base-pair shifts calculated by Richmond and Davey are demonstrated in the scatter plot in Figure 8H. This graph was produced by plotting each SIII/R value from Figure 8A in chain J (blue line, closed diamonds) against the shift parameter at the same sequence position from Figure 8G (green line). Figure 8H shows that chain J entropies are significantly anticorrelated with the shift at each base-pair step, with a correlation coefficient of −0.72. On the other hand, Figure 8I reveals the opposite is true for chain Ithe residual backbone entropies on I are positively correlated with the shift parameter of each base-pair step, with a correlation coefficient of +0.69. We can inspect the spatial relationship between the lowentropy stress points on the DNA with its absolute positioning around the histone core. The stereograms in Figure 9A,B reveal where the lowest-entropy segments are located on the duplex. From Figure 9 it is clear that the high-strain segments do not always appear on both strands across the same base-pair step. This is consistent with the findings in Figure 8B as we have

Figure 9. (A) Stereoview of the nucleosome core particle, showing locations of the low-entropy backbone segments on the DNA duplex with lower than two standard deviations from the mean in red, and (B) the side view, showing that the DNA sequence that is contacting the H2A (green) and H2B (blue) subunits of the histone bears almost no entropic stress.

discussed previouslwhile the entropic stresses are concentrated on base-pair-steps with large shifts, strain is often not borne equally by both strands. In fact, Figure 9 shows that in some regions the two complementary strands alternate in entropy values to compensate for the stress imparted to the other strand (e.g., between positions +22 and +30). This is especially true in regions where the base-pair-step shifts are strong. And as we have also described earlier (see Figure 8H,I, and the discussions surrounding them), the SIII/R values of the two complementary strands are strongly correlated with the shifts, but in opposite directions. Residual Backbone Entropy in Folded RNAs versus Duplex DNAs. Finally, we want to derive an estimate for the residual backbone entropy costs required when two single-stranded (ss) DNAs hybridize into a duplex. Similar to the approach used for analyzing the rRNAs, we used Monte Carlo simulations to derive an estimate for the SIII/R value of an unpaired ssDNA. The average SIII/R obtained from a large equilibrium ensemble of simulated naked ssDNAs was −0.82/nt. This value is just slightly higher than the per-nt SIII/R value for unfolded RNAs of −0.98/nt. The presence of the 2′ OH on the ribonucleotides creates extra excluded volume on the RNA backbone compared to the deoxyribonucleotides on a ssDNA, and this causes a depression of the residual entropy on RNAs relative to DNAs. However, the difference in residual backbone entropy SIII/R between a naked ssDNA and an unfolded RNA seems to be rather small, ∼−0.16/nt. The distribution of SIII/R values in unhybridized ssDNAs from the Monte Carlo simulations is shown in Figure 10 as the blue solid line, with the corresponding distribution for unfolded RNAs from Figure 6A shown as the red dashed line. The similarities between these two distributions indicate that both RNAs and DNAs appear to have sufficient open space along the backbone in their unfolded states so that the addition 2′ OH in the ribonucleotides does not seem to give rise to a lot more L

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

generally have lower strains. Not counting entropy loss associated with the formation of loops and junctions, the residual backbone entropy alone accounts for an average +0.47 (kcal/mol)/nt free energy penalty at 310 K in the 23S subunit. In the nucleosome core DNA, backbone entropies show persistent periodic oscillations as a function of sequence position, correlating strongly with the superhelical twist and shifts in the base-pair-step geometries. Nucleosome positioning on the bound DNA exerts strong influence over its residual backbone entropic strains. In contrast to rRNAs, the residual backbone entropy accounts for a free energy penalty of only +0.09 (kcal/mol)/nt in duplex DNAs.



Figure 10. Distribution of per-nucleotide residual entropy values in the nucleosomal DNAs (black dotted−dashed line, shaded), compared to the distributions for unhybridized ssDNAs (solid blue line) and unfolded RNAs (red dashes, reproduced from Figure 6).

AUTHOR INFORMATION

Corresponding Author

*Tel.: 213-740-4101. Fax: 213-740-3972. E-mail: cmak@usc. edu. Author Contributions

excluded volume collisions compared to the deoxyribonucleotides. Figure 10 also shows the distribution of SIII/R values in the NCP dsDNAs as the dotted− dashed line (shaded). Comparing this to the same distribution in Figure 6A for rRNAs in the 23S subunit, the nucleosomal DNAs in their hybridized form appear to have residual backbone entropies much more similar to those in free ssDNAs. The mean SIII/R value for the nucleosomal DNAs is −0.97/nt. Referenced to free ssDNAs, this corresponds to an entropy loss of only ∼−0.15/nt between ssDNAs and a duplex. On the other hand, the entropic penalty for RNAs going from the unfolded to the folded structure, as we have discussed earlier, is much larger, at ∼−0.77/nt. In terms of free energy, these entropy differentials translate to a + 0.09 (kcal/mol)/nt penalty at 310 K for DNAs, versus +0.47 (kcal/mol)/nt for RNAs. This suggests that whereas the residual backbone entropy costs for dsDNA duplex formation appears to be nearly neutral, the residual backbone entropy costs in assembling a folded RNA is much larger. This is likely due to the congestion among atoms on the RNA backbone with its extra 2′ OH in the geometry required to assemble an A-form helix, compared to a fairly relaxed DNA backbone without the 2′ OH inside a B-form helix.

C.H.M. designed the research, numerical algorithms, and simulations and wrote the manuscript. L.L.S. and A.N.V. performed the statistical analyses. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This material is based upon work supported by the National Science Foundation under Grant No. CHE-0713981. L.L.S. acknowledges support from USC as a Provost Undergraduate Research Fellow. We thank Ryan Khan for his assistance with the data analysis.



REFERENCES

(1) Turner, D. H. Thermodynamics of base pairing. Curr. Opin. Struct. Biol. 1996, 6, 299−304. (2) Woodson, S. A. Compact intermediates in RNA folding. Annu. Rev. Biophys. 2010, 39, 61−77. (3) Diamond, J. M.; Turner, D. H.; Mathews, D. H. Thermodynamics of three-way multibranch loops in RNA. Biochemistry 2001, 40, 6971−6981. (4) Mathews, D. H.; Turner, D. H. Prediction of RNA secondary structure by free energy minimization. Curr. Opin. Struct. Biol. 2006, 16, 270−278. (5) Murphy, M. C.; Rasnik, I.; Cheng, W.; Lohman, T. M.; Ha, T. J. Probing single-stranded DNA conformational flexibility using fluorescence spectroscopy. Biophys. J. 2004, 86, 2530−2537. (6) El Hassan, M. A.; Calladine, C. R. Conformational characteristics of DNA: Empirical classifications and a hypothesis for the conformational behaviour of dinucleotide steps. Philos. Trans. R. Soc., A 1997, 355, 43−100. (7) Mak, C. H. RNA conformational sampling. I. Single-nucleotide loop closure. J. Comput. Chem. 2008, 29, 926−933. (8) Ban, N.; Nissen, P.; Hansen, J.; Moore, P. B.; Steitz, T. A. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289, 905−920. (9) Schmeing, T. M.; Huang, K. S.; Kitchen, D. E.; Strobel, S. A.; Steitz, T. A. Structural insights into the roles of water and the 2′ hydroxyl of the P site tRNA in the peptidyl transferase reaction. Mol. Cell 2005, 20, 437−448. (10) Davey, C. A.; Sargent, D. F.; Luger, K.; Maeder, A. W.; Richmond, T. J. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 2002, 319, 1097−1113. (11) Mak, C. H.; Matossian, T.; Chung, W. Y. Conformational entropy of the RNA phosphate backbone and its contribution to the folding free energy. Biophys. J. 2014, 106, 1497−1507.



CONCLUSIONS Secondary and tertiary structure formation in nucleic acids is directed by the folding free energy, but while base pairing, stacking, and charge interactions have been studied extensively, little is known about the intrinsic entropy of the backbone. Residual conformational fluctuations persist along the sugar− phosphate backbone of any nucleic acid even after the assembly of its secondary and tertiary structure. Here we extend a recently reported numerical algorithm to calculate the residual entropy for every nucleotide along the phosphate−sugar backbone in a RNA or DNA given the configuration of its bases. The difference in residual conformational entropies between the folded and unfolded states provides a rigorous lower bound for the magnitude of the entropy of folding, ΔSfold. Applying this analysis to the crystallographic structures of RNAs in the 23S subunit and the duplex DNA in the nucleosome core particle, we examined the origin of backbone entropic strains and their relationship to structure. In the 23S rRNA, higher entropic strains are concentrated in helices and certain tertiary interaction platforms, while residues with high surface accessibility and those not involved in base pairing M

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B (12) Olson, W. K.; Bansal, M.; Burley, S. K.; Dickerson, R. E.; Gerstein, M.; Harvey, S. C.; Heinemann, U.; Lu, X. J.; Neidle, S.; Shakked, Z.; et al. A standard reference frame for the description of nucleic acid base-pair geometry. J. Mol. Biol. 2001, 313, 229−237. (13) Ulmschneider, J. P.; Jorgensen, W. L. Monte Carlo backbone sampling for nucleic acids using concerted rotations including variable bond angles. J. Phys. Chem. B 2004, 108, 16883−16892. (14) Coutsias, E. A.; Seok, C.; Jacobson, M. P.; Dill, K. A. A kinematic view of loop closure. J. Comput. Chem. 2004, 25, 510−528. (15) Sklenar, H.; Wustner, D.; Rohs, R. Using internal and collective variables in Monte Carlo simulations of nucleic acid structures: Chain breakage/closure algorithm and associated Jacobians. J. Comput. Chem. 2006, 27, 309−315. (16) Chung, W. Y. Mobility analysis of RSSR mechanisms by working volume. J. Mech. Design 2005, 127, 156−159. (17) Chung, W. Y. Type determination and analysis of mobility region for bimodal linkages. Proc. Inst. Mech. Eng., Part C 2008, 222, 2495−2503. (18) Abramowitz, M., Stegun, I. A., Eds. Handbook of mathematical functions with formulas, graphs, and mathematical tables; Dover Publications: New York, NY, USA, 1965. (19) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A 2nd generation force-field for the simulation of proteins, nucleic-acids, and organic-molecules. J. Am. Chem. Soc. 1995, 117, 5179−5197. (20) MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L., Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102, 3586−3616. (21) Mak, C. H. Loops MC: An all-atom Monte Carlo simulation program for RNAs based on inverse kinematic loop closure. Mol. Simul. 2011, 37, 537−556. (22) Mak, C. H.; Chung, W. Y.; Markovskiy, N. D. RNA conformational sampling: II. Arbitrary length multinucleotide loop closure. J. Chem. Theory Comput. 2011, 7, 1198−1207. (23) Weeks, J. D.; Chandler, D.; Andersen, H. C. Role of repulsive forces in determining equilibrium structure of simple liquids. J. Chem. Phys. 1971, 54, 5237−5247. (24) Berman, H. M.; Olson, W. K.; Beveridge, D. L.; Westbrook, J.; Gelbin, A.; Demeny, T.; Hsieh, S. H.; Srinivasan, A. R.; Schneider, B. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 1992, 63, 751−759. (25) Coimbatore Narayanan, B.; Westbrook, J.; Ghosh, S.; Petrov, A. I.; Sweeney, B.; Zirbel, C. L.; Leontis, N. B.; Berman, H. M. The nucleic acid database: New features and capabilities. Nucleic Acids Res. 2014, 42, D114−122. (26) Kool, E. T. Hydrogen bonding, base stacking, and steric effects in DNA replication. Annu. Rev. Biophys. Biomol. Struct. 2001, 30, 1−22. (27) Lai, J. S.; Qu, J.; Kool, E. T. Fluorinated DNA bases as probes of electrostatic effects in DNA base stacking. Angew. Chem., Int. Ed. 2003, 42, 5973−5977. (28) Kim, T. W.; Kool, E. T. A series of nonpolar thymidine analogues of increasing size: DNA base pairing and stacking properties. J. Org. Chem. 2005, 70, 2048−2053. (29) Merino, E. J.; Wilkinson, K. A.; Coughlan, J. L.; Weeks, K. M. RNA structure analysis at single nucleotide resolution by selective 2′hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 2005, 127, 4223−4231. (30) Mortimer, S. A.; Weeks, K. M. A fast-acting reagent for accurate analysis of rna secondary and tertiary structure by SHAPE chemistry. J. Am. Chem. Soc. 2007, 129, 4144−4145. (31) Deigan, K. E.; Li, T. W.; Mathews, D. H.; Weeks, K. M. Accurate SHAPE-directed RNA structure determination. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 97−102. (32) Hubbard, S. J.; Thornton, J. M. NACCESS, Version 2.1.1; Department of Biochemistry and Molecular Biology, University College London: London, 1993.

(33) Nissen, P.; Ippolito, J. A.; Ban, N.; Moore, P. B.; Steitz, T. A. RNA tertiary interactions in the large ribosomal subunit: The A-minor motif. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 4899−4903. (34) Hamdani, H. Y.; Appasamy, S. D.; Willett, P.; Artymiuk, P. J.; Firdaus-Raih, M. Nassam: A server to search for and annotate tertiary interactions and motifs in three-dimensional structures of complex RNA molecules. Nucleic Acids Res. 2012, 40, W35−41. (35) Xin, Y.; Laing, C.; Leontis, N. B.; Schlick, T. Annotation of tertiary interactions in RNA structures reveals variations and correlations. RNA 2008, 14, 2465−2477. (36) Richmond, T. J.; Davey, C. A. The structure of DNA in the nucleosome core. Nature 2003, 423, 145−150. (37) Klug, A.; Lutter, L. C. The helical periodicity of DNA on the nucleosome. Nucleic Acids Res. 1981, 9, 4267−4283.

N

DOI: 10.1021/acs.jpcb.5b04839 J. Phys. Chem. B XXXX, XXX, XXX−XXX