Flory Theory of the Folding of Designed RNA Molecules - American

Feb 25, 2009 - secondary structure1,2 is a planar graph of the pairing pattern, showing linear complementary sequences decorated with un- paired “bu...
1 downloads 0 Views 193KB Size
3880

J. Phys. Chem. B 2009, 113, 3880–3893

Flory Theory of the Folding of Designed RNA Molecules† David Schwab and Robijn F. Bruinsma* Department of Physics and Astronomy, The UniVersity of California at Los Angeles, Los Angeles 90095-1596, California ReceiVed: August 1, 2008; ReVised Manuscript ReceiVed: December 4, 2008

We construct a Flory theory for the folding of RNA molecules that have a designed, linear ground state. I. Introduction The structural organization of an RNA molecule is determined by the pairing pattern of complementary nucleic acid bases. The secondary structure1,2 is a planar graph of the pairing pattern, showing linear complementary sequences decorated with unpaired “bubbles” and side branches that end in “stem loop” structures. Figure 1 shows a typical secondary structure of a 1400 base section of a viral genome RNA molecule, as computed by the Mfold program, in the form of a linear backbone with side branches. For a large RNA molecule such as this there may be, at room temperature, on the order of 102 or more competing secondary structures of this form within a few kBT of the ground state. The tertiary structure of an RNA molecule refers to the folding of the secondary planar structure into a three-dimensional structure by the pairing of nucleotides that were left unpaired in the secondary structure, as shown in Figure 2. Tertiary folding depends delicately on the physicochemical parameters of the surrounding solution. Tertiary folding has to overcome the electrostatic self-repulsion between the negatively charged double-helical sequences, and a modest change in salt concentration produces a significant loss of tertiary structure.3 An RNA molecule can maintain a well-defined secondary structure even if it loses its tertiary structure.4,5 If the strength of the pairing interaction is weakened, the secondary structure of RNA can melt into a “molten-globule” state6 with fluctuations between alternative secondary structures. The enthalpic freeenergy cost incurred by the reduced number of complementary paired bases is compensated by the gain in entropic free energy associated with the enlarged configurational space. Finally, if the pairing interaction is weakened even more, then the RNA molecule can degenerate into a fully denatured linear polymer. This hierarchical folding of RNA molecules contrasts with protein folding, where the secondary structure generally develops during the three-dimensional folding. The paper develops a mean-field theory for the folding of a “designed” RNA molecule under conditions of thermal equilibrium. The statistical mechanics of the melting of branched RNA molecules without tertiary interactions, self-repulsion, or a designed ground state has already been studied extensively. In a seminal paper, Pierre-Gilles de Gennes (PGG)7 showed that the partition function of a branching RNA molecule with a uniform, nonspecific pairing energy depends on the number N of nucleotides as exp(QN)/N3/2, with Q (minus) as the free energy per base in units of kBT, while the radius of gyration scales according to the Zimm and Stockmayer (ZS)8 relation †

Part of the “PGG (Pierre-Gilles de Gennes) Memorial Issue”. * To whom correspondence should be addressed.

Figure 1. Secondary structure of one of the two single-stranded RNA molecules of a 4800 base viral genome (NodaViridae) computed using the Mfold program.

Figure 2. Tertiary contacts. (a) Secondary structure of two hairpins emerging from a bubble. The backbone is indicated by a solid line, and complementary pairing is indicated by dashed lines. There are no tertiary contacts. (b) A possible tertiary contact between two hairpins, indicated by an arrow.

R(N) ∝ N1/4. When the pairing energy is reduced in the PGG theory, the molten-globule state continuously evolves into a fully denatured state, with no intervening phase transition. The

10.1021/jp8068893 CCC: $40.75  2009 American Chemical Society Published on Web 02/25/2009

Flory Theory of the Folding of Designed RNA Molecules melting of a designed RNA molecule was studied by Bundschuh and Hwa (BH),9 who generalized the PGG theory by allowing for specific pairing interactions that stabilized a designed “hairpin” ground-state structure (“Goˆ model”). They encountered a second-order melting transition between the designed state and the PGG molten-globule state at a critical value of the specific pairing energy. The PGG/BH description provides a useful framework for statistical mechanical studies of RNA folding, and it has been used to address a number of problems,10 but it does not include the electrostatic self-repulsion between complementary sections of the secondary structure11 nor the attractive tertiary pairing contacts that determine the spatial structure of folded RNA molecules. The absence of an excluded-volume interaction leads in fact to a serious physical inconsistency. A consequence of the ZS scaling relation R(N) ∝ N1/4 for branched polymers without self-interaction is that the radial density F(r) grows linearly with radial distance r from the center of the molecule without limit. Parisi and Sourlas (PS) showed that if excludedvolume interactions are properly included then, in three dimensions, the density F(r) of the polymer actually decreases with the radial distance, as 1/r, while the radius of gyration R(N) actually scales as N1/2. In this paper, we extend the PGG/BH description by including self-repulsion and tertiary pairing interactions at the level of Flory-Huggins mean-field theory. It is based on earlier work of Gutin, Grosberg, and Shaknovitch (GGS),12 who showed that the Flory theory of branched polymers with an excluded-volume interaction produces a scaling exponent for the radius of gyration of “annealed” branched polymers, that is, with freely fluctuating secondary structure, which is quite close to the exact PS result. The aim of the present theory is not the prediction of the RNA tertiary folding structures from the primary nucleotide sequence but rather the development of a physical understanding of how thermodynamic parameters such as salinity or temperature affect tertiary folding of an RNA molecule. Specifically, we want to understand under which conditions tertiary interactions produce a compact, folded structure that still has a finite fraction of the nucleotides in the designed ground state. Our results can be summarized in two mean-field phase diagrams for RNA folding (Figures 5 and 6) that are expressed in terms of the specific pairing energy ε* that stabilizes the folded state and the strength of the tertiary interaction Vm between nucleotides that are not paired as part of the designed ground state. Figure 5 shows the case that the interaction between nucleotides that are complementary paired and those that are not is repulsive. If Vm is positive, then as specific pairing energy ε* is reduced, a continuous melting transition takes place between a swollen molecule with a nonzero order parameter in terms of the number of complementary paired nucleotides in the designed state and a swollen molten-globule state where the order parameter vanishes. The critical exponents of the transition are altered from the BH mean-field value by the selfrepulsion between nucleotides. If Vm is negative, a first-order transition takes place from a collapsed molten-globule and the linear state. The line of first-order transitions is flanked by two spinodals that merge at a tricritical point. There is no compact, folded structure that still has a finite fraction of the nucleotides in the designed ground state. Figure 6 describes the case in which the interaction between nucleotides that are complementary paired and those that are not is attractive. The line of second-order transitions now extends into the regime of negative Vm. The line ε* ) ε(Vm) marks a line of continuous transitions from a fully collapsed molten-

J. Phys. Chem. B, Vol. 113, No. 12, 2009 3881

Figure 3. Branching tree. If this is viewed as a schematic representation of an RNA secondary structure, then the end points of the branches are the stem loops of the secondary structure. A typical “spanning path” linking two end points, indicated as a heavy line, contains on the order of N1/2 links, with N being the number of nodes of the tree.

Figure 4. Melting transition. (a) Example of the optimal pairing sequence for a linear “hairpin” structure. Adding “errors” in the sequence effectively reduces the specific pairing energy ε*. (b) At high temperatures, the hairpin structure melts into a molten-globule.

globule with zero-order parameter to a partially collapsed state with a non-zero-order parameter. The line ε* ) ε˜ (Vm) marks a line of continuous transitions from the partial collapsed state to the fully swollen state, both with non-zero-order parameter. The intermediate phase ε(Vm) < ε* < ε˜ (Vm) describes a compact, folded structure with a finite fraction of the nucleotides in the designed ground state. This is the phase that, presumably, would describe functional, folded RNA molecules such as ribozymes (though, in actuality, ribozymes have branched secondary structures that are more complex than the linear ground state studied in this paper). Surprisingly, this folded phase is not the phase with the highest density. As the strength of the attractive interaction is increased, the two lines merge at a multicritical point beyond which a single first-order transition separates a swollen molecule with non-zero-order parameter and a fully collapsed molten-globule with zero-order parameter. Collapse thus triggers melting of the designed ground state. According to the mean-field theory, the physicochemical parameters describing the tertiary interactions must be very carefully tuned if one wants to produce the intermediate phase

3882 J. Phys. Chem. B, Vol. 113, No. 12, 2009

Schwab and Bruinsma II. Flory Theory of Branched Polymers First, assume a randomly branched polymer with a fixed (“quenched”) branching structure. The mean segment distance between adjacent branch points will be denoted by l. The segments are assumed here to be rigid, while the branch points are fully flexible. The typical number L of segments that connect two remote end points of the polymer along some spanning path scales with the total number N of segments as N1/2, which can be understood by viewing the spanning path as a random walk confined to the branched polymer (see Figure 3). In the absence of self-interaction, the radius of gyration scales according to the ZS relation R0(N) ∝ lN1/4. The Flory free energy for a branched polymer is the sum of the entropic stretching energy of a linear polymer with this radius of gyration plus a second virial term that describes short-range repulsion between the “monomers”

FQ(R)/kBT ≈ Figure 5. Phase diagram in terms of the specific pairing energy ε* and the nonspecific affinity Vm between nucleotides that are not complementary paired. The interaction between nucleotides that are complementary paired and those that are not is repulsive. For positive Vm, a continuous melting transition takes place at ε* ) εc (thin solid line) from a swollen coil to a swollen molten-globule state. For negative Vm, a first-order melting transition takes place from a swollen coil state with at least partial complementary pairing to a collapsed moltenglobule state with no complementary pairing (indicated as a solid sphere). The transition line is indicated as a fat solid line. Dashed curves are spinodal lines.

of Figure 6. First, the interaction between nucleotides that are complementary paired and those that are not must be repulsive or the intermediate phase will be absent. Second, the interaction between nucleotides that are not in the complementary paired ground state must be attractive, which is reasonable, but this attraction cannot be too strong or the intermediate phase will be overwhelmed by the collapsed globule with zero-order parameter. Finally, the specific pairing energy ε* must be close to the melting threshold. If this condition is not obeyed, then, according to Figure 6, the designed ground state would be too stable; the excluded-volume interaction/electrostatic self-repulsion prevents folding. There has to be a sufficient number of unpaired nucleotides so that the tertiary attraction between unpaired nucleotides can overcome the self-repulsion. The paper is organized as follows. In section II, we review the GGS Flory theory of branched polymers and discuss how, on a heuristic basis, a melting transition from a designed ground state into a molten-globule state could be included in Flory theory. The free energy of a polymer is, in Flory theory, the sum of an entropic elastic free energy, computed without selfinteraction, plus an interaction energy expressed as a virial expansion. Section III is devoted to the computation of the entropic elastic energy of a designed, stretched RNA molecule near the melting point using the PGG formalism (so, without self-interaction). In the first part of section IV, we construct the Flory free energy of a branched polymer with a designed ground state in the presence of excluded-volume interaction, while in the second part, we include tertiary interaction energies between paired nucleotides, unpaired nucleotides, and between paired and unpaired nucleotides. In section V, we construct the phase diagram that follows from this expression.

( ) R R0(N)

2

+V

N2 R3

(2.1)

Here, V is the excluded volume per monomer. It is assumed here that the distribution of side branches does not change as the molecule is stretched by the excluded-volume interaction. Minimization of F with respect to R then gives R(N) ∼ V1/5l2/5N1/2. This agrees with the exact scaling relation of PS, though that was derived for an “annealed” branched polymer case discussed below. This scaling relation implies that excluded-volume interactions cause the branches to stretch by a considerable amount since R(N) scales with N as the “chemical distance” lN1/2 between two end points of the network in a planar graph. Note that, within the Flory theory, one can write the excludedvolume term as τ(R)R, with the quantity τ having the dimension of a tension. This internal tension is then on the order of

τ∼

( )

kBT 1/5 -3/5 V l l

(2.2)

independent of N. A molten RNA globule is however not in a quenched state; it can alter its branching pattern and, hence, the value of L, in response to the stretching produced by the excluded-volume interaction. GGS showed, by summing over all possible tree structures, that the number Ω of configurations of a branched polymer of N monomers with L monomers separating two end points is proportional to exp(-L2/N). The Flory energy of an annealed branched polymer can be expressed as

FA(L, R)/kBT ≈

( )

R2 L2 N2 + + V N l2L R3

(2.3)

The first term of eq 2.3 is the usual entropic stretching energy of a linear Gaussian chain having L links. The second term is the entropic free energy, ln Ω, associated with the number of branching configurations Ω ∝ exp(-L2/N). The number L of segments between two end points is to be treated here as a second variational parameter (in addition to R). Minimization of F with respect to L gives

FA(R)/kBT ∝

( ) R2 l2N1/2

2/3

+V

N2 R3

(2.4)

Flory Theory of the Folding of Designed RNA Molecules

J. Phys. Chem. B, Vol. 113, No. 12, 2009 3883

Figure 6. Phase diagram in terms of the specific pairing energy ε* and the nonspecific affinity Vm between nucleotides that are not complementary paired. The interaction between nucleotides that are complementary paired and those that are not is attractive. An intermediate phase appears for larger values of Vm that has partial complementary pairing and that is partially collapsed. The two transition lines separating the intermediate phase from the swollen coil and collapsed phases mark continuous transition-phase transitions. As Vm is further reduced, the two transition lines cross and transform to the spinodal lines bordering a first-order transition directly from the swollen coil to the collapsed phase (fat line).

Note that the entropic stretching energy of an annealed branched polymer (the first term) is not that of a harmonic spring. Minimization of eq 2.4 with respect to R gives the scaling relation R(N) ∝ N7/13. Note that the scaling exponent still is close to the exact PS result, which was derived for an annealed system. The internal tension τ(N) of an annealed branched polymer now vanishes in the thermodynamic limit as τ(N) ∝ N-2/13. For the following, it is useful to note that we also can use eq 2.4 to evaluate the effect of external tension τ on an annealed branched polymer. Without the excluded-volume interaction

∆Ff(R)/kBT ∝

( ) R2 l2N1/2

2/3

-

τR kBT

(2.5)

annealed branched polymer in the molten-globule state of RNA from a designed native state with a secondary structure of the type shown in Figure 1, that is, with a secondary structure that is dominated by one or a few spanning paths that contain on the order of N nucleotides. The parameter φ is thus an order parameter that describes a broken symmetry among the many different possible spanning paths of a branched polymer. The “high-temperature” phase, where the different possible spanning paths all have comparable likelihoods, have a φ that must vanish in the large N limit, as φ ∼ (R/N)2/3, while in the “lowtemperature” phase, the order parameter must be finite in the large N limit. A natural extension of eq 2.5 that accomplishes this would be to simply introduce a field h conjugate to φ. Physically, h could be a measure of the specific pairing interaction that favors a designed secondary structure. The variational free energy would then be

After minimization with respect to R

∆Ff /kBT ∼ -N

( ) τl kBT

4

G(φ, R)/kBT ≈ N(φ2 - hφ) +

(2.6)

( )

R2 N2 + V l2Nφ R3

(2.9)

Minimization of G with respect to R gives This must be compared with the elastic free energy of a linear polymer under tension. A linear polymer under tension acts as a harmonic spring with spring constant kBT/Nl2; therefore, the elastic free energy has a quadratic dependence on tension

( )

τl ∆Ff /kBT ∼ -N kBT

2

(2.7)

We now will write the GGS Flory energy in terms of the fraction φ ) L/N of the number of links that connect two end points of the branched polymer along a typical spanning path

FA(φ, R)/kBT ≈

( )

R2 N2 + Nφ2 + V 3 l Nφ R 2

(2.8)

According to eq 2.8, the value of φ that minimizes the free energy vanishes in the large N limit as φ ∼ (R/N)2/3. We propose to reinterpret φ as an order parameter that can distinguish an

R(N) ∼

{

(Vh)1/5N3/5 R . ξ V3/13N7/13 R , ξ

(2.10)

where ξ(h) ∝ h-7/4. For large N, the scaling exponent of the radius of gyration with N reproduces the Flory exponent of a linear polymer with excluded-volume interaction, while for small N, the scaling exponent is that of the Flory theory of an annealed branched polymer. This suggests that we can view ξ(h) as a correlation length. For a continuous melting transition, this correlation length should diverge at the critical point, indicated by h ) 0 in the present case. III. Entropic Elasticity of a Designed RNA Molecule The first step in constructing a Flory theory must be the determination of the entropic elasticity in the absence of interaction between the monomers. In this section, we will apply the Green′s function method of PGG to extend his calculation of the entropic elasticity of an RNA molecule in the molten-

3884 J. Phys. Chem. B, Vol. 113, No. 12, 2009

Schwab and Bruinsma

globule state to the case where the RNA molecule has a designed ground state. A designed ground state can be specified as a certain list S of complementary base pairs that are paired in the ground state. Our designed ground state will be taken to be that of BH, a long linear, complementary-paired chain (see Figure 4A). A possible design for such a structure could start from a circular chain of 2N bases, with half of the loop as a sequence of N randomly picked bases followed, in reverse order, by the N complementary bases. The binding energy ε˜ in units of kBT for two complementary bases that appear in the list S (“native pair”) will be assumed to be larger than the binding energy ε of any complementary pair that does not appear in the list S (“non-native pair”). At finite temperature, linear sequences of “designed pairs” are expected to alternate with “droplets” of molten, sidebranching material (see Figure 4B). The statistical weight of a linear sequence of P designed pairs will be given by q˜P, with q˜ ) exp(ε˜ ), while the statistical weight for a sequence of Q nonnative pairs would be qP, with q ) exp(ε). In Appendix A, we show that if a sequence is under a tension f, then the binding energy ε˜ (f) depends on the external tension f as ε˜ (f) ∼ ε˜ + c*f 2 for native pairs, with a similar relation for non-native pairs. Here, c* ∼ bld/3l2s , with b as the base pair spacing and ld as the persistence length of double-stranded RNA. As noted at the end of the last section, the coefficient of an f2 term in the free energy per monomer of a chain corresponds to the compliance per monomer. Note that c* ∝ ld is the familiar result that the spring constant of a semiflexible, linear polymer is inversely proportional to the persistence length. The reason for introducing an external tension is that, later on, we must fix the radius of gyration R in order to compute the elastic properties of the molecule, which will be done by introducing a Lagrange multiplier term in the Hamiltonian of the form -τR, with the Lagrange multiplier τ acting as an external tension. The elastic free energy is then obtained by an inverse Legendre transformation. The dimensionless tension is f ) τls/kBT, with ls twice the persistence length of single-stranded RNA. In the following, f will always be small compared to one. Next, we wish to define the statistical weight of the molten bubbles. Let the size of such a bubble be 2M, that is, the bubble is composed of two strands of the same length M whose initial and final bases form two native pairs. Inside of this molten section, only nonspecific pairing with interaction strength ε will be allowed. That produces an awkward constraint; any two complementary bases that occur in the list S and that are part of the bubble would be forbidden to pair because that would separate the bubble into two smaller bubbles. Bundschuh and Bruinsma13 recently showed that this constraint can be circumvented by allowing all complementary bases inside of such a bubble to pair with the nonspecific pairing interaction ε. Let Σ(2M,q) be the statistical weight of such a uniform bubble of size 2M, which now only depends on q and which can be computed by the methods of PGG. The partition function Z(2N) of the whole chain can be written in terms of Σ(2M,q) as

Z(2N) )

∑ (q˜ - q)|S′| ∏ Σ(2LS′, q)

S′⊂S

(3.1)

{2LS′}

Here, S′ is any subset of the list S of designed pairs with |S′| paired bases, while the LS′ variables are the number of base pairs of the various loops that connect the complementary paired sequences of S′. We will not reproduce here the proof, but it is easy to check some limits. If there is no difference in binding strength between the native and non-native pairs, that is, q )

q˜, then only the null set of S contributes to the sum with L ) N. The partition function now simply equals Σ(2N,q), which is the correct answer. A loop with only one designed pair that divides the loop into two equal sections of length N should have, according to eq 3.1, a partition function Σ(2N,q) + (q˜ - q)Σ(N,q)2. This is the correct answer since we must subtract from Σ(2N,q) the one improper term qΣ(2N,q)2, which has the native pair paired with the incorrect statistical weight q. In the low-temperature limit where all bases are paired, Σ(2N,q) ≈ qN. Application of the binomial theorem then shows that the partition function equals (q˜)N, as it should. The partition function eq 3.1 can be summed by applying a discrete Laplace transform to both sides and writing it as a Dyson recursion relation

Z(P) ) Γ(P) + Γ(P)Σ(P, q)Z(P)

(3.2)

Here ∞

Γ(P) )

∑ exp(-2NP)(q˜ - q)P

N)0

1 ) 2P - ε*( f )

(3.3a)

is a discrete Laplace transform that has a simple pole at P ) ε*(f)/2 with

ε*(f ) ) ln(q˜ - q) ) ln(exp(ε˜ (f )) - exp(ε(f )) ≈ ε*+c*f 2

(3.3b)

We defined ε* ≡ ε*(f ) 0). Note that in the limit q˜ f q, ε* will become negative. Other discrete Laplace transforms are ∞ [exp(-2NP)]Z(2N) and defined in the same way, Z(P) ) ∑N)0 ∞ [exp(-2NP)]Σ(2N,q). The solution of eq 3.2 is Σ(P) ) ∑N)0

Z(P) )

1 Γ(P) - Σ(P)

(3.4)

-1

The partition function Z(2N) is then obtained by an inverse Laplace transform

Z(2N) )

1 2πi

+i∞ dP exp[(2PN)]Z(P) ∫-i∞

(3.5)

The integration path runs along the imaginary axis to the right of the dominant singularity P† of Z(P), that is, the singularity that has the largest real value. In the large N limit, the integral is dominated by this singularity, and the free energy per base in units of kBT equals -P†. In Appendix B, we compute the self-energy Σ(P) of a moltenglobule under an external tension f. It is given by

( σβ )Σ(P) ≈ (d

( )

1

- d2(P - Q)1/2) +

( )(

)

c3 f 2 c3 f 2 c3 f 2 ln(d3 /(P - Q)) + G (3.6) β β β(P - Q)1/2

Flory Theory of the Folding of Designed RNA Molecules in the low-tension limit f , β, with β as a dimensionless parameter of order 1. The numerical constants d1-4 are all positive in this expression. The dimensionless parameter σ, the Boltzmann weight of the two “forks” at the two end points of a bubble of unpaired sequences, will be assumed to be small compared to 1. The first term in eq 3.5 is the self-energy of a molten-globule that is not under tension, as obtained by PGG. The dimensionless parameter Q ) [ε + (ε2 + 8σ)1/2]/4 marks the end point of a square root branch cut. We shall see that it is the free energy per monomer of a molten-globule that is not under tension. In the second, f-dependent, term, c3 ) c* + d2(σ/ δQ3), with δQ ∼ Q, is an effective compliance that is the sum of the compliance linear chain and molten-globule contributions in series. The scaling function G(z) will play a key role in the following and is defined in Appendix B. We will only need here the results that G(z) ∝ z for z , 1 and that G(z) diverges at a critical zc on the order of 1. We can now examine the effect of different levels of tension on the free energy per monomer. (A) Zero Tension. For the f ) 0 limit, the Laplace transform of the partition function equals

Zf)0(P) ∝

1 2(P - Q) - (ε*-εc) + d2(σ/β)(P - Q)1/2 (3.7a)

where εc ) 2Q - (σ/β)d1. The function Zf)0(P) has a square† † , with Pf)0 root branch cut at P ) Q and a simple pole at Pf)0 † as the solution of Γ(P)Σ(P) ) 1 (only solutions with Pf)0 ∼ Q are physically relevant). A square-root branch cut in the partition function Z(P) is the mathematical signature of a branched polymer, while a simple pole is the signature of a linear polymer. † ∼ Q + (ε* - εc)/2 is For ε* . εc, the pole located at Pf)0 the dominant singularity; therefore, the molecule has the configuration of a linear polymer corresponding to the designed ground state. Recall that ε* decreases to zero and even becomes negative as the native pairing energy approaches the non-native pairing energy (that is, q˜ f q). Before that happens, however, at ε* ∼ εc, the pole approaches the branch cut at P ) Q. Close to this point † Pf)0 =Q+

( )

β 2 (ε* - εc)2 d2σ

J. Phys. Chem. B, Vol. 113, No. 12, 2009 3885 (B) Linear Response: f , (β/c3)1/2(ε* - εc)1/2. If cf 2 , β(P - Q)1/2, then the scaling function term in eq 3.6 can be neglected, and Z(P) reduces to

Z(P) ∝

1 2(P - Q) - (ε*( f ) - εc) +

(σ/β){d2(P - Q)1/2 - d4(c3f 2 /β) ln(d3 /(P - Q))} (3.8) If the specific pairing energy ε* is large compared to the critical point value εc, then the leading pole is located at P†f = Q + (ε*(f) - εc)/2. Apart from a constant shift, the free energy per monomer is that of a complementary-paired chain as discussed in Appendix A (see also eq 3.3b). The quadratic dependence on tension is again the signature of a linear polymer. Near the BH critical point, the pole is located at P†(f) - P†(f ) 0) ) β 2 cf 2 2 (ε*-εc) c* + 2d4σ 2 ln(σ√d3 /β(ε*-εc)) f 2 (3.9) σd2 β

( )

(

( )

The elastic energy retains the form of a linear polymer, but the effective compliance, the coefficient of the f 2 term, vanishes at the BH critical point. The development of the large moltenglobule bubbles along the chain near the critical point increases the entropic elasticity, and the resulting stiffening reduces the compliance. Alternatively, one can say that the effective persistence length vanishes at the critical point. (C) f . (β/c3)1/2(ε* - εc)1/2. If c3f2 exceeds β(P - Q)1/2 in eq 3.6, then we must include the scaling function term. If, for a fixed tension, we approach the critical point, this will happen when (β/c3)1/2(ε* - εc)1/2 , f. Because the scaling function G(z) diverges at zc, the equation Γ(P)Σ(P) ) 1 for fixed f always has a solution P†f larger than the branch cut location Q; the pole and the branch cut do not fuse. This means that there is no melting transition at finite tension levels. The location of the pole is determined by the divergence of the scaling function. Setting the argument c3f2/[β(P - Q)1/2] of the scaling function equal to the critical value zc for the divergence of the scaling function gives the pole location

(3.7b)

For ε* < εc the branch cut singularity at Q is the dominant † = Q. It follows that the free energy singularity; therefore, Pf)0 per monomer has a mathematical singularity at ε* ) εc and that the singular contribution to the free energy is proportional to N(ε* - εc)2. This is the signature of a second-order phase transition, which is in fact the BH melting transition of a designed linear sequence. The free energy per monomer in the molten phase is thus equal to Q. For positive ε . σ1/2, Q is on the order of ε/2. The nonspecific pairing interaction would be strong enough in that case to produce (nearly) fully double-helical paired chains. For negative ε , -σ1/2, Q is on the order of σ1/2. In this case, the chain is nearly fully denatured in the molten state. The moltenglobule state proper corresponds to the cross-over region where -σ1/2 , ε , σ1/2, with Q on the order of σ1/2. In the following, we will assume that we are in this range. An important length scale here is ξss ) ls/Q, the correlation length of the Greens function of the chain. Physically, it corresponds to the size of the unpaired root of a branching “tree”.

)

Pf†

( )

c3 f 2 ∼Q+ βzc

2

(3.10)

Note that eq 3.10 is independent of the complementary pairing energy ε*. Above the critical point, eq 3.10 applies when (ε* - εc) , (c3/β)f2, while it remains valid in the molten state with ε* < εc. Recall that a quartic dependence of the elastic free energy on external tension is characteristic of the elastic response of a branched polymer. From a comparison between eqs 3.11 and 2.8 and 2.9, it follows that the radius of gyration R2(N) ∼ N1/2(l2s c3/βzc) indeed has the scaling form of an annealed branched polymer without self-interaction. (D) Inverse Legendre Transformation. In order to find the entropic elastic free energy, we transform the free energy F(f) ) -NkBTP†(f) for fixed tension f to a free energy F(R) for fixed extension R. These two free energies are related by the inverse Legendre transformation

F(R) ) F(f(R)) + kBTRf(R)/ls

(3.11)

3886 J. Phys. Chem. B, Vol. 113, No. 12, 2009

Schwab and Bruinsma

The dependence of f(R) on R is obtained by inverting the radius of gyration R(f) given by

R(f) ) Nls

dP†(f) df

(3.12)

Using the two limiting forms for P†(f), eqs 3.9 and 3.10, we just found

{

2lsceff(ε* - εc)f f , (β/c3)1/2(ε* - εc)1/2

2 R(f)/N ∼ 4lsc3

β2z2c

f3

f . (β/c3)1/2(ε* - εc)1/2

(3.13)

with ceff ) 2(β/σd2)2[c* + 2d4σ(cf2/β2) ln(σ(d3)1/2/β(ε* - εc))]. Note that ceff still has a weak logarithmic dependence on (ε* εc) through the dependence of the second term on the order parameter. Carrying out the inverse Legendre transformation finally gives the required entropic elastic free energy

{

( ) ( )

R2 βc3/2 eff Nls(ε* - εc)3/2 (3.14a) 2 R , ceff(ε* - εc)Nls c 2/3

( )

( )

R 2 RξBH g RG(N) RG(N)2

R4/3 N1/3l4/3 s

R.

βc3/2 eff Nls(ε* - εc)3/2 (3.14b) c

(where we left out some uninteresting numerical constants). Equation 4.5 gives two limiting forms of the elastic response near the critical point. Equation 3.14a describes the stretching energy for small extensions R, which has the form of the entropic stretching energy of a linear polymer. The effective persistence length a ∼ (ls/σ)(ceff(ε* - εc))1/2 vanishes at the critical point; therefore, the Gaussian radius of gyration, that is, in the absence of tertiary interaction, is given by

ls RG(N) ∼ (ceff(ε* - εc))1/2N1/2 σ

IV. Flory Free Energy of a Designed RNA Molecule (A) Excluded-Volume Interaction. The Flory free energy of a macromolecule is the sum of the entropic elastic free energy of the molecule and the interaction energy between “monomer” units expressed in the form of a virial expansion. In this section, we focus on the case of a generic repulsive excluded-volume interaction between the monomers. The Flory free energy then takes the simple form

( )

cσ (ε* - εc)-1/2 βc1/2 eff

N2 R3

(4.1)

with V as the excluded volume per monomer. In the two limiting regimes of eq 3.15

{

∆FF(R)/kBT ∼ σ2 N2 R2 /Nl2s + V 3 R , RG(N)2 /ξBH (4.2a) ceff(ε*-εc) R β 2/3 R4/3 N2 +V 3 R . RG(N)2 /ξBH (4.2b) c N1/3l4/3 R s

()

Minimization of eq 4.2a with respect to R reproduces the usual 3/5 Flory scaling exponent of linear polymers with an excludedvolume interaction

R(N) ∝ {Vc(ε* - εc)l2s /σ2}1/5N3/5

(4.3a)

(3.15)

The origin of this critical point stiffening lies in the contraction of the molecule due to increasing side branching near the critical point. Also, note from eq 3.13 that right at the critical point, and also in the molten state, R(f) ∝ N f 3; therefore, the compliance [dR( f )/df]|f)0 vanishes at the critical point. Equation 3.14b describes the stretching for large extensions. It has the form of the entropic stretching energy of an RNA molecule in the molten state, as given by the PGG theory. We can write the stretching energy in a more economical form. Define the length scale as

ξBH /ls ∼

(3.17)

The scaling function g(x) goes to 1 for small x and to x-2/3 for large x. It follows from eq 3.17 that near the melting transition, the elastic response of a molecule of size N is that of a linear polymer if the radius of gyration exceeds ξBH, while the elastic response is that of a branched polymer if the size exceeds ξBH. We thus can identify ξBH as the correlation length of the melting transition (without tertiary interaction). A scaling exponent of 1/2 for the correlation length is consistent with the BH theory of the melting transition.

∆FF(R) ) Fs(R) + V

∆Fs(R)/kBT ∼

( βc )

∆Fs(R)/kBT ∝

(3.16)

The stretching energy can be expressed in terms of ξBH and the radius of gyration as

Minimization of eq 4.2b with respect to R reproduces the GSS scaling exponent of annealed branched polymers with an excluded-volume interaction

R(N)/lc ∝ (V/l3s (β/c)2/3)3/13N7/13

(4.3b)

Cross-over from eq 4.3a for larger N to eq 4.3b for smaller N takes place at N ) N* with N* ∝ (ε* - εc)-13/4. The size ξF ≡ R(N*) of the molecule scales as

ξF ∝ (ε* - εc)-7/4

(4.4)

If the size of the molecule is small compared to ξF, then the GGS scaling relation applies, while conventional Flory scaling applies if the size exceeds ξF. Following the same reasoning as before, we can interpret ξF as the typical size of a molten droplet inside of a linear, designed RNA molecule with an excluded-volume interaction, that is, as the correlation

Flory Theory of the Folding of Designed RNA Molecules length for the interacting system. The correlation length still diverges at ε* ) εc; therefore, the melting transition remains continuous in the presence of excluded-volume interactions. The correlation length exponent ν ) 7/4 is nonclassical; the exponent agrees with eq 2.10, obtained by earlier heuristic arguments. Physically, the increase of the exponent ν is related to the fact that the excluded-volume interaction suppresses thermal fluctuations. (B) Melting Order Parameter. As a necessary preliminary for the next section, we will first cast this Flory energy in the form of a variational expression for the order parameter θ of the melting transition, that is, the mean fraction of complementarypaired nucleotides. First note that since the Boltzmann factor for complementary pairing equals exp(Mε*), it follows that θ ) -2(d/dε*)(F/N), where

{ ( )

F ) -NkBT Q +

}

β 2 (ε* - εc)2 + ∆FF(R) 2 d2σ

(4.5)

Now, we add a field h to the specific pairing energy ε* so we can write θ(h) ) -2(d/dh)(F(h)/N), where ε* is replaced by ε* + h in the free energy. A variational expression for the order parameter is constructed by again carrying out a Legendre transformation

1 G(θ) ) F(h(θ)) + Nθh(θ) 2

(4.6)

Here, h(θ) is the solution of eq 4.5. It is easy to check that ∂G/∂θ ) 0 at the physical value of h ) 0. Carrying out these steps leads to the following result for G in the critical regime (ε* - εc) , (σ2/β)2

(( )

G(R, θ)/kBT = N e1

)

σ2 2 2 θ + (εc - ε*)θ + β 2 2 β 2 R /ls N2 e2 + V 3 (4.7) σ ceff(θ)Nθ R

( )(

)

with e1,2 as numerical constants and with ceff(θ) ) g3c*(β/σ)2 + g4c ln [(4d2(d3)1/2β)/(θσ2)]. Minimization of G(R,θ) with respect to θ reproduces the two limiting cases of the Flory free energy eq 4.5. Note that G(R,θ) has the same form as eq 2.9 if one neglects the logarithmic dependence of ceff on θ (as we will do from here on). Following the discussion in section II, we can interpret φ ) (σ2/β)θ as a line density of branch points along the favored spanning path and h ) (β/σ2)(ε* - εc) as an ordering field conjugate to the order parameter. The Gaussian radius of gyration (that is, in the absence of an excluded-volume interaction) is

( βσ )(Nθ/e )

R0(N, θ) ) lsc

1/2

2

(4.8)

In terms of this expression, the variational free energy simplifies to

J. Phys. Chem. B, Vol. 113, No. 12, 2009 3887

(( )

)

σ2 2 2 θ + (εc - ε*)θ + β N2 R2 + V (4.9) R0(N, θ)2 R3

G(R, θ)/kBT = N e1

(C) Tertiary Interactions and Flory-Huggins Theory. Tertiary-level folding of RNA molecules is determined by the competition between the complementary pairing of nucleotides that were not paired as part of the secondary structure with electrostatic self-repulsion. We will distinguish between nucleotides that have adopted the designed ground-state conformation and those that have not. The interaction between nucleotides that are part of the designed, double-stranded, linear ground state will be treated as a repulsive, excluded-volume interaction. The interaction between nucleotides that are not part of a designed section, and thus are either unpaired or paired via the weak, nonspecific pairing interaction ε, will be treated as a weak, generic, nonspecific attractive interaction. In order to include both forms of the interaction, we will apply the Flory-Huggins mean-field theory for binary mixtures. That means that we treat nucleotides that are part of the designed ground state and those that are not as two different molecular species confined in a volume of radius R. In the Flory-Huggins theory of a binary mixture with respective concentrations (or mole fractions) φA and φB, the contribution to the free energy density from bimolecular interactions takes the form

1 1 ∆g2 /V0kBT ) χAAφA2 + χBBφB2 + χABφAφB (4.10) 2 2 (where V0 is a molecular volume). The coefficients χAA, χBB, and χAB are determined by molecular pair potentials. In the present context, “A” refers to complementary paired nucleotides in the designed ground-state conformation with a concentration of φA ) Nθ/R3 and “B” to nucleotides with a concentration of φB ) N(1 - θ)/R3 that are not in the ground-state conformation. The first term of eq 4.10 then describes the repulsive excludedvolume interaction between specifically paired nucleotides, which will be assumed to be repulsive. The second term of eq 4.10 describes the attractive interaction between nucleotides that are not part of a designed section. The third term describes the interaction between specifically paired nucleotides and nucleotides that are not. In general, the value of χAB is intermediate between χAA and χBB. In the present case, χAB is expected to be repulsive, assuming that paired nucleotides in the designed states are stable. For a coil with a volume proportional to R3, the total second virial contribution to the free energy is then of the form

∆G2 /V0kBT ∝

( )(( N2 R3

1 1 χAB - χBB θ + χBB(1 - θ) + 2 2 1 (χ + χBB - 2χAB)θ2 (4.11) 2 AA

)

)

In Flory-Huggins theory, the first two terms are normally absorbed in a redefinition of the respective chemical potentials of the two components, and only the last term is of importance, but in the present case, the first two terms will play an important role as well and must be retained. We can simplify eq 4.11 as ∆G2/kBT ) (N2/R3)B(θ) with

3888 J. Phys. Chem. B, Vol. 113, No. 12, 2009

1 B(θ) ) Vcθ + Vm(1 - θ) - V0χθ2 2

Schwab and Bruinsma

(4.12)

Here, Vm ) (1/2)V0χBB, Vc ) V0(χAB - (1/2)χBB), and χ ) (2χAB - χAA - χBB) is the “Flory χ” parameter. In the present case, with χAA > χAB > χBB, Vc > Vm, and Vm < 0. The function B(θ), the second virial coefficient of the theory, then has a positive slope over the interval 0 < θ < 1, a boundary-value minimum at θ ) 0, and a zero in the interval 0 < θ < 1. Because ∆G2 can be negative for smaller values of θ, the RNA molecule can condense into a compact, globular state. This condensation or folding-up will play an important role in the following. In order to allow for condensation, higher-order virial terms have to be included as well. We will include only the third virial term

∆G3 /kBT ) C

() N3 R6

(4.13)

For semiflexible chains, with persistence length λ, the dimensionless ratio y0 ) C/λ6, is known to be of order 1 for flexible chains with persistence lengths on the order of the chain diameter, while y0 is small compared to 1 for stiff chains. Singlestranded RNA material has a persistence length ls that is a few times the chain diameter. We will assume here that C ) y0ls6, with y0 on the order of 10-1-10-2. The expression for the elastic free energy must be extended to allow for collapse. Equation 4.5 gave the entropic free energy for the case of swelling, that is, with the actual radius of gyration exceeding the Gaussian radius of the noninteracting system. For collapse, we must allow for the opposite case, namely, a radius of gyration that is less than the Gaussian radius. The associated confinement energy Fc(R) can be obtained from scaling arguments. Assume that Fc(R) only depends on two length scales apart from R: the radius of gyration RG(N) ) aN1/2 of the noninteracting system with a ∼ (ls/σ)(ceff(ε* - εc))1/2 and the BH correlation length ξBH ∼ (ls/β1/2)(ε* - εc)-1/2 of the noninteracting system. We thus assume for the confinement free energy a scaling expression of the form

Fc /kBT ∼ Γ

(

R R , R0(N) ξBH

)

(4.14)

with Γ(x,y) as a dimensionless function of x and y. The entropic energy cost of compression is proportional to the number of “collisions” of the chain with the confining sphere of radius R. For R small compared to R0(N), the number of these collisions must be proportional to the total number of monomers N; therefore, the confinement energy must be proportional to N. It follows that the scaling function must be of the form Γ(x,y) ) x-2H(y), with H(y) as a scaling function of one variable

Fc /kBT ∼

2

( )

R Na H ξBH R2

the correlation length, the confinement free energy should be that of a linear polymer confined inside of a sphere. Since the confinement energy of a linear polymer is inversely proportional to the square of R, it follows that limyf∞ H(y) ∝ const. As a reasonable interpolation formula, we will use H(y) ) a1 + a2y-2. The confinement energy is then of the form

Fc /NkBT ∼ a1

ceffl2s (ε* - εc)

+ a2

σ2R2

ceeffl4s βσ2R4

(4.16)

The second term of eq 4.16 is consistent with an earlier result of Grosberg et al.,14 who showed that the confinement energy of a branched polymer without self-interaction in a sphere is proportional to N and inversely proportional to R4. Adding the entropic free energy to the second and third virial contributions and converting to the fixed θ ensemble, we obtain a variational free energy for the melting of a large, designed RNA molecule in the presence of tertiary interactions, the central result of this paper

(( )

( )( )

a1

ceff

βσ2

( )(

a3

)

σ2 2 2 θ + (εc - ε*)θ + β N σ 2 ceff Nθ + a2 + 4 β (R/ls) (R/ls)2

∆G(R, θ)/kBT ≈ N e1

β σ

2

)

( )(

)

() ()

(R/ls)2 N2 N3 + B(θ) 3 + C 6 (4.17) ceff Nθ R R

Note that the first part of the variational energy depends only on the order parameter; the second part is the sum of the confinement and stretching energies, while the last parts are the first two terms of the virial expansion. V. Phase Diagram The phase diagram can now be constructed by minimizing the variational free energy obtained at the end of the previous section. The parameters of the phase diagram will be the specific pairing energy ε* and the strength, Vm, of the tertiary pairing interaction. We will construct the phase diagram by considering the stability of minima of the free energy where the size R of the molecule scales with the number of monomers N as a linear polymer with excluded volume, a branched polymer with excluded volume, and a condensed globule of uniform density. (A) Linear Polymer. For a linear polymer with an excludedvolume interaction, the radius of gyration R is proportional to N3/5 in Flory theory. In that case, only the first term of ∆G(R,θ)/N is nonzero in the thermodynamic limit. The order parameter is then determined by the minimum of

( )

F(θ) ) e1

σ2 2 2 θ + (εc - ε*)θ β

(5.1)

(4.15)

Limiting forms of the function H(y) are determined by demanding that for R small compared to the correlation length ξBH, the system must behave as a molten-globule; therefore, the confinement free energy must be independent of the complementary pairing energy ε*. The dependence on ε* will cancel only if limyf0 H(y) ∝ y-2. On the other hand, for R large compared to

The order parameter θ1 that minimizes F(θ) vanishes linearly at the critical value of the specific pairing energy

θ1 )

( )

1 β 2 (ε*-εc) 2e1 σ2

The variational energy then takes the form

(5.2)

Flory Theory of the Folding of Designed RNA Molecules

( ) ( )

∆G(R)/kBT ≈ NF(θ1) + a2

Nl2(θ1) R2

+ a3

εc. In that case, the dominant contributions to the variational free energy in the limit of large N are of the form

R2 + Nl2(θ1)

() ()

N2 N3 + C 6 (5.3) 3 R R

B(θ1)

J. Phys. Chem. B, Vol. 113, No. 12, 2009 3889

( )(

( )

∆G(R, θ)/kBT ≈ Ne1

σ2 2 2 β θ + a3 β σ

2

)

(R/ls)2 + ceffNθ

() () 2

N N3 + C (5.8) R3 R6

Vm

Here

( βσ )√θc

l(θ) ) ls

(5.4)

eff(θ)

is an effective persistence length that vanishes at the critical point, as we saw already. This form for ∆G(R) happens to be the mean-field free energy of a polymer near a coil-to-globule transition as the solvent quality is reduced. The “solvent quality” is here determined by the sign of the second virial term B(θ). The minimization of ∆G(R) is simplified by defining the dimensionless “swelling coefficient” R and the dimensionless virial parameters x and y

After minimization with respect to θ

∆G(R)/kBT ≈

( ) β ceff

2/3

() ()

R4/3 N2 N3 + V + C m N1/3l4/3 R3 R6 s

(5.9) Equation 5.9 can be brought into scaling form by defining the scaling variables

R ) R/lsN5/11 R ) R/N l(θ1) 1/2

(5.5a)

x ) K1B(θ)N /l(θ1)

(5.5b)

y ) K2C/l(θ1)6

(5.5c)

1/2

3

with K1,2 numerical factors. The variational energy is now independent of N in these units

1 ∆G(R)/kBT ) NF(θ1)/kBT + (R2 + R-2) + 2 1 -3 1 xR + yR-6 (5.6) 3 6 For positive x, eq 5.6 is minimized by R ∼ x1/5, corresponding to the scaling relation R(N) ∝ N3/5 of a swollen coil, while for negative x, R ∼ (y/|x|)1/3, corresponding to the scaling relation R(N) ∝ N1/3 of a condensed, compact globule. Numerical analysis of ∆G(R), for different values of x, shows that this condensation transition has a van der Waals type liquid-gas critical point at yc ) 1/60. If y is less than yc, then R undergoes a discontinuous jump from a value of order 1 to a value of order 0.1. If y exceeds yc, then the reduction of R as a function of x is continuous. Close to yc, the transition is still quite rapid, but it turns progressively smoother as y increases. We conclude that the condition B(θ1) ) 0 defines a line ε˜ (Vm) in the ε* - Vm phase diagram, where the Flory scaling relation for a linear polymer breaks down. Using eq 5.2, the stability limit B(θ1) ) 0 for Flory scaling is thus given by

( ()

) ( ()

1 β 2 (ε˜ (Vm) - εc) (Vc + |Vm |) 2e1 σ2 1 β 2 1 (ε˜ (Vm) - εc) V0χ 2 2e1 σ2

)

( )

) |Vm | (5.7)

(B) Branched Polymer. If the RNA molecule is an annealed, branched polymer, then the radius of gyration R should scale as N7/13. The branched polymer state can only be stable if ε* )

-2/3

(5.10)

The quality of the solvent is now determined by the sign of Vm. The scaling variable x changes from a large positive to a large negative value when we change the sign of Vm. In scaling form, the free energy is

∆F(R)/N1/3kBT ≈

( ) β ceff

2/3

{R4/3 + xR-3 + yR-6}

(5.11)

For x large and positive, eq 5.11 is minimized by R ∼ x3/13, which corresponds to a branched polymer, while for x large and negative, R ∼ (y/(-x))1/3, which corresponds to a dense globule. Now, there is no discontinuity, only a cross-over between these limiting values. A molten-globule is already collapsed, even in good solvent. Reducing solvent quality only produces a progressive shrinking. We conclude that the line Vm ) 0 does not mark a condensation transition. (C) Condensed Globule. Finally, for a condensed globule, the radius of gyration R is proportional to N1/3. The first term and the last two terms of the variational free energy are then the leading terms in the large N limit. Define the density F ) N/R3 so the last two terms of the variational free energy are of the form B(θ)F + CF2. Minimization of B(θ)F + CF2 with respect to F, for B(θ) negative, and solving for F produces the following variational energy for the order parameter

(( )

F(θ)/NkBT ≈ e1 2

( )

VmN4/11 β x) ceff l3s -2/3 C β y) 6 ls ceff

)

σ2 2 2 1 θ + (εc - ε*)θ B(θ)2 β 4C (5.12)

The first term of eq 5.12 is equal to eq 5.1 with a minimum at θ1 ) (1/2e1)(β/σ2)2(ε* - εc). The second term has a maximum at the zero of B(θ) plus a boundary-value minimum at θ ) 0. The boundary-value minimum corresponds to a condensed

3890 J. Phys. Chem. B, Vol. 113, No. 12, 2009

Schwab and Bruinsma

globule with no native contacts. The stability condition for the boundary minimum is that ε* < ε(Vm) with

ε(Vm) ) εc +

1 |V |(V + |Vm |) 2C m c

(5.13)

If the line ε(Vm) in the ε* - Vm phase diagram lies to the “left” of the line ε˜ (Vm) where B(θ1) ) 0, then F(θ) has a negative slope at θ ) 0 and a positive slope at θ ) θ1. If the specific binding energy lies in the interval ε(Vm) < ε* < ε˜ (Vm), then the minimum of F(θ) lies between θ ) 0 and θ ) θ1. This state corresponds to condensed globule with native contacts. The slope of F(θ) at θ ) 0 becomes less negative as ε* approaches ε(Vm) and vanishes continuously at ε* ) ε(Vm). This suggests that the line ε* ) ε(Vm) marks a continuous melting transition between two condensed globular states, one with a finite value of the order parameter and one with θ ) 0. If, on the other hand, ε(Vm) exceeds ε˜ (Vm), then the θ ) 0 boundary minimum is locally stable over the interval ε˜ (Vm) < ε* < ε(Vm). However, since B is positive over that same interval, there also is a competing swollen coil free-energy minimum according to section V(A). The θ ) 0 condensed globule minimum disappears at the upper bound ε* ) ε(Vm). The line ε* ) ε(Vm) is here not a line of phase transitions but rather a spinodal line where the moltenglobule state loses local stability. Similarly, along the line ε* ) ε˜ (Vm), the swollen coil free-energy minimum loses stability; therefore, this also is a spinodal line where the swollen state loses local stability. In between these two limits, there runs a line where the free energy of a swollen coil with finite order parameter equals that of a condensed globule with zero order parameter. This clearly is a line of first-order transitions. The general topology of the phase diagram is thus determined by the relative location of the two lines ε˜ (Vm) and ε(Vm). There are two cases: (i) Wc > (4Ce1)1/2(σ2/β). If the parameter Vc obeys the condition Vc > (4Ce1)1/2(σ2/β), then ε(Vm) is less than ε˜ (Vm) for negative Vm. Two spinodal lines border a line of first-order phase transitions from a swollen state, with a nonzero value of the order parameter and Flory scaling of the radius of gyration, to a condensed globule where the order parameter is 0. The three lines fuse at Vm ) 0 with the line of continuous melting transitions for Vm positive at a tricritical point. (ii) Wc < (4Ce1)1/2(σ2/β). If Vc < (4Ce1)1/2(σ2/β), then the two curves ε(Vm) and ε˜ (Vm) have an intersection point at Vm ) V†, where the two spinodal lines merge. For |Vm| less than |V†|, the free energy has a single minimum in the interval ε(Vm) < ε* < ε˜ (Vm), and the order parameter is nonzero. The order parameter goes to 0 at a continuous melting transition ε* ) ε(Vm). In this case, the line of second-order phase transitions extends into the regime where the swollen state is unstable. The phase diagrams that combine the results of the previous section are shown in Figures 5 and 6. Experimental studies of the folding of “model” RNA molecules with a linear, designed ground state that were aimed at testing the predicted phase diagrams would be based on the fact that the strength Vm of the tertiary interaction can be changed by changing the concentration of divalent ions in solutions, such as Mg2+.3 The theory predicts that there are two possible scenarios. If the phase diagram of Figure 5 applies, then there should be a threshold Vm ) V† below which a swollen RNA molecule with nonzero order parameter undergoes a first-order phase transition directly into a condensed globule with zero order parameter. If the phase diagram of Figure 6 applies, then an intermediate phase should be encountered that is folded and that has a nonzero order parameter as

Vm is reduced. The transformation to the condensed globule with zero order parameter now should be a continuous transition. Which of the two scenarios applies is a delicate question. The second scenario should apply for Vc < (4Ce1)1/2(σ2/β), with Vc ) V0(χAB - (1/2)χBB). If we use our estimate C ) y0ls6 with y0 on the order of 10-1 to 10-2, then this in effect means that Vc has to be negative. Since χBB is negative, it follows that χAB has to be negative; the interaction between complementarypaired nucleotides and unpaired nucleotides must be attractive for a designed RNA molecule to fold into a condensed stat,e and this attractive interaction must be comparable in strength to the attractive interaction between unpaired nucleotides. Acknowledgment. We would like to thank the NSF for support under DMR Grant No. 0404507. Appendix A: Propagator of Single-Stranded and Double-Stranded Sequences under Tension In this section, we will construct the propagators for both singleand double-stranded sequences under tension. The singlestranded sequences are allowed top branch. (1) Double-Stranded Sequence. We will model a doublestranded RNA sequence as a freely jointed chain (FJC) with linker length (or Kuhn length) ld. It is well-known that wormlike chain description is rather more accurate for double-stranded nucleic acid strands than the FJC. However, for low tensions, the two descriptions produce the same force-extension relation provided we identify ld as twice the persistence length ξ (about 64 nm for double-stranded RNA). The number N of segments of the FJC is thus related to the total number of base pairs k by 2Nξ ) kb, with b as the spacing between base pairs. The partition function of the FJC under tension (without selfinteraction) is given by

∫ d bRe

Z(k) ) eεk

3

b F·b R/kBT

(

N

) eεk )e

3

N

i

i)1

sinh(Fld /kBT) Fld /kBT

)

(3)

i

i)1

∫ ∏ 4π1 sin θ dθ dφ exp

(

εk

)( [∑( ) ]

N

∫ ∏ 4π1 d bl δ(|bl | - l) δ ∑ bl - bR i

i

kb/ld

N

i)1

i

i)1

)

Fld cos θi kBT

(A.1)

Here, b F is the tension, and ε is the free energy per pair in units of kBT in the absence of tension (we do not distinguish here native from non-native pairs). The Laplace transform, or propagator, is then ∞

Π(P) )

∑ exp(-2kP)Z(k)

k)0 ∞

)

∑ exp[-2kP + kε(F)]

(A.2)

k)0



1 2P - ε(F)

with k as the statistical weight of the end points. The forcedependent binding energy is here

ε(F) ) ε +

(

sinh(Fld /kBT) b ln ld Fld /kBT

)

(A.3)

In the limit of low tensions, where Fld / kBT , 1, ε(F) reduces to

Flory Theory of the Folding of Designed RNA Molecules

ε(F) ≈ ε +

bldF2 6(kBT)2

(A.4)

We will always express our results in terms of the dimensionless tension f ) Fls/kBT, with ls as the Kuhn length of single-stranded RNA. In that case, ε(f) ≈ ε + b3f 2, where b3 ) bld/6l2s is a dimensionless number that is large compared to one. b) (2) Single-Stranded Sequences. Let the propagator G(P,k of a single-stranded RNA sequence be defined as the FourierLaplace transform, with respect to N and b R, of the statistical b) of a chain of N monomers. The initial monomer weight Z(N,R is located at the origin and the Nth monomer at b R. The PGG propagator for an RNA strand is

G0(P, b k) =

δQ2

+

(l2c /6)k2

mentary-paired sequences of equal length. The self-energy Σ(P) is then the discrete Laplace transform of H(N,N). The partition function of the bubble is decomposed as a sequence of “separable” bubbles. Inside of a separable bubble, each strand pairs only with itself, but there is no pairing between the two strands. The separable bubbles are linked by linear sequences where the two strands are nonspecifically associated. Each strand of a separable bubble carries half of the applied tension, while the linear sequences carry the full tension. Define H0(N,M) to be the partition function for a separable bubble with N monomers in the top strand and M in the bottom strand. Let Γε(2n) be the statistical weight of a linear paired sequence of 2n monomers, bound with the nonspecific binding energy ε. Summing over all possible separable bubble insertions gives for the full partition function of a molten-globule

H(N, M) ) H0(N, M) +

1 + 2c(ε)δQ(P - Q)1/2

(A.5) Here, ls is the Kuhn length of a single-stranded sequence and Q ) ε + (ε2 + 8σ)1/2/4 is the location of the singularity of the propagator and thus the free energy per monomer. The dimensionless parameter σ is, effectively, the statistical weight of a “fork”, where two single strands combine to form a double strand. When a tension force b F is applied to the two end points, we must include the mechanical work of the tension in the partition function. The tension-adjusted propagator is



J. Phys. Chem. B, Vol. 113, No. 12, 2009 3891

∑ H0(n, m)Γε(2l)H(N - n - l, M - m - l) (B.1)

n,m,l

Define the double discrete Laplace transform with respect to both N and M as ∞

H(P1, P2) )



e-P1N-P2MH(N, M)

(B.2)

N,M)0

with a similar definition for the double Laplace transform H0(P1,P2) of a separable bubble. In terms of the Laplace transforms, the recursion relation eq B.1 is solved as

b b b b

3 R)eik · R+F · R/kBT G(P, b k ) ) d RG0(P, b b /kBT) ) G0(P, b k - iF 1 ) Ω(f) + (l2c /6)k2 - (1/3)iflckx+

(A.6)

2c(ε)δQ(P - Q)1/2 b with f ) Fls/kBT and Ω(f ) ≡ δ2Q - (1/6)f 2. The propagator G(P,k )0) has both a pole and a branch cut singularity. At a critical force, defined by Ω(fc) ) 0, a phase transition takes place when the pole and branch cut fuse, as investigated by Muller et al.15 Beyond the critical force, tension destroys side branching. For us, the crucial feature of eq A.6 is that the location of the branch cut at P ) Q does not have any dependence on the tension. The tension only is exerted on the unpaired “root” of the propagator, which contains a finite number of bases, while the side branches, which contains on the order of N bases, are “shielded” from the tension, as long as we remain below the critical force. The free energy per monomer of the side branches thus remains proportional to Q.

H(P1, P2) )

H0(P1, P2) 1 - Γε((1/2)[P1 + P2])H0(P1, P2)

(B.3)

By performing a double inverse Laplace transform, we enforce an equal number of monomers N in the two strands that make up the molten bubble

H(N, N) )

( 2πi1 ) ∫ dP e 2

NP1

1

∫ dP2eNP H(P1, P2) (B.4) 2

The Bromwich integrals over P1 and P2 along the imaginary axis must be performed with the real parts of P1 and P2 to the right of any singularity. The desired self-energy of a molten bubble insertion is the Laplace transform of this result ∞

Σ(P) )

1 ∑ e-2NP( 2πi ) ∫ dP1eNP ∫ dP2eNP H(P1, P2) 2

1

2

N)0

(B.5)

Appendix B: Calculation of Σ(P) In this Appendix we calculate the self-energy insertion Σ(P) of a molten section inside of a linear double-stranded RNA molecule under tension. We start from H(N,M), the partition function of a molten section, or “bubble”, of two strands of unequal lengths N and M held together at their end points. We must allow only nonspecific pairing of both each strand with itself and between the two strands, as explained in section III. The two strands are under the same (dimensionless) tension f. We will require afterward that N ) M because the bubble was formed from two single strands that originally were comple-

In order to carry out these steps, we express H0(N,M) in terms b) of an RNA strand with fixed ends given of the propagator G(P,k by eq A.6

H0(P1, P2) )

σl3s 3

2(2π)

∫ d3kG(P1, bk )G(P2, -bk )

(B.6)

where we include a Boltzmann weight for the end points of the bubbles. The inverse Laplace transform, eq B.3, is dominated

3892 J. Phys. Chem. B, Vol. 113, No. 12, 2009

Schwab and Bruinsma

b)s; by the branch cut singularities near P ) Q of the G(P,k b therefore, we expand the G(P,k) propagators in eq B.6 at around P1 ∼ Q and P2 ∼ Q. The result is

dP 1 exp[-x(β√P - Q + (P - Q)) + N(P - Q)] = ∫ 2πi √

[

H (P1, P2) ) R(f) - β[√P1 - Q + √P2 - Q] (B.7) 0

Here, in the notation of Appendix A, β ) c(ε)(σ/2QδQ) is a dimensionless parameter that is of order 1 in the molten-globule regime. The function R(f) is defined as

R( f ) )

σ 6 3δ 2(2π) Q

∫ d3y

|

1 1 + (y b - ibf /δQ2√6)2

|

Equation B.3 now reads



R( f )(2Q - ε( f ))

∑ H(N, N)e-2NP ∝ σ ∫1

dN exp[-2N(P N3 ∞ (xβ)2 (B.15) Q)] 0 dx(xβ)2 exp c3xf2 2N

N)0

It was shown by PGG that R(f ) 0) ) (2Q - ε(f)0)). The divergence of H(P,P) at P ) Q leads to the N1/4 scaling law of the radius of gyration. In the limit of weak but finite tensions, we expand

3

2

H(P1, P2) = R( f )(2Q - ε( f ))

∫0∞ dx exp(-x{P1 + P2 -

Q)]

dΣ(P)f)0

2

)]

2

]

{exp[c3xf 2] - 1 - c3xf 2}

σ (d - d2(P - Q)1/2) β 1 c3 ∝ σ 2 ln(d3 /(P - Q)) β

Σ(P)f)0 ∝ df 2

(P - Q)) + N(P - Q)

[

exp[-2N(P ∫0∞ dN N3

After performing the integrals over x and N, the first and second terms are on the order of

We obtain

[∫

f2 ∝ σ

(B.16)

dΣ(P)f)0



df

2

∫0∞ dx(xβ)2 exp - (xβ) 2N

ε( f ) - (R( f ) - β(√P1 - Q + √P2 - Q))} (B.11)

H(N, N) = R( f )(2Q - ε( f ))exp(2NQ) × ∞ dP exp(-x(β√P - Q + dx exp(x(R( f ) + ε( f ) - 2Q)) 0 2πi

]

For f ) 0, the second integral (over x) is proportional to N3/2. It follows that, in the first integral, the small N cutoff cannot be set to 0 since the first integral (over N) then would diverge for small N. To avoid this divergence, we add and subtract the first two terms of a Taylor expansion of the exponential in the force, after which the integral over N is convergent

Σ(P) - Σ(P)f)0 -

24

[



(B.10)

with d2 ) [6 /4(2π) ]∫ d y[(1 + (2/3)y )/|1 + y | ] as a positive number. For nonzero tensions, H(Q,Q) no longer diverges at P ) Q. The double inverse Laplace transform eq B.5 is performed by first expressing H(P1,P2) as 3

]

-(xβ)2 + c3f2x 2N (B.14)



P1 + P2 - ε( f ) - (R( f ) - β(√P1 - Q + √P2 - Q)) (B.9)

1/2

exp

For this expression to be valid, we must demand that the integral over x converge for x values that are small compared to N, which is the case as long as f is small compared to β. The expression for the self-energy becomes

Σ(P) )

H(P1, P2) =

σ 2 f δQ3

[

2

∫0∞ dx (xβ) N3

2

(B.8)

R( f ) = (2Q - ε(f ) 0)) + d2

]

for N large compared to x. If we define c3 ) b3 + d2(σ/δQ3), we obtain

H(N, N) ∝ σe2NQ 3/2

π xβ exp(-(xβ)2 /4N) (B.13) 2N3/2

()

(B.17)

with d1-4 as positive numerical constants. The last term can be expressed as a function G(z) of the scaling variable z ) (c3f 2)/[β(P-Q)1/2]

(B.12)

In the small f limit, we can neglect the tension dependence of the prefactor, which is on the order of σ in the molten-globule regime. The tension dependence makes a crucial contribution to the integral of the exponential over x. The Bromwich integral inside of the square brackets of eq B.12 can be performed by the stationary phase method

( σβ )Σ(P) ≈ (d

( )

1

- d2(P - Q)1/2) +

( )(

)

c3 f 2 c3 f 2 c3 f 2 ln(d3 /(P - Q)) + G (B.18) β β β(P - Q)1/2

The scaling function G(z) is here defined by

Flory Theory of the Folding of Designed RNA Molecules

G(z) )

1 4πz

[ ] 2

∫0∞ VdV3/2 exp[-2V] ∫0∞ dq q2 exp - q2

×

{exp[qV1/2z] - 1 - qV1/2z} (B.19) For z small compared to 1, G(z) ∼ z, while G(z) diverges as 1/(zc - z) for a zc on the order of 1. References and Notes (1) For a review of RNA folding, see: Brion, P.; Westhof, E. Annu. ReV. Biophys. Biomol. Struct. 1997, 26, 113. (2) Higgs, P. G. Q. ReV. Biophys. 2000, 33, 199. (3) Das, R.; Kwok, L. W.; Millett, I. S.; Bai, Y.; Mills, T. T.; Jacob, J.; Maskel, G. S.; Seifert, S.; Mochrie, S. G.; Thiyagarajan, P.; Doniach, S.; Pollack, L.; Herschlag, D. J. Mol. Biol. 2003, 332 (2), 311–9. (4) Russell, R. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 155.

J. Phys. Chem. B, Vol. 113, No. 12, 2009 3893 (5) For a contrary view, see: Thirumalai, D. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 11506. (6) Higgs, P. G. J. Phys. I (France) 1993, 3, 43. (7) de Gennes, P. G. Biopolymers 1968, 6, 715. (8) Zimm, B. H.; Stockmayer, W. H. J. Chem. Phys. 1949, 17, 1301. (9) Bundschuh, R.; Hwa, T. Phys. ReV. Lett. 1999, 83, 1479. (10) For example, see: Mukhopadhyay, R.; Emberly, E.; Tang, C.; Wingreen, N. S. Phys. ReV. E 2003, 68, 041904. (11) For a review of the role of electrostatics in the folding of RNA, see: Misra, V. K.; Draper, D. E. Biopolymers 1998, 48, 113. (12) Gutin, A. M.; Yu, A.; Shaknovich, E. I. Macromolecules 1993, 26, 1293. (13) Bundschuh, R.; Bruinsma, R. Phys. ReV. Lett. 2008, 100, 148101. (14) Grosberg, A. Y.; Gutin, A. M.; Shakhnovich, E. I. Macromolecules 1995, 28, 3718. (15) Muller, M.; Krzakala, F.; Mezard, M. Eur. Phys. J. E 2002, 9, 67.

JP8068893