Sequence Dependence of Viral RNA Encapsidation - The Journal of

Apr 26, 2016 - Phone: +1-310-825-8539. This article is part of the William M. Gelbart Festschrift special issue. Cite this:J. Phys. Chem. B 120, 26, 6...
0 downloads 0 Views 2MB Size
Article pubs.acs.org/JPCB

Sequence Dependence of Viral RNA Encapsidation Joshua Kelly,† Alexander Y. Grosberg,‡ and Robijn Bruinsma*,†,¶ †

Department of Physics and Astronomy, University of California, Los Angeles, California 90095, United States Department of Physics and Center for Soft Matter Research, New York University, New York, New York 10003, United States ¶ Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, United States ‡

ABSTRACT: We develop a Flory mean-field theory for viral RNA (vRNA) molecules that extends the current RNA folding algorithms to include interactions between different sections of the secondary structure. The theory is applied to sequence-selective vRNA encapsidation. The dependence on sequence enters through a single parameter: the largest eigenvalue of the Kramers matrix of the branched polymer obtained by coarse graining the secondary structure. Differences between the work of encapsidation of vRNA molecules and of randomized isomers are found to be in the range of 20 kBT, more than sufficient to provide a strong bias in favor of vRNA encapsidation. The method is applied to a packaging competition experiment where large vRNA molecules compete for encapsidation with two smaller RNA species that together have the same nucleotide sequence as the large molecule. We encounter a substantial, generic free energy bias, that also is of the order of 20 kBT, in favor of encapsidating the single large RNA molecule. The bias is mainly the consequence of the fact that dividing up a large vRNA molecule involves the release of stored elastic energy. This provides an important, nonspecific mechanism for preferential encapsidation of single larger vRNA molecules over multiple smaller mRNA molecules with the same total number of nucleotides. The result is also consistent with recent RNA packaging competition experiments by Comas-Garcia et al.1 Finally, the Flory method leads to the result that when two RNA molecules are copackaged, they are expected to remain segregated inside the capsid.

I. INTRODUCTION The genome of small RNA viruses is typically encoded on linear chains of 2000−4000 RNA nucleotides (2−4 kilobase or kb). The nucleotides of these chains can pair with each other in an enormous number of tree-like arrangements: the “secondary structures”. The free energy of a secondary structure is commonly computed using folding algorithms such as the Vienna package.2 Figure 1 shows two minimum free energy secondary structures obtained using Vienna. The first (Figure 1, top) is for the 4.1kb vRNA molecule of the Q β phage virus, which will serve as our representative vRNA molecule. The second is that of an isomer of this same molecule obtained by randomly shuffling the sequence of nucleotides (Figure 1, bottom). The secondary structure of the randomized isomer clearly is much less compact than that of the vRNA molecule itself. The computed minimum free energy structures belong to a large, nearly degenerate ensemble of structures. For example, Vienna finds about 3 × 105 secondary structures within 0.17 kBT of the ground state. Nevertheless, the same observation holds for typical members of the ensemble. Secondary structure motifs encoded in the primary sequence of vRNA molecules play important roles in the lifecyle of viruses and are evolutionary conserved.3 Examples are protein and ribosome binding sites. By applying folding algorithms to the genome molecules of various viral species, Yoffe et al.4 provided evidence that the overall compactness of vRNA molecules also is encoded in the genome. During viral assembly, © XXXX American Chemical Society

Figure 1. (Top) Minimum energy secondary structure of the vRNA genome molecule of the Q β virus. (Bottom) Same for the case of an isomer of the same molecule with a randomized sequence of nucleotides.

genome molecules of the “T = 3” class of small ss RNA viruses must be packaged into a (roughly) spherical capsid whose inner Special Issue: William M. Gelbart Festschrift Received: February 25, 2016 Revised: April 23, 2016

A

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

concentration inside the Q β virus with its 4 kb genome is not much below that value. At these high densities, the free energy is very likely dominated by the interaction between segments of the secondary structure. Our method is based on applying Flory mean-field theory.7 The Flory variational free energy is the sum of the configurational free energy Fconf(R) of an ideal polymer, with no interaction among the monomers, plus the free energy Fint(R) of a solution of N disconnected monomers that are confined to a sphere of radius R. This feature allows us to continue using folding algorithms for the computation of Fconf(R). The sequence-dependence of the free energy is confined to Fconf(R), whereas the dependence on the interaction of RNA nucleotides with each other and with solvent molecules is confined to Fint(R). A flowchart of the method is shown in Figure 2.

volume has a radius of about ∼10 nm. On the other hand, cryo-EM images of the 2.7 kb vRNA molecules of a T = 3 virus (CCMV) reveal5 that vRNA molecules in solution adopt a variety of extended, branched structures with a maximum extent of the order of 40 nm. The Q β genome vRNA molecules presumably have an even greater extent. A significant amount of compaction is thus required during encapsidation. Yoffe et al.4 argued that the work of compression that has to be performed is reduced by increasing the compactness and amount of branching of the secondary structure, which could provide a selective advantage toward compactification. This is supported by field-theory techniques and simple scaling arguments about the effects of branching on assembly.6−8 It was proposed that selection toward more compact secondary structures can be achieved through synonymous mutations that exploit Watson−Crick coding redundancy.9 Finally, it was found experimentally that packaging of polymers by virus-like particles becomes more efficient when they are branched10 (see also simulation11). The dependence of the work of compression on the sequence of RNA molecules is also expected to play an important role in “packaging competition” experiments. The assembly of a virus in the cytoplasm of an infected cell takes place under conditions where the concentration of cellular mRNA molecules can be much higher than that of vRNA molecules. For example, a mammalian cell typically contains about 105 to 106 mRNA molecules12 while the number of vRNA molecules inside an infected cell typically is of the order of 103 or less. For in vivo viral assembly to be an effective process, the probability of packaging mRNA material instead of vRNA material must be very low. Selective packaging of vRNAs over mRNAs is conventionally attributed to the presence of short, virusspecific, vRNA sequences that bind to capsid proteins (“hairpin packaging signals”).3 Recent in vitro packaging competition experiments by Comas-Garcia et al.1 that took place under condition of thermal equilibrium reported the presence of a strong, entirely nonspecific bias in favor of packaging single large non-native RNA molecules over multiple copies of non-native smaller RNA molecules. (Note: typical mRNA molecules have an average size in the range of 1−2 kb, so multiple mRNAs are required if they are to substitute for the vRNA molecule.) One explanation might be that this is due to the Law of Mass Action according to which there is an entropic bias to maximize the number of molecules in solution.13 Elucidating this nonspecific bias toward packaging large RNA is one of the goals of this paper. In the next sections, we develop a quantitative theory for the sequence-dependence of the work of compacting vRNA molecules from solution into viral capsids. Directly applying RNA folding algorithms to compute the work of compression is not possible because they do not include interactions between different sections of the secondary structure, such as electrostatic repulsion between different complementary paired sequences of the secondary structure, as well as pairing between nucleotides that are unpaired as part of the computed secondary structure. On the one hand, these interactions determine the spatial, three-dimensional (“tertiary”) structure of a vRNA molecule in solution. (Note: see, for example, ref 14.) On the other hand, a spherical volume with a 10 nm radiusroughly the inner radius of the capsid of a small RNA viruscan maximally accommodate about 5000 RNA bases in the form of a hydrated crystal of duplex RNA fragments.15 This amounts to a monomer concentration of about 1.2 nucleotides per nm3. The nucleotide

Figure 2. Flowchart of the method proposed in this paper to obtain a sequence-dependent Flory variational free energy for viral RNA molecules.

Starting from a vRNA molecule with known primary sequence, we first determine the low-energy secondary structures using the Vienna folding algorithm. The second step is to apply the coarse-graining method of refs 4,16, which transforms secondary structure diagram into branched polymers in the form of tree graphs. The third step is to apply the Kramers−Fixman matrix theory17 to compute the configurational free energy of the branched polymer. The Kramers matrix of a branched polymer is determined by its geometrical connectivity. The trace of the Kramers matrix gives the radius of gyration the noninteraction branched polymer, which is known as Kramers’ Theorem. Fixman17 showed that the probability distribution of the radius of gyration, and hence the configurational free energy, is determined by the eigenvalues of the matrix. We use a formulation of the Kramers−Fixman theory developed by one of the present authors and his co-workers18,19 that allows us to explicitly compute the probability distribution. The probability distribution turns out to be dominated by the largest eigenvalue. The fourth step is the inclusion of a sequenceindependent monomer−monomer interaction free energy to construct the full variational free energy. The minimum of the variational free energy F(R) = Fconf(R) + Fint(R) then produces the equilibrium radius of gyration R(N). Using this method, we compute in Section II the sequence dependence of the work of compression of RNA molecules. In Section III, we further develop the method and apply it to packaging competition between RNA molecules under condition of thermal equilibrium. We will encounter in that section an important defect of Flory theorythe fact that the Flory free energy is not extensiveand develop a method to circumvent the problem. B

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

II. FLORY THEORY FOR VIRAL RNA MOLECULES The vRNA molecule of the Q β virus and its randomized isomers will be the test molecules of the method. The coarsegraining procedure that transforms their secondary structures (see Figure 1) into tree graphs is illustrated in Figure 3.

Figure 3. Coarse-graining method. Bubbles composed of sequences of unpaired nucleotides are collapsed into end points, connector points and branch points represented by the red dots. The dots are connected by complementary paired sections represented as straight lines.

Figure 5. Each loop of unpaired RNA nucleotides of this secondary structure represents a different monomer type. Solid lines are covalent sugar−phosphate bonds between RNA nucleotides. Light gray lines are hydrogen bonds between complementary paired nucleotides. Hairpin loops that contain a single hydrogen bond correspond to end points. Internal loops contain more than four sugar−phosphate groups. Connector points are internal loops that contain two hydrogen bonds. Branch points are loops that contain three or more hydrogen bonds (indicated as multiloops). Reproduced with permission from ref 20. Copyright 2005 Springer.

Loops of unpaired nucleotides are collapsed into points and sequences of paired nucleotides into bonds connecting the points. Points can be classified as end points attached to one bond (four in Figure 3), connector points attached to two bonds (one), and branch points connected to three or more bonds (one). Figure 4 shows the tree graphs generated from the two secondary structures of Figure 1. (Note: For the tree graphs, we used the rendering method of https://graph-tool.skewed.de/. Our plotting software is given in http://matplotlib.org/citing.html.) We will treat the points of the tree graphs as the “monomers” of a branched polymer, linked by bonds composed of complementary-paired sequences. The tree graph of a 4 kb vRNA molecule, like the Q β genome, has about N ≃ 300 of monomers and bonds. Note that two RNA molecules with the same number of nucleotides but different secondary structures in general will have different N values. About one-third of the nucleotides of a typical vRNA molecules are left unpaired in Vienna and thus are absorbed into monomers. This means that the N bonds are composed of the remaining two-third of monomers, leading to an average bond size of about 4.5 pairs of nucleotides per bond. Figure 5, adapted from ref 20, shows how the different types of monomers can be classified in terms of the number of unpaired RNA nucleotides per loop and of the number of noncovalent hydrogen bonds per loop. The mean-field theory of the next sections will treat monomers as statistical objects. Though bonds can be treated as rigid, this is not the case for the monomers. If unpaired sequences are stretched out to their maximum length, then the average spacing between monomers

is about 3.0 nm while the average spacing is about 1.5 nm if the monomers are treated as geometrical points. Below, we will refer to the points of the graph as “monomers” and to the links connecting the points as “bonds”. It will be assumed that bonds can freely swivel around connector points and branch points. Configurational Free Energy. The next step is to obtain the probability distribution of the radius of gyration for a given tree graph. Start by evaluating the N−1 by N−1 Kramers matrix G(k,m) defined in Figure 6. Let the N vectors ri⃗ be the position vectors of the N monomers. The square of the radius of gyration radius is then 1 R2 = ∑ ( ri ⃗ − rj⃗)2 2N 2 i , j (1) If there is no interaction between the monomers, then the bond vectors that connect adjacent monomers are uncorrelated random variables, reflecting the statistical variation of the different monomer types as well as the statistical variation of the bond lengths. In Appendix A, we show that if these bond vectors are uncorrelated Gaussian random variables, then the

Figure 4. Tree graphs obtained from the vRNA genome molecule of Figure 1 (left) and from the shuffled isomer (right), using the method shown in Figure 3. C

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 7. Eigenvalues λp of the connectivity matrix G(k,m) for the Q β genome and a randomly shuffled isomer. The eigenvalues are numbered from low to high.

Figure 6. Definition of the Kramers matrix G(k,m). The bonds k and m divide the graph into three subgraphs. The subgraph K that is marked pink is removed from the full graph if bond k is cut. Let it contain K(k) monomers. Similarly, the subgraph M is removed from the full graph if bond m is cut. It contains M(m) monomers. The Kramers matrix is defined as G(k,m) = K(k) M(m)/N2. In this particular example, the matrix element G(k,m) = 7 × 11/362. From ref 19.

mean radius of gyration is proportional to the trace of the Kramers matrix: N−1

⟨R2⟩ = a 2

∑ G (m , m) m=1

(2)

2

Here, a is the mean square of the bond vectors, which will be treated as a fitting parameter. This is Kramers Theorem. (Note: As a check on the numerics, we computed the trace of the G matrix for the case of a tenth generation dendrimer. An analytical expression is available for dendrimers21 with excellent agreement.) The full probability distribution of the radius of gyration is given by 2 1 P(R2) = e−iz(R / a) 2(z)dz (3) 2π



Figure 8. (Top) Probability distributions obtained from eq 3 for the Q β genome (blue) and the shuffled isomer (green). Inset: Kramers Theorem radii of gyration. (Bottom) Conformational free energy of the Q β genome (blue) and of a shuffled isomer (green). The dashed lines (red) show the asymptotic limits eq 6 for the two cases.

where N−1

2(z) =

∏ p=1

−3/2 ⎛ 2λp ⎞ z⎟ ⎜1 − i 3 ⎠ ⎝

(4)

The λp are here the N−1 eigenvalues of the Kramers matrix, whereas the entropic configurational free energy is then Fconf(R) = −kBT ln P(R2). The probability distribution P(R2) of the radius of gyration is thus determined by the eigenvalues. These eigenvalues are all real and can be computed numerically. The distribution of eigenvalues for the Q β genome and its shuffled isomer are shown in Figure 7. Most eigenvalues are quite small. The largest eigenvalue λm for Q β is about 2.7, whereas the largest eigenvalue of a shuffled isomer is about twice as large. For the shuffled isomers, the maximum eigenvalue was found to scale with the number of monomers as λm(N) ∝ N0.6. The Kramers Theorem radius of gyration of the Q β genome is ⟨R2⟩ ≃ 7.8 a2 while for the randomized isomer, ⟨R2⟩ is about twice as large. For the physical range of values for a, these values are much lower than the measured solution hydrodynamic radius of the Q β genome (about 12 nm22). Numerical computation of the complete probability distribution is delicate because of the rapidly oscillating integrand of eq 3. We developed a statistical sampling method, based on the fact that the integral is the convolution of Gamma probability distributions, as discussed in Appendix B. We show the

resulting probability distribution function for the Q β genome in Figure 8 (top) together with that of a randomized isomer. The most probable value for R2 is close to the Kramers Theorem radius of gyration. The corresponding configurational free energies βFconf(R) = −ln P(R2), plotted in Figure 8 (bottom), have minima at the most probable value of R2. The increase of the numerical error that is visible for larger R2 is due to the fact that P(R2) decays very rapidly for large R, which means that the statistical sampling method becomes less reliable. In the large R2 limitthe regime where the sampling method breaks downwe use an analytical asymptotic expression. For large R2, the integral in eq 3 is dominated by the singularity of 2(z) that is closest to the origin in the complex z plane, which is at z1 = −3iλm/2 with λm the largest eigenvalue of the G matrix. Using the steepest descent method to compute the integral leads to a Γ probability distribution function for (R/a)2: ⎛ R2 ⎞ P⎜ 2 ⎟ ⎝a ⎠ D

≃ R →∞

⎛ 3R2 ⎞ ⎜− 2 ⎟ exp Γ(3/2)(2λm /3)3/2 ⎝ 2a λm ⎠ (R /a)Cλ

(5)

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B where Cλ = ∏p≠m(1−λp/λm)−3/2. The corresponding conformational free energy is βFconf (R ) ≃

⎛R⎞ 3R2 − ln⎜ ⎟ + constant ⎝a⎠ 2a 2λm

f int(ρ) the free energy density of a uniform bulk solution of free monomers with a concentration ρ minus the translational entropic free energy, which is separately represented by Fconf. For the present case, ρ = N/V with N the number of monomers of the branched polymer and V the volume occupied by the vRNA molecule, which we will approximate as (4/3)πR3 with R the radius of gyration. (Note: The number of segments N of a given primary sequence generated by the coarse graining method varies depending on secondary structure. The interaction energy of Flory theory makes more sense if the same value of N is used for all secondary structures corresponding to the same primary sequence. For the Q β primary sequence, we use for the interaction energy N = 327 segments for 4125 nt, which corresponds to a conversion factor of 0.079 segments .) nucleotide In principle, we require an expression for f int(ρ) that is valid for a vRNA molecule in solution and for a vRNA molecule packaged in the interior of an assembled virus. For polymers in solutions for which the monomers are soluble, the interaction free energy density can be expanded in a virial expansion of the form:

(6)

In Figure 8 (bottom), we compare the numerical method with the asymptotic expression (red dashed line). For the randomized molecule, the asymptotic expression approaches the numerical result when (R/a)2 is larger than about 10. For the vRNA molecule, the asymptotic expression approaches the numerical value when (R/a)2 is larger than about 25. Eigenvalue Distribution. We already noted that the folding algorithms produce a large number of secondary structures that are nearly degenerate. The largest eigenvalue of the branched tree graph must be closely similar for the large majority of lowenergy states in order for eq 6 to have meaning. We evaluated the maximum eigenvalue for the 2 × 104 secondary structures of a 720 nt fragment of the Q β genome that were within 20 kBT of the ground state. We sampled the entire distribution using Boltzmann sampling. The results are shown as a scatter plot in Figure 9: The average maximum eigenvalue for states that are

βfint (ρ) ≃ vρ2 + wρ3 + ...

(7)

Here, v is the second virial coefficient and w the third virial coefficient. Since vRNA molecules are soluble under physiological conditions, both v and w should be positive under such conditions (as for other highly charged macromolecules, validity of the virial expansion for vRNA is limited to high salt concentrations23). Because of the complexity of the monomers in our problem, it is not practical to try to compute v and w, and they will be treated as fitting parameters. In the limit that the vRNA molecule is highly swollen, the third virial term can be neglected, in which case the total Flory free energy density reduces to β F (R ) ≃

3R2 N2 v + 2a 2λm (4/3)πR3

(8)

Since the Kramers Theorem radius of gyration is much smaller than 10 nm, the asymptotic expression eq 8 for the configurational free energy is used in the Flory theory. Minimization with respect to R leads to

Figure 9. Statistical distribution of the maximum eigenvalue of low energy secondary structure of a 720 nt fragment of the Q β genome, showing that the vast majority of structures have closely similar maximum eigenvalues. Horizontal axis: total Vienna free energy in kcal/mol. Vertical axis: maximum eigenvalue. The average maximum eigenvalue 1.13 ± 0.03 is indicated by the dashed line. The blue highlight marks the minimum energy state.

R eq ≃ ((3/8π )va 2N 2λm)1/5

(9)

The corresponding equilibrium free energy is βF(R eq) ≃ 9.1

within 20 kBT of the ground state is 1.13 ± 0.03. Repeating this exercise for shuffled Q β isomers, we find an average maximum eigenvalue of 2.05 ± 0.05. The statistical width is sufficiently narrow for us to conclude that for an ensemble of states that are within 20 kBT of the ground state, the mean eigenvalue can be treated as representative of the eigenvalues of the ensemble. However, if one significantly increases the 20 kBT cutoff above the ground state, then the width of the distribution of eigenvalues dramatically increases to the point that this is no longer the case. We will assume that thermal fluctuations that exceed the cutoff can be excluded on laboratory time scales, in effect a constrained equilibrium state. Free Energy of Interaction. In Flory theory, the free energy of interaction Fint of a polymer has the general form f int(ρ)V, with V the volume occupied by the molecule and

N 4/5v 2/5 a6/5λm3/5

(10)

Recall that λm(N) ∝ N0.6 for shuffled isomers of vRNA molecules. The free energy is then proportional to N0.44. The Flory free energy is thus not extensive, a problem we will revisit in a later section. Next, a spherical volume with a 10 nm radiuscomparable to the inner radius of a T = 3 capsidcan maximally accommodate about 5000 RNA bases in the form of a hydrated crystal of duplex RNA fragments.15 This amounts to a very high monomer concentration of about 1.2 nucleotides per nm3. The nucleotide concentration inside the Q β virus with its 4 kb genome is not much below that value. The equation of state of RNA material at these high concentrations has actually not yet been explored, but it is expected to be rather complex. Measurements of the equation of state of solutions of DNA E

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B duplex sequences with fixed length report a sequence of transitions associated with the development of the liquid-crystal and solid states as the nucleotide concentration increases.24 The equation of state of RNA material is probably no less complex. Though we thus lack an expression for f int(ρ) that applies at high monomer concentrations concentration range, the Flory− Huggins (FH) interaction free energy density βfint (ρ) = (1/b)(1 − bρ)ln(1 − bρ) + χρ(1 − bρ) (11)

has been used to describe solutions of flexible polymers at higher densities with good results.25,26 The first term of the FH expression is the mixing entropy of the solvent molecules with 1/b the maximum concentration when all solvent molecules have been squeezed out. The second term takes into account monomer−monomer, monomer−solvent, and solvent−solvent pair interactions with χ known as the Flory parameter. By comparing the low concentration limit of the FH free energy density with the virial expansion, it follows that v = b(1−2χ)/2 and w = b2/6. Since for soluble monomers v must be positive, χ must be less than 1/2 in the present case. Range of Parameters. The variational free energy has three fitting parameters in the FH approximation: the RMS bond length a, the maximum concentration 1/b, and the Flory χ parameter. The maximum concentration can be estimated directly. Let NT be the total number of nucleotides of the molecule. If these nucleotides are packed in duplex form as a hydrated crystal, then they occupy a volume Vmin equal to NT/1.2 nm3. Let N be the number of bonds of the tree graph. If the maximum bond concentration 1/b is equated to N/Vmin then 1/b ≃ 1.2(N/NT) nm−3. For Q β, this leads to b ≃ 10 nm3. (Note: the maximum concentration 1/b and the other two fitting parameters are in general different for different RNA molecules.) Estimating the Flory χ parameter and a2 is problematic in the general case. We can only state that since vRNA molecules are soluble in solution, χ must be less than 1/2 though probably not much less than 1/2 given the fact that RNA molecules condense rather easily.27 Estimating the value of a2 also is difficult, though we saw that physical range for a is between 1.5 to 3.0 nm. If the radius of gyration has been measured then one can be more explicit. For example, the radius of gyration of the CCMV1 vRNA molecule has been measured by neutron scattering to be about 18 nm.28 By minimizing the Flory free energy in the dilute limit with respect to R, and demanding that the value Req at the minimum equals the measured value, one obtains a relation between a and the second virial coefficient and hence χ. The relation is plotted in Appendix C. Restricting a to the physical range between 1.5 to 3.0 nm then this leads to the condition that for CCMV χ should lie in the interval [−1.8, − 0.1]. The value of all three fitting parameters could in principle be deduced from a measurement of the probability distribution of the radius of gyration obtained from a scattering experiment. The solution radius of gyration of Q β has not yet been measured. For the numerical estimates below, we will assume χ = 0.1 and a = 1.5 nm, with a corresponding equilibrium radius of gyration Req = 15 nm. Work of Compression. The Flory free energy of Q β for these parameters is shown in Figure 10. The figure compares the F(R) computed using the FH free energy density with the F(R) computed using the van der Waals (vdW) free energy

Figure 10. Flory free energy for the Q β genome (color online). Solid blue straight line: entropic configurational free energy (eq 6). Dashed green curve: Flory−Huggins interaction free energy (eq 11). Dashed blue line: van der Waals interaction free energy. Solid red curve: Flory free energy for the Flory−Huggins case. Solid black curve: same for the van der Waals case. Vertical red dashed line: radius corresponding to the maximum concentration. Vertical green dashed line: inner capsid radius of a typical T = 3 RNA virus.

density βfint (ρ) = ρ ln

1 (1 − bρ)

− Aρ2 with A adjusted to produce

the same radius of gyration as the FH free energy density. For large R, the two cases differ by an unimportant constant offset but for small R there are rather substantial differences between the two free energies (of the order of 50 kBT). The work of compression ΔW = β(F(Rc) − F(Req)) is defined as the work required to compress the vRNA molecule into a capsid of radius Rc starting from its equilibrium configuration in units of the thermal energy. Figure 11 shows

Figure 11. Work of compression of the Q β genome (green) and that of one of its randomly shuffled isomers (black). The dashed lines uses the Flory−Huggins interaction energy and the solid curves the van der Waals interaction energy. The inset shows the differential work of compression using the same notation.

the work of compression for the Q β genome (green) and that of one of its randomly shuffled isomers (black) for the FH case. If we take Rc = 10 nm, then the work of compression of the Q β genome is of the order of 100 kBT. Note that there are substantial differences between FH and VdW casesin the range of 50 kBTfor higher levels of compression. There is no reason to believe that either the FH or the VdW interaction free energies are very reliable at high packing densities. In Section II, we mentioned that the Vienna algorithm produces a very large number of secondary structures with differences in free energy that are only a fraction of the thermal energy. Since the work of compression is much larger than the thermal energy, it seems likely that the secondary structure with F

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

At high densities, Fint(Rc) is strongly dominated by the sequenceindependent interaction free energy, which depends only on the monomer density. Since the two molecules are isomers they have the same monomer densities so the i and j interaction energy terms at Rc cancel. ΔΔW(Rc) can then be written as

the lowest free energy is not same under compression as it is in the absence of compression. To address this question, we compressed the secondary structures of a 720 nt fragment that were within 20 kBT of the ground state. Figure 12 shows what

ΔΔW (R c)i , j = −[Fint(R eq)i − Fint(R eq)j ]+ [Fconf (R c) − Fconf (R eq)]i − [Fconf (R c) − Fconf (R eq)]j

(13)

If ΔΔW(Rc) is computed for two equations of state that have the same second and third virial coefficients but that differ in the high density limit, then the contributions coming from the configurational free energiesthe second line of eq 13are the same. Any difference between the two equations of state thus must come from the term [Fint(Req)j − Fint(Req)i]. If the two equations of state have the same virial coefficients, and thus produce the same radii of gyration, then this term is the same as well for the two equations of state. We conclude that the differential work of compression is nearly independent of the nature of the equation of state at high densities. The differential work of compression ΔΔW(Rc)i,j can be readily evaluated in terms of the virial coefficients, the maximum eigenvalues and the equilibrium radii of gyration of the two isomers:

Figure 12. Compressing the different secondary structures of a 720 nt fragment. Blue crosses indicate minima of the Flory free energy with respect to the radius. The purple and blue lines represent the smallest and largest eigenvalues of the sampled secondary structures. The cyan and green lines are representative states from the ensemble. The green and blue lines are an example of two secondary structures that undergo level crossing. The minimum free energy secondary structure (thick red line) is however the same for all values of R/a.

⎛ 3(R 2 − R 2) ⎞ N2 N3 eq c ΔΔW (R c)i , j ≃ −Δi , j ⎜⎜ + v 3 + w 6 ⎟⎟ 2 R eq 2a λ R eq ⎠ ⎝

happens for the 1000 states with the lowest eigenvalue and what happens with the 1000 states with the largest eigenvalues. There is indeed a complex pattern of level crossing, but the minimum free energy secondary structure remains the same for all R. In Flory theory, there apparently is no compressioninduced change in the secondary structure. Differential Work of Compression. We define the difference ΔΔW(Rc)i,j between the work of compression of two molecules i and j to be the differential work of compression. The sequencedependence of the work of compression is determined by this quantity. The inset in Figure 11 shows the differential work of compression computed for the Q β genome molecule and for a shuffled isomer, which is of the order of 20 kBT. The difference between ΔΔW(Rc)i,j values computed using FH and vdW is only about 5% when Rc = 10 nm, which indicates that the differential work of compression can be determined with much better accuracy than the work of compression itself. In a packaging competition experiment where capsid assembly takes place in a solution that contains equal concentrations of Q β genome molecules and of a randomized isomer without any specific interactions between the RNA molecules and the capsid proteins, the concentration ratio of packaged isomers and packaged Q β molecules equals the Boltzmann factor exp(−βΔΔW(Rc)i,j) under equilibrium conditions. For βΔΔW(Rc)i,j ≃ 20, this is negligible. The notion that the differential work of compression is nearly independent of the form of the equation of state at higher density can be demonstrated analytically:

(14)

where Δi,j(..) stands for the difference between the expression in brackets for molecules i and j (i.e., using in each case the appropriate maximum eigenvalue λ and the appropriate Req obtained by minimization of F(R) in eq 8) using again the appropriate maximum eigenvalue. The dependence of ΔΔW(Rc)i,j on Rc involves only the first term so ΔΔW(Rc)i,j should be proportional to (R/a)2. If, in addition, the solvent quality is so high that (i) one can neglect the third virial coefficient and (ii) R ≫ Rc, then one can use eq 13 to obtain a closed expression for the differential work of compression ⎛ N 4/5v 2/5 ⎞ ΔΔW (R c)i , j ≃ −Δi , j ⎜9.1 6/5 3/5 ⎟ ⎝ a λ ⎠

(15)

Though two isomers in general have different values for N, v, and a, we expect that for two vRNA molecules with the same number of monomers, the difference between the two maximum eigenvalues will dominate the differential work of compression.

III. EQUILIBRIUM PACKAGING COMPETITION Finally, we apply the Flory method to examine packaging competition when arrangements with different numbers of packaged RNA molecules are in competition. Assume that a solution contains capsid proteins (“CPs”) as well as three species of RNA molecules: full-sized RNA molecules composed of N monomers, which we will call RNA(N), and two types of smaller molecules, which we will call RNA(N1) and RNA(N2) composed of N1 and N2 monomers, respectively. The primary sequences of the two smaller RNA molecules are composed of two sections of the primary sequence of the larger RNA molecule such that together they would form a complete primary sequence, which means that N1 + N2 = N. The two smaller

ΔΔW (R c)i , j = = [F(R c) − F(R eq)]i − [F(R c) − F(R eq)]j = [(Fconf (R c) + Fint(R c)) − (Fconf (R eq) + Fint(R eq))]i − [(Fconf (R c) + Fint(R c)) − (Fconf (R eq) + Fint(R eq))]j (12) G

DOI: 10.1021/acs.jpcb.6b01964 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

unconfined polymer in half is then equal to ΔFs(N) = θ ln (N/4). The free energy cost of scission of a polymer confined to a container can be obtained as well because in that case correlations are strongly screened.30 The effect of the scission is then to introduce an extra degree of freedom, so ΔFcs = −ln (Vc/b) with Vc the container volume. Assume that the three types of RNA molecules also are all linear and can be treated as SARWs. The combination molecule is a linear molecule obtained by joining one end of the RNA(N1) molecule to one end of the RNA(N2) molecule. The packing ratio is then

molecules equilibrate in solution to adopt their own minimum free energy secondary structure. The capsid proteins can assemble into capsids whose interior has a nonspecific affinity for RNA monomers. This provides the thermodynamic drive for the capsid to package RNA. Capsids can package either one RNA(N) molecule or one RNA(N1) plus one RNA(N2) molecule. We will use a simple lattice model with lattice constant a where RNA monomers and CPs both occupy one site of the lattice. (Note: see ref 12 for other examples of the use of lattice models for dilute solutions of macromolecules.) Let ψi be the fraction of sites occupied by RNA molecules of species i that are free in solution and ϕα the fraction of sites occupied by proteins that are part of a capsid that contains a particular combination α of RNA molecules. For the present case, α = N refers to a capsid that contains an RNA(N) molecules and α = N1 +N2 to a capsid that contains an RNA(N1) plus an RNA(N2) molecule. In Appendix C, it is shown that the lattice model leads to a packaging ratio that obeys the Law of Mass Action. Assembly of a capsid that contains both an RNA(N1) and an RNA(N2) molecule can be viewed as a chemical reaction with RNA(N1) and RNA(N2) as reactants. The concentration of the productthe assembled capsidmust then be proportional to the product of the RNA(N2) and RNA(N2) concentrations. Specifically ⎛ ϕN + N ⎞ ⎛ ψN ψN ⎞ ⎜⎜ 1 2 ⎟⎟ = ⎜⎜ 1 2 ⎟⎟e−ΔΔG ⎝ ϕN ⎠ ⎝ ψN ⎠

⎛ ϕN + N ⎞ ⎛ ψN ψN ⎞⎛ V ⎞⎛ N ⎞θ ⎜⎜ 1 2 ⎟⎟ ≃ ⎜⎜ 1 2 ⎟⎟⎜ c ⎟⎜ ⎟ ⎝ ϕN ⎠ ⎝ ψN ⎠⎝ b ⎠⎝ 4 ⎠

Since RNA(N1 + N2) is identical to the RNA(N) molecule, the differential work of compression in eq 17 is zero. For N in the range of 300 and θ = −0.16, the multiplicative factor (N/4)θ is about 0.5. If, in a packaging experiment, the molecular weights of the long and short molecules are comparable, then ψN1 ∼ ψN2 ∼ ψN/2. We assume RNA concentrations in the range of nanomolar (nM), a ∼ nm as the size scale of CPs and monomers, and N = 103 monomers. (Note: One nM roughly corresponds to a molecule per cell. It is also a typical concentration scale for in vitro vRNA experiments.) The monomer site occupation probabilities ψ are then in the range of 10−5 so the first term in brackets is of that order. V Next, bc ∼ 102 , while the last factor is of the order of one. The packing ratio is then of the order of 10−3. There is thus a significant, purely entropic, bias in favor of packing a single large molecule over two shorter ones. This is actually a consequence of the Law of Mass Action, as mentioned in Section I. (Note: One concern with this equation is that the packing ratio strongly depends on the choice of microscopic cutoff a. If however the site probabilities are converted to concentrations, then the packing ratio depends on a only through the combination a3/b, which is in general of the order of one.) According to eq 10, the Flory free energy of shuffled isomers of vRNA molecules depends on the number of monomers as N0.44. If one would use eq 10 instead of the scaling form to compute the scission free energy, one obtains ΔFs ≃ 0.4βFeq ≃ 30 for Q β, which leads to a multiplicative factor of the order of exp 30 in eq 18 for the packing ratio. This enormous overestimate is a consequence of the well-known defect of Flory theory that Flory free energies are not extensive. (Note: For quenched branched polymers, the free energy is not extensive because of the stored elastic energy.) Flory theory is not reliable for estimating scission free energies. The method of temporarily linking the two molecules allows us to separate out the scission free energy for special treatment. Branched Polymers. Next, consider the packaging of branched polymers of the type that are generated by the coarse graining method. Many different combination molecules can be obtained by linking an end point of RNA(N1) to one of RNA(Ns). For two equal-sized fragments of the Q β genome, there are 1672 ways of constructing a combination molecule by joining end points. In Figure 13, we show the combination molecule with the largest eigenvalue and the one with the smallest eigenvalue. The combination molecule with the largest eigenvalue is also the combination molecule with the lowest

(16)

Here ΔΔG = ΔGN1 + N2 − ΔGN is the differential assembly free energy for the two forms of assembly. Positive values for ΔΔG mean that encapsidation of RNA(N) is favored and negative values that packaging of RNA(N1) + RNA(N2) is favored. In Appendix C, it is shown that ΔΔG can be expressed as ΔΔG = −ΔFs + ΔFsc + ΔΔW

(18)

(17)

Here, ΔFs is the scission f ree energy, defined as the free energy difference between a state where RNA(N1) and RNA(N2) are free in solution and state in which the two molecules are combined into a combination molecule obtained by joining one point of RNA(N1) to one point of RNA(N2). (Note: For a discussion of the scission free energy of linear polymers, see ref 29.) We will denote this combination molecule by RNA(N1 + N2). Next, ΔFcs is the scission free energy when RNA(N1) and RNA(N2) are confined inside the capsid. Finally, ΔΔW = ΔW(N1 + N2) − ΔW(N) is the differential work of compression between the combination molecule RNA(N1 + N2) and the RNA(N) molecule. In the next section, we will apply the Flory method to compute ΔΔG. Scission Free Energy. Before discussing the general case, it is useful to first practice on a model polymer problem for which the scission free energy is known. The partition function Z(N) of models of linear polymers and annealed branched polymers depends on the number of monomer N according to the scaling form Z(N)|N→∞ ∝ μNN−θ.30 If the polymer is modeled as a self-avoiding random walk in three dimensions (SARW), then 4.57