

J. Phys. Chem. 1996, 100, 9968-9976

Sequence Dependent Circularization of DNAs: A Physical Model to Predict the DNA Sequence Dependent Propensity to Circularization and Its Changes in the Presence of Protein-Induced Bending

P. De Santis,*,† M. Fuà,† M. Savino,‡ C. Anselmi,† and G. Bocchinfuso†

Dipartimento di Chimica and Dipartimento di Genetica e Biologia Molecolare - Istituto Pasteur, Fondazione Cenci Bolognetti, Università di Roma “La Sapienza”, 00185 Roma, Italy

Received: September 5, 1995; In Final Form: March 15, 1996

A simple physical model based on statistical thermodynamics is proposed to predict the DNA sequence dependent propensity to circularization, even in the presence of bend inducing proteins. Assuming first order elasticity and a uniform force field in solution, the model requires the evaluation of the ground state energy difference between the circular and linear forms as well as the difference of their canonical ensemble average energies on account of curvature and twisting fluctuations. These quantities are obtained analytically using the Parseval equality in Fourier space and adopting a DNA curvature model previously proposed by us. The circularization propensity, as defined by the J factor, is obtained in terms of the intrinsic curvature, the persistence length, and the torsional constant. The comparison with the experimental data is very satisfactory for DNA lengths between 100 and 10 000 bp. The model can also be extended to evaluate the sequence dependent energy cost of the looping deformation of a DNA tract, also in the presence of CAP or other regulatory proteins, repressors, and operators, as in the first step of the transcription mechanism, as well as to evaluate “allosteric” effects in protein binding on topologically constrained DNAs.

Introduction

The circularization reaction in macromolecules has been the object of investigations by different authors for several decades. The first theory of circularization equilibrium in polymers was formulated 45 years ago by Jacobson and Stockmayer1 and further developed by Flory et al.2 20 years later. They introduced the ratio between the equilibrium constants of the circularization and bimolecular reactions as a measure of the propensity of the macromolecular chain to circularize, from which the effects of constraints associated with the formation of the cyclic species can be obtained. The nature of the terminal reacting groups is immaterial on the assumption that their effects cancel, because they influence both reactions equivalently. More recently, several authors3-13 (Olson, Shore et al., Shore and Baldwin, Shimada and Yamakawa, Hagerman, Levene and Crothers, Kotlarz et al., Koo et al., Lavigne et al., Kahn and Crothers) adopted the framework of the Jacobson-Stockmayer model and developed analytical and Monte Carlo methods to investigate the circularization kinetics and equilibrium of DNA molecules that differ in size and sequence.

The study of DNA circularization is particularly important, given the very peculiar physicochemical and topological properties of the circular form with respect to the linear one and its biological role. Many naturally occurring DNAs, such as plasmids and many viral DNAs, are circular. A fundamental biological mechanism such as transcription seems to require DNA looping, which may be considered a virtual circularization. Thus, part of the effect on transcription of DNA binding proteins such as CAP was interpreted as due to the increased looping propensity resulting from the in-phase curvature induced by protein binding.10,12 From such studies the propensity of DNA to circularize, known as the J factor, appears to be very sensitive to the amount and phase of the DNA bending. As a consequence, the

† Dipartimento di Chimica. ‡ Dipartimento di Genetica e Biologia Molecolare. Abstract published in Advance ACS Abstracts, May 15, 1996.


circularization measurements were assumed to be a good and useful method to investigate the sequence dependent bending of DNAs. In addition, these studies were a source for evaluating the bending and twisting elasticity moduli, by comparing the theoretical results on the ring probability of suitably designed DNA fragments in solution with experimental data.

In the present paper a general solution to the problem of sequence dependent DNA circularization is advanced. An analytical formulation of J based on statistical thermodynamics was obtained. The ground state and the canonical ensemble energies for DNA circularization are evaluated in terms of an intrinsic curvature function, assuming first order elasticity. The J factors of a large number of DNA tracts, investigated by different authors and in our laboratory, were satisfactorily predicted; in addition, the J changes produced by curvature inducing proteins, such as CAP, appear to be in good agreement with experimental data.

The Model

Circularization of a DNA tract, characterized by an intrinsic superstructure, is the result of stochastic and deterministic motions of the double helix. The circularization reaction competes with the oligomerization of a DNA carrying sticky ends. At low DNA concentration, the reaction yields essentially circles and linear dimers. The ratio between the equilibrium constants of circularization and dimerization is generally adopted as a measure of the circularization probability of a DNA tract. It is also accepted that the ratio of the kinetic constants converges to that of the corresponding equilibrium constants when the fraction of linear DNA is less than 0.1.4,5 For DNA substrates carrying similar ends, the dimerization constant is practically independent of the length as well as of the sequence of the double-stranded DNA, whereas the circularization constant strongly depends on the number of base pairs, on their nature, and on their distribution.
The circularization and dimerization reactions and the corresponding equilibrium constants are given by

$$\mathrm{L} \rightleftharpoons \mathrm{C}, \qquad K_C = \frac{[\mathrm{C}]}{[\mathrm{L}]} \tag{1}$$

$$2\mathrm{L} \rightleftharpoons \mathrm{D}, \qquad K_D = \frac{[\mathrm{D}]}{[\mathrm{L}]^2} \tag{2}$$

L, C, and D represent the linear, circular, and dimeric DNA tract, whereas [C], [L], and [D] indicate the corresponding concentrations. The J factor is defined as

$$J = \frac{K_C}{K_D} = \frac{[\mathrm{C}][\mathrm{L}]}{[\mathrm{D}]} \tag{3}$$

Several years ago, Levene and Crothers8 developed the macrocyclization equilibrium theory early proposed by Jacobson and Stockmayer1 and afterward by Flory et al.,2 in the case of DNA. They considered the standard free energy change $\Delta G_C^0$ for the cyclization reaction 1 as

$$\Delta G_C^0 = \Delta G^* - RT \ln\left[W(0)\,\Gamma_0^{(C)}(1)\,\Phi_{0,1}^{(C)}(\tau^0)\,dV\,d\gamma\,d\tau\right] \tag{4}$$

where W(0)dV is the probability of finding both ends of the molecule in an element of volume dV centered at the origin, $\Gamma_0^{(C)}(1)\,d\gamma$ is the conditional probability for the correct orientation of the reacting terminals in the cyclic form, and $\Phi_{0,1}^{(C)}(\tau^0)\,d\tau$ is the conditional probability for the correct twist angle. Similarly, for the dimerization reaction 2

$$\Delta G_D^0 = \Delta G^* - RT \ln\left[\frac{N}{V}\,\Gamma_0^{(D)}(1)\,\Phi_{0,1}^{(D)}(\tau^0)\,dV\,d\gamma\,d\tau\right] \tag{5}$$

where N is the number of molecules and V is the volume. If we consider N equal to the Avogadro’s number and V, the unit volume, all the species are in their standard state concentration, 1 M. Recalling the definition of J (3)

$$J = \exp\left(-\frac{\Delta G_C^0 - \Delta G_D^0}{RT}\right) = \frac{W(0)}{N}\,\frac{\Gamma_0^{(C)}(1)\,\Phi_{0,1}^{(C)}(\tau^0)}{\Gamma_0^{(D)}(1)\,\Phi_{0,1}^{(D)}(\tau^0)} \tag{6}$$

If a Gaussian model of the end to end distribution is assumed

$$W(0) = \left(\frac{3}{2\pi\langle r^2\rangle}\right)^{3/2} \tag{7}$$

where the unperturbed average mean square end-to-end distance ⟨r²⟩ is conveniently given in terms of the number of bp, n, by the Kratky-Porod relation.19 For long DNA molecules, the latter factor in eq 6

$$\frac{\Gamma_0^{(C)}(1)\,\Phi_{0,1}^{(C)}(\tau^0)}{\Gamma_0^{(D)}(1)\,\Phi_{0,1}^{(D)}(\tau^0)} \tag{8}$$

approximates to unity, so that eq 6 simply reduces to

$$J = \frac{1}{N}\left(\frac{3}{2\pi\langle r^2\rangle}\right)^{3/2} \tag{9}$$

This agrees with the experiments for long DNA molecules; however, for shorter chains, J formulation needs the factor (8) to be evaluated. This was attempted by Shimada and Yamakawa who developed an analytical solution for twisted wormlike chains6 that account for some experimental data. However, further experiments showed that sequence dependent effects are dominant in short and intermediate chain lengths. Because of the large complexity of finding an analytical solution to this problem, Levene and Crothers resorted to the application of Monte Carlo methods.8

More recently, during our investigations on DNA curvature, we tried to obtain the conditional probabilities (8), by adopting a simple physical model based on statistical thermodynamics, in terms of the ground state as well as the ensemble average conformational energy differences between the circular and the linear form. We considered the solution as a continuous medium acting on the DNA molecules, circular or linear, in the different conformations with a same constant and uniform force field. This is justified by the slight local deformations involved in the circularization. This assumption and the practical independence of J from the chemical factors of the reacting ends allow us to consider DNA molecules as formally noninteracting. J can then be conveniently expressed in terms of the canonical partition functions of the circular, linear, and dimeric forms and of their ground state energies

$$J = \frac{Q_C Q_L}{Q_D}\exp\left(-\frac{\Delta E^0}{RT}\right) \tag{10}$$

where

$$\Delta E^0 = E_C^0 + E_L^0 - E_D^0 \tag{11}$$

The terms in eq 11 should contain the chemical and conformational energy contributions, as well as the solution interactions; however, since the chemical reaction at the ends of the DNA tract is the same in the case of circularization and dimerization, we can cancel the chemical contribution. Furthermore, if we consider that the solution molecules interact with the local DNA structure that remains rather invariant in the circularization, we can cancel also the terms containing solvent and counterions interactions; finally, the minimum conformational energy of the dimer can be considered equal to twice that of the linear form, so that the only relevant contributions to ΔE° are the ground state conformational energy of the circular DNA chain with respect to that of the linear form; in other words ΔE° represents the minimum conformational energy of circularization

$$\Delta E^0 = E_C^0 - E_L^0 \tag{12}$$

Under the same conditions, also the ratio between the partition functions reduces to that between the corresponding conformational and configurational (concentration dependent) partition functions, because the other contributions cancel. This ratio is related to the ensemble average values of the conformational and configurational free energy functions

$$\frac{Q_C Q_L}{Q_D} = \exp\left(-\frac{\langle\Delta G\rangle}{RT}\right) = \exp\left(-\frac{\langle\Delta E\rangle}{RT}\right)\exp\left(\frac{\langle\Delta S\rangle}{R}\right) \tag{13}$$

where

$$\langle\Delta S\rangle = \langle\Delta S_C\rangle - \langle\Delta S_D\rangle \tag{14}$$

and

$$\langle\Delta E\rangle = \langle\Delta E_C\rangle - \langle\Delta E_D\rangle \tag{15}$$

⟨ΔS_C⟩, ⟨ΔE_C⟩, ⟨ΔS_D⟩, and ⟨ΔE_D⟩ refer to the circularization and the dimerization reactions, respectively.
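The Gaussian limit of eq 9 is easy to evaluate numerically. The sketch below is illustrative only: it assumes ⟨r²⟩ from the Kratky-Porod relation (eq 21 of this paper, with P = 450 Å and l = 3.4 Å), takes N as Avogadro's number and V as 1 L, and uses function names that are ours, not the authors'. Under these assumptions it also recovers a prefactor close to the 5.5 × 10¹¹ quoted in eq 39, with ⟨r²⟩ in Å² and J in nM.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
ANGSTROM3_PER_LITER = 1.0e27  # 1 L = 1 dm^3 = 1e27 cubic angstroms


def kratky_porod_r2(n_bp, persistence=450.0, rise=3.4):
    """Mean square end-to-end distance in A^2 (Kratky-Porod relation, eq 21)."""
    contour = n_bp * rise
    ratio = persistence / contour
    return 2.0 * persistence * contour * (1.0 - ratio * (1.0 - math.exp(-1.0 / ratio)))


def gaussian_j_nanomolar(n_bp):
    """Gaussian-limit J factor of eq 9, converted from molecules/A^3 to nM."""
    w0 = (3.0 / (2.0 * math.pi * kratky_porod_r2(n_bp))) ** 1.5  # eq 7, in A^-3
    return w0 * ANGSTROM3_PER_LITER / AVOGADRO * 1.0e9


# Prefactor multiplying <r^2>^(-3/2) when <r^2> is in A^2 and J is in nM
PREFACTOR_NM = (3.0 / (2.0 * math.pi)) ** 1.5 * ANGSTROM3_PER_LITER / AVOGADRO * 1.0e9

if __name__ == "__main__":
    print(f"prefactor = {PREFACTOR_NM:.3e}")
    for n in (500, 1000, 4361, 10000):
        print(n, f"{gaussian_j_nanomolar(n):.2f} nM")
```

In this limit J decreases monotonically with chain length; the sequence dependent and fluctuation terms discussed in the text are not included.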


In order to evaluate the entropy and energy terms, the circularization and dimerization reactions were considered as occurring in two virtual steps. The first corresponds to bringing the reacting ends into an elementary volume dV without considering their mutual orientations. It corresponds to

$$\langle\Delta S_C\rangle^{(1)} = R\ln[W(0)\,dV] \tag{16}$$

and

$$\langle\Delta S_D\rangle^{(1)} = R\ln\left[\frac{N}{V}\,dV\right] \tag{17}$$

according to the r-dependent terms in Levene and Crothers’ eqs 4 and 5.8 For this step, only the circularization requires a conformational energy contribution, because the linear molecules that react in dimerization move freely in the solution. The second step corresponds to the fulfilling of the orientational constraints for a correct ring closure. The same entropy contribution is involved in circularization and dimerization, but, as above, only circularization requires an energy cost. Therefore the global process involves

$$\langle\Delta S\rangle = \langle\Delta S_C\rangle^{(1)} - \langle\Delta S_D\rangle^{(1)} = R\ln\left[W(0)\,\frac{V}{N}\right] \tag{18}$$

and

$$\langle\Delta E\rangle = \langle\Delta E_C\rangle = \langle E_C\rangle - \langle E_L\rangle \tag{19}$$

so that, as for ΔE°, the ensemble average energy term ⟨ΔE⟩ also contains only the conformational contributions of the circular and the linear forms. In particular ⟨E_C⟩ represents the ensemble average conformational energy, with respect to its ground state, of a circular DNA with the ends in the proper orientation and confined in an elementary reaction volume dV, but formally noninteracting, because the chemical energy of interaction was canceled with that of the dimer. Assuming a Gaussian distribution of the end-to-end distance

$$\langle\Delta S\rangle = R\ln\left(W(0)\,\frac{V}{N}\right) = R\left[\ln\left(\frac{3}{2\pi\langle r^2\rangle}\right)^{3/2} + \ln\left(\frac{V}{N}\right)\right] \tag{20}$$

where the unperturbed average mean square end-to-end distance ⟨r²⟩ is conveniently given in terms of the number of bp, n, by the Kratky-Porod formula19

$$\langle r^2\rangle = 2Pnl\left(1 - \frac{P}{nl}\left(1 - \exp\left(-\frac{nl}{P}\right)\right)\right) \tag{21}$$

A persistence length P = 450 Å and a virtual bond l = 3.4 Å are adopted (see Figure 1). By combination of eqs 10, 12, 13, 19, 20, and 21, we obtain

$$J = \left(\frac{V}{N}\right)\left(\frac{3}{2\pi\langle r^2\rangle}\right)^{3/2}\exp\left(-\frac{\langle\Delta E\rangle}{RT}\right)\exp\left(-\frac{\Delta E^0}{RT}\right) \tag{22}$$

The theoretical prediction of the J factor requires the evaluation of the conformational energy terms, namely, the difference of the ensemble average energy of the circular and linear forms as well as that between the corresponding ground states. A suitable approximation of such energy differences is given by the first order elastic energy required to distort the linear into the corresponding circular form. This energy is proportional to the sum of the square curvature and twisting difference integrated on the whole sequence. Thus the elastic energy change between the cyclic and the linear forms is given per bp mol as

$$\Delta E = \Delta E_b + \Delta E_t = \frac{nb}{2}\langle|C_C - C_L|^2\rangle + \frac{nt}{2}\left(\frac{\Delta Tw}{n}\right)^2 \tag{23}$$

where C_C and C_L are the static curvature functions of the circular and linear DNA chain, b is the apparent bending force constant (b = RTP/l = 79.4 kcal/mol rad⁻²), t is the apparent torsional force constant (t = 88.8 kcal/mol rad⁻²), and ΔTw is the twisting difference between the circular and the linear forms, namely the twisting number difference times 2π. We can express the curvature of a DNA chain as a complex function of the sequence number, s

$$C(s) = |C(s)|\,e^{i\varphi(s)} \tag{24}$$

where |C(s)| is the bending per bp at sequence point s and φ(s) is its phase with respect to an axis in the plane of the first bp and pointing toward the major groove. This function for a linear DNA can be calculated from the sequence, as previously published,14-18 by integrating along the sequence the slight deviations from the canonical B-DNA structure due to the differential interactions in the 16 different dinucleotide steps and averaging them over one turn.

Evaluation of the Ground State and Ensemble Average Energy

We have previously shown that the problem of determining the static curvature function for a circular form related by minimum square deviation (i.e., minimum deformation elastic energy) to that of a given linear form can be analytically solved in the Fourier space.17 For a planar curve the condition of ring closure can be expressed by imposing the zero amplitude of the curvature function expressed in Fourier series to be 2π/n (n is the number of bp), without changing its phase, and the +1 and -1 amplitudes to be zero. These constraints are in fact the boundary conditions for the circular forms: the first one assures that the sum of the curvature on the average plane of the circular form is 2π, and the latter, the cyclic periodicity.17 The Parseval equality,20 valid for functions expressed in orthonormal series, ensures that the curvature function of the circular form is related to the linear one by minimum square deviation when all the other terms of the Fourier series are left unchanged. In fact, on the basis of this theorem the dispersion of two functions is related to the sum of the square differences of their Fourier amplitudes, namely to a sum of all-positive contributions. It is minimum when the maximum number of such differences are zero. In our case this is obtained assuming that all the Fourier amplitude differences are zero except those relative to the boundary conditions. This is an exact solution for planar chains, and a good approximation for DNAs of low writhing number. In any case, geometrical refining of the ring closure involves negligible energy changes. Hence we can write the difference between the curvature function of the circular and the linear form as

$$C_C(s) - C_L(s) = \left(\frac{2\pi}{n} - |A_{L0}|\right)\exp(i\varphi_{A_{L0}}) - A_{L1}\exp\left(\frac{2\pi i s}{n}\right) - A_{L-1}\exp\left(-\frac{2\pi i s}{n}\right) \tag{25}$$

where $\varphi_{A_{L0}}$ is the phase angle of $A_{L0}$; $A_{L0}$, $A_{L1}$, and $A_{L-1}$ are the corresponding Fourier series complex amplitudes of the linear DNA curvature function

$$A_{Lk} = \frac{1}{n}\sum_{s=1}^{n} C_L(s)\exp\left(-\frac{2\pi i k s}{n}\right) \tag{26}$$
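Equations 25 and 26 amount to a discrete Fourier transform of the curvature function followed by an adjustment of three amplitudes. The following is a minimal NumPy sketch under stated assumptions: the curvature is sampled once per bp as a complex array, the sign convention of the transform is that of `numpy.fft.fft` (the paper's own convention may differ), the index s runs from 0 to n - 1, and the function names are hypothetical. The last function evaluates the bending part of the circularization energy from the mean square deviation between the two curvature functions, with b = 79.4 kcal/mol rad⁻² as quoted in the text.

```python
import numpy as np

B_BEND = 79.4  # apparent bending force constant, kcal/mol rad^-2 (from the text)


def fourier_amplitudes(curv_linear):
    """Complex amplitudes A_Lk of the curvature function (eq 26, numpy sign convention)."""
    curv_linear = np.asarray(curv_linear, dtype=complex)
    return np.fft.fft(curv_linear) / curv_linear.size


def ring_closed_curvature(curv_linear):
    """Curvature of the circular form closest to the linear one (eq 25).

    The zero-order amplitude is rescaled to modulus 2*pi/n with its phase unchanged,
    the +1 and -1 amplitudes are set to zero, and all other amplitudes are kept.
    """
    amps = fourier_amplitudes(curv_linear)
    n = amps.size
    closed = amps.copy()
    closed[0] = (2.0 * np.pi / n) * np.exp(1j * np.angle(amps[0]))
    closed[1] = 0.0
    closed[-1] = 0.0
    return np.fft.ifft(closed) * n


def bending_energy(curv_linear, b=B_BEND):
    """Bending part of the circularization energy, (n*b/2) * <|C_C - C_L|^2>, kcal/mol."""
    curv_linear = np.asarray(curv_linear, dtype=complex)
    deviation = ring_closed_curvature(curv_linear) - curv_linear
    return 0.5 * curv_linear.size * b * np.mean(np.abs(deviation) ** 2)


if __name__ == "__main__":
    n = 200
    straight = np.zeros(n)  # intrinsically straight DNA: only the 2*pi/n term survives
    print(bending_energy(straight), 0.5 * n * B_BEND * (2 * np.pi / n) ** 2)
```

For an intrinsically straight tract the result reduces to ½nb(2π/n)², and for a tract whose curvature is already 2π/n per bp the energy vanishes, which is a convenient sanity check of the construction.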

Therefore the ground state energy difference in eq 22 becomes

$$\Delta E^0 = \frac{nb}{2}\langle|C_C - C_L|^2\rangle_{\min} + \frac{nt}{2}\left(\frac{\Delta Tw}{n}\right)^2 \tag{27}$$

i.e.

$$\Delta E^0 = \frac{nb}{2}\left\{\left(\frac{2\pi}{n} - |A_{L0}|\right)^2 + |A_{L1}|^2 + |A_{L-1}|^2\right\} + \frac{nt}{2}\left(\frac{\Delta Tw}{n}\right)^2 \tag{28}$$

where the contribution of the twisting energy term was obtained by adjusting the twisting number of the starting linear form to both the superior and inferior next integral numbers, distributing the difference uniformly along the DNA tract, and selecting the minimum of the total circularization energy.

We now need to evaluate only the term ⟨ΔE⟩ = ⟨E_C⟩ - ⟨E_L⟩ of eq 22, namely the difference between the ensemble average energy of the circular and linear DNA, due to the curvature and twisting fluctuations from their ground state. It can be obtained in terms of the ratio of the pertinent molecular partition functions q_C/q_L. Using the Parseval equality

$$\langle|C - C^0|^2\rangle = \sum_{k=-n/2}^{n/2} |A_k - A_k^0|^2 \tag{29}$$

and

$$\left(\frac{\Delta Tw}{n}\right)^2 = \sum_{k=-n/2}^{n/2} |B_k - B_k^0|^2 \tag{30}$$

The corresponding molecular partition function can be obtained by summing over all the possible states

$$q = \sum_{\text{states}} \exp\left[-\beta\left(\frac{nb}{2}\sum_{k=-n/2}^{n/2}|A_k - A_k^0|^2 + \frac{nt}{2}\sum_{k=-n/2}^{n/2}|B_k - B_k^0|^2\right)\right] \tag{31}$$

where β = 1/RT. We can substitute the summation over states with an integral over the complex plane of the A_k and B_k, so that eq 31 reduces to a product of definite integrals between the ground state and one with maximum fluctuation

$$q = \prod_{k=-n/2}^{n/2}\int \exp\left(-\beta\frac{bn}{2}|A_k - A_k^0|^2\right)d(A_k - A_k^0) \times \int \exp\left(-\beta\frac{tn}{2}|B_k - B_k^0|^2\right)d(B_k - B_k^0) \tag{32}$$

The value of each integral is easily evaluated in the complex plane after transformation to polar coordinates (see Figure 2); e.g.

$$\int \exp\left(-\beta\frac{bn}{2}|A_k - A_k^0|^2\right)d(A_k - A_k^0) = \frac{2\pi}{\beta bn}\left[1 - \exp\left(-\beta\frac{bn}{2}|A_k - A_k^0|^2_{\max}\right)\right] \tag{33}$$

Let us consider now the ratio between q_C, relative to the circular form, and q_L, relative to the linear form. Since the boundary conditions for circularization affect only the Fourier terms with k = 0, +1, and -1 in the curvature function and only the term with k = 0 in the twisting, we can reasonably consider the maximum fluctuations of all the other terms A_k and B_k to be equivalent in the circular and the linear forms and hence simplify the corresponding terms in the ratio

$$\frac{q_C}{q_L} = \left[\frac{1 - \exp(-\beta b_{C0})}{1 - \exp(-\beta b_{L0})}\right]\prod_{k=-1}^{1}\left[\frac{1 - \exp(-\beta a_{Ck})}{1 - \exp(-\beta a_{Lk})}\right] \tag{34}$$

where

$$a_{Lk} = \frac{bn}{2}|A_{Lk} - A_{Lk}^0|^2_{\max};\quad a_{Ck} = \frac{bn}{2}|A_{Ck} - A_{Ck}^0|^2_{\max};\quad b_{L0} = \frac{tn}{2}|B_{L0} - B_{L0}^0|^2_{\max};\quad b_{C0} = \frac{tn}{2}|B_{C0} - B_{C0}^0|^2_{\max} \tag{35}$$

The value of ⟨ΔE⟩ is now easily obtained as

$$\langle\Delta E\rangle = -\frac{\partial}{\partial\beta}\ln\left(\frac{q_C}{q_L}\right) \tag{36}$$

and, according to eq 34

$$\langle\Delta E\rangle = \frac{b_{L0}}{e^{\beta b_{L0}} - 1} - \frac{b_{C0}}{e^{\beta b_{C0}} - 1} + \sum_{k=-1}^{1}\left[\frac{a_{Lk}}{e^{\beta a_{Lk}} - 1} - \frac{a_{Ck}}{e^{\beta a_{Ck}} - 1}\right] \tag{37}$$

The values of b_L0 and of a_Lk are expected to be much higher than those of the circular form. The corresponding terms in eq 37 are neglected with respect to those of the circular form. In fact we can estimate the average fluctuation of the bending per bp as 2l/P, where P is the persistence length and l is the distance between two consecutive bp. This corresponds to a Fourier amplitude fluctuation of 1.5 × 10⁻² rad⁻²; assuming this or a higher value for the maximum fluctuation we find that all the terms corresponding to the linear form in eq 37 converge rapidly to zero for n > 100 bp. Considering the twisting terms we obtain an analogous result. Therefore eq 37 reduces to

$$\langle\Delta E\rangle = -\frac{a_{C0}}{e^{\beta a_{C0}} - 1} - \frac{a_{C1}}{e^{\beta a_{C1}} - 1} - \frac{a_{C-1}}{e^{\beta a_{C-1}} - 1} - \frac{b_{C0}}{e^{\beta b_{C0}} - 1} \tag{38}$$

All these four contributions depend on the DNA length, n, and on the maximum fluctuation of the zero and first Fourier amplitudes of the circular form. Thus, the ensemble average energy difference between the circular and the linear forms contains only the energy contribution pertinent to the Fourier amplitudes restricted by the boundary condition of circularization. This also justifies the cutoff of the factors with k ≠ 0, ±1 in eq 34. The analysis of the energy difference between the circular and linear forms due to curvature and twisting fluctuations is shown in Figure 3 for fluctuation parameters changing from 1.0 × 10⁻⁶ to 1.0 × 10⁻³. These fluctuations allow the cycles to explore nonplanar and nonzero writhing forms. It should be noted that the maximum amplitude fluctuations appear multiplied by n in all the expressions; as a consequence, the whole-cycle fluctuation increases linearly with the chain length, as can be expected for a distribution of local random deformations along the sequence.
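Equation 38 can be evaluated directly once the maximum fluctuation of the restricted Fourier amplitudes is fixed. The sketch below is an illustration under assumptions, not the authors' code: the same squared maximum fluctuation is used for all four restricted amplitudes (the default of 1.3 × 10⁻⁵ is the value adopted in the following paragraph of the paper), RT is set to 0.6 kcal/mol (the value implied by b = RTP/l with P = 450 Å and l = 3.4 Å), and the function names are hypothetical.

```python
import math

B_BEND = 79.4    # kcal/mol rad^-2, bending force constant quoted in the text
T_TWIST = 88.8   # kcal/mol rad^-2, torsional force constant quoted in the text
RT = 0.6         # kcal/mol; assumption, implied by b = RT*P/l


def restricted_mode_energy(a, rt=RT):
    """One term a / (exp(a/RT) - 1) of eq 38; tends to RT as a -> 0 and to 0 for large a."""
    return a / math.expm1(a / rt)


def ensemble_energy_difference(n_bp, max_fluct_sq=1.3e-5):
    """<Delta E> of eq 38 in kcal/mol for a chain of n_bp base pairs (n_bp > 0)."""
    a_c = 0.5 * B_BEND * n_bp * max_fluct_sq   # a_C0 = a_C1 = a_C-1 (eq 35)
    b_c = 0.5 * T_TWIST * n_bp * max_fluct_sq  # b_C0 (eq 35)
    return -(3.0 * restricted_mode_energy(a_c) + restricted_mode_energy(b_c))


if __name__ == "__main__":
    for n in (100, 250, 1000, 10000):
        print(n, round(ensemble_energy_difference(n), 4))
```

With these assumptions the term approaches -4RT for very short chains and vanishes for long ones, so it matters only in the short and intermediate length range.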
We tentatively assumed an average value of the curvature and twisting fluctuations of 1.3 × 10⁻⁵ rad⁻², a value much smaller than that adopted for the linear forms (this value should be zero if we consider only planar forms). It is the only adjustable parameter, and it is fitted to the experimental J data in the asymptotic regions of twisted wormlike chains. Adopting the value of 1.3 × 10⁻⁵ rad⁻², we obtained the asymptotic behavior of our model for twisted wormlike chains, namely straight DNAs with random or uniform sequences. It satisfactorily fits experimental data, as shown in Figure 4, and agrees with other theoretical models.3-6 In this case the ground state circularization energy was set to a value of ½nb(2π/n)², corresponding to that of a perfectly circular DNA. The apparent oscillations are due to the periodic phasing of the twisting that gives rise to a superperiodicity, superimposed on tenfold oscillations, because of the nonintegral DNA periodicity (⟨ν⟩ = 10.4). The amplitude of the oscillations decreases rapidly with increasing DNA length. This diagram is practically equal to the analytical solution by Shimada and Yamakawa6 for sequence independent DNA circularization and agrees closely with the Monte Carlo results by Levene and Crothers.8,9 It fits very satisfactorily the Shore and Baldwin data on pBR322 and ΦX174.5 It must be noted that the positive deviations of the experimental data from the theoretical trend for n < 500 bp are due to the sequence dependent curvature that invalidates the approximation ΔE° = ½nb(2π/n)². (The comparison of the experimental and theoretical data taking into account the sequence dependent ΔE° is shown in Figure 13.) Therefore, adopting in a first approximation the value of 1.3 × 10⁻⁵ rad⁻² for all the Fourier amplitude fluctuations in eq 38, we can calculate the J factor from the sequence of DNA tracts with a bp number ranging from 100 to 10 000

$$J\,(\mathrm{nM}) = 5.5 \times 10^{11}\,\langle r^2\rangle^{-3/2}\exp\left(-\frac{\langle\Delta E\rangle}{RT}\right)\exp\left(-\frac{\Delta E^0}{RT}\right) \tag{39}$$

where ⟨r²⟩ is represented by the Kratky-Porod expression (21), ΔE°, the sequence dependent term, is the ground state circularization energy (28), and ⟨ΔE⟩ is the difference of the canonical ensemble average elastic energy between the circular and the linear forms, due to the curvature and twisting fluctuations, as given before (38).

Figure 1. Kratky-Porod ⟨r²⟩ against n and its bilogarithmic derivative.

Figure 2. A scheme of the range of Fourier amplitude fluctuation in the complex plane.

Figure 3. Energy difference between the circular and linear forms due to bending and twisting fluctuations, for maximum fluctuation parameters ranging from 1.0 × 10⁻⁶ to 1.0 × 10⁻³.

Figure 4. Asymptotic behavior of the J factor for twisted wormlike chains, namely straight DNAs with random or uniform sequences. The modulated oscillations are due to the periodic phasing of the twisting. The symbols indicate the Shore and Baldwin experimental data.5

Results and Discussion

We checked the validity of our physical sequence dependent circularization propensity model on a large set of experimental data, obtained by several authors (Shore and Baldwin;5 Levene and Crothers;8 Koo et al.;11 Kahn and Crothers13) and in our laboratory. ∆E° and 〈∆E〉 were evaluated adopting the curvature model we proposed several years ago.14-17 Thus, the local sequence dependent curvature of DNA can be conveniently represented by a complex function of the sequence number

$$C(s) = \frac{2\pi}{360\,\nu}\sum_{k\,\in\,s\text{-th turn}} d(k)\exp\left(\frac{2\pi i s k}{\nu}\right) \tag{40}$$

where C(s) is the curvature in radians per bp of the double helix in modulus and phase, assigned to the central base pair of the sth turn; d(k) = (ρ, -τ) = ρ - iτ is the deviation of the kth dinucleotide step from the canonical B-DNA structure in terms of roll (ρ) and tilt (τ) angles; and ν is the helical periodicity of the sth turn as evaluated from the local twist angles, Ω (see Figure 5).

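$$d = \begin{array}{c|cccc} & \mathrm{A} & \mathrm{T} & \mathrm{G} & \mathrm{C} \\ \hline \mathrm{T} & 8.0,\ 0.0 & 5.4,\ -0.5 & 6.8,\ 0.4 & 2.0,\ -1.7 \\ \mathrm{A} & 5.4,\ 0.5 & -7.3,\ 0.0 & 1.0,\ 1.6 & -2.5,\ 2.7 \\ \mathrm{C} & 6.8,\ -0.4 & 1.0,\ -1.6 & 4.6,\ 0.0 & 1.3,\ -0.6 \\ \mathrm{G} & 2.0,\ 1.7 & -2.5,\ -2.7 & 1.3,\ 0.6 & -3.7,\ 0.0 \end{array} \tag{41}$$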

Figure 5. Orientation parameters of the base pair average plane in a dinucleotide step.

Figure 6. Experimental J factors of Shore and Baldwin5 DNA tracts against the chain lengths, as compared with the Boltzmann factors of the minimum circularization energy.

Figure 7. Shore and Baldwin5 J factors against the minimum circularization energy. The straight line with slope equal to 1/RT represents the theoretical trend for n about 250 bp.

Further, the high sensitivity of the circularization factor, J, to the chain twisting, offered the opportunity to refine our twist matrix (Ω) that slightly changed within 0.6° standard deviation

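$$\Omega\ (\mathrm{deg}) = \begin{array}{c|cccc} & \mathrm{A} & \mathrm{T} & \mathrm{G} & \mathrm{C} \\ \hline \mathrm{T} & 34.1 & 36.0 & 33.8 & 34.5 \\ \mathrm{A} & 36.0 & 35.1 & 35.5 & 34.5 \\ \mathrm{C} & 33.8 & 33.5 & 34.3 & 33.0 \\ \mathrm{G} & 34.5 & 34.5 & 33.0 & 33.3 \end{array} \tag{42}$$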

Figure 6 illustrates the experimental trend of the J factors against the chain length for the DNA tracts (237-254 bp) investigated by Shore and Baldwin5 in order to determine the dependence on twist. The experimental data are reported in a double plot with the Boltzmann factor corresponding to the ground state circularization energy, ΔE°. The periodic trend of about 10 bp of the circularization energy, which mirrors the J profile, is evident. Furthermore, given the low sensitivity of the J factor to the bp number in this range of DNA lengths, we plot in Figure 7 the experimental data against the minimum circularization energy ΔE°. According to our model, they appear to fit a straight line with a slope equal to 1/RT. It should be noted that the data dispersion also contains the J dependence on length. Figure 8 shows experimental J data against the ground state circularization energy in the case of a number of DNA tracts, characterized by a narrow range of molecular weights (155-160 bp), investigated by Kahn and Crothers.13 An additional experimental point was also included in the data set, obtained by interpolating the J value for a DNA tract of 158 bp from the circularization rate measurements on DNA multimers.11 In agreement with our model, the experimental data fit, on the logarithmic scale, a straight line with a slope equal to 1/RT, over a range of six orders of magnitude. Such a line appears to be shifted along the J axis to higher values with respect to the previous diagram (Figure 7), on account of the shorter DNA length.
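The roughly 10 bp periodicity of the circularization energy can be illustrated with the twisting term alone. The sketch below makes simplifying assumptions that are not the authors' procedure: it treats a straight DNA with a uniform helical repeat of 10.4 bp per turn instead of the sequence dependent twist matrix, takes the bending term of a perfect circle, ½nb(2π/n)², and uses RT = 0.6 kcal/mol (implied by b = RTP/l). Function names are hypothetical. The twisting number of the linear form is adjusted to the nearest integer and the difference is distributed uniformly along the tract, as described for eq 28.

```python
import math

B_BEND = 79.4    # kcal/mol rad^-2 (from the text)
T_TWIST = 88.8   # kcal/mol rad^-2 (from the text)
RT = 0.6         # kcal/mol; assumption, implied by b = RT*P/l


def twist_energy(n_bp, helical_repeat=10.4, t=T_TWIST):
    """Twisting term (n*t/2) * (dTw/n)^2 after closing to the nearest integral twist."""
    turns = n_bp / helical_repeat
    frac = turns - math.floor(turns)
    delta_tw = 2.0 * math.pi * min(frac, 1.0 - frac)  # twist mismatch in radians
    return 0.5 * n_bp * t * (delta_tw / n_bp) ** 2


def straight_ring_energy(n_bp, b=B_BEND):
    """Ground state circularization energy of an intrinsically straight DNA, kcal/mol."""
    bend = 0.5 * n_bp * b * (2.0 * math.pi / n_bp) ** 2
    return bend + twist_energy(n_bp)


def boltzmann_factor(n_bp):
    """exp(-dE0/RT), a sequence independent analogue of the factor plotted in Figure 6."""
    return math.exp(-straight_ring_energy(n_bp) / RT)


if __name__ == "__main__":
    for n in range(237, 255):
        print(n, round(twist_energy(n), 3), f"{boltzmann_factor(n):.2e}")
```

Over the 237-254 bp window the twisting term rises and falls with the helical repeat while the bending term changes only slowly, which is the qualitative origin of the oscillation; the sequence dependent curvature and twist used in the paper are deliberately left out.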

Figure 8. Kahn and Crothers13 J factors against the minimum circularization energy. The straight line with slope equal to 1/RT represents the theoretical trend for n about 160 bp.

The Influence of the CAP Protein on the Circularization Reaction

The influence of protein induced bending on DNA circularization has been investigated by different authors. Kahn and Crothers13 showed that the formation of the association complex of the CAP protein at the corresponding binding domain can occur in the cyclic form of properly phased DNAs up to 200-fold more tightly than in the corresponding linear form; this ability appears to be thermodynamically correlated to the parallel increase of the circularization factor J. We evaluate the effect of a bend inducing protein on circularization by simply changing the curvature function of the linear DNA in order to take into account the induced bend and then studying the effect of this change on J. Adopting our model of minimum distortion elastic energy, a deformation of 90° was introduced at the CAP binding domain, as indicated by the X-ray crystal structure of the CAP-DNA complex (Schultz et al.21). This was done adopting the same mathematical methods used for selecting the minimum energy circularization (see eq 25), by imposing that the zero Fourier term of the curvature function in the binding region be equal to (π/2)/37, 37 being the number of bp of the binding domain. Thus the curvature function of the binding region was changed into

$$C_{CAP}(s) = C_L^0(s) + \left(\frac{\pi/2}{37} - |A_{L0}^0|\right)\exp(i\varphi_{A_{L0}^0}) \tag{43}$$

where $\varphi_{A_{L0}^0}$ is the phase of $A_{L0}^0$ in such DNA tract.

The structure of the binding domain so obtained is characterized by the same phase of curvature as found in the X-ray structure: the CAP protein amplifies the slight curvature already present in the binding site. Figures 9 and 10 illustrate the curvature diagrams of the linear and the minimum energy circular form corresponding to one of the DNA tracts (all including a CAP binding site phased against a sequence-directed curvature) investigated by Kahn and Crothers,13 in the absence and in the presence of CAP induced bending; the corresponding energy values are reported. Taking into account the effect of CAP on DNA curvature, the minimum circularization energy was obtained for all the DNA samples examined by Kahn and Crothers.13 Figure 11 illustrates the effect of the CAP induced curvature on the DNA circularization factor J in comparison with the changes of the corresponding circularization energy. As can be observed, the changes in J mirror the variation of the circularization energy, so that both sets of experimental data fit the same straight line, corresponding to our model. The deviation of the two points at high elastic energy can be due to a degree of anharmonicity, which should reduce the circularization energy, or to a lower induced bending at the CAP domain.

Figure 9. Curvature diagrams (modulus and phase) of one of Kahn and Crothers DNA.13 The linear (solid line) and the minimum energy circular form (dashed line) are reported, as well as the corresponding energy values.

Figure 10. Curvature diagrams of the same DNA tract of Figure 8 in the presence of CAP induced bending; the corresponding energy values are reported.

Figure 11. Effect of the CAP induced curvature on DNA circularization factor J in comparison with the changes of the corresponding circularization energy: one set of symbols reports the same data as in Figure 8, in the absence of CAP, whereas the other refers to the presence of the binding protein.

The circularization equilibrium in the presence or absence of a binding protein can be represented by the following exchange reaction, where CP and LP refer to the circular and the linear DNA form bound to the protein, respectively

$$\mathrm{CP} + \mathrm{L} \rightleftharpoons \mathrm{LP} + \mathrm{C} \tag{44}$$

Its equilibrium constant corresponds to the ratio J(+)/J(-), where + and - indicate the presence or the absence of CAP. In fact, recalling eq 3

$$\frac{J(+)}{J(-)} = \frac{K_C(+)}{K_D(+)}\,\frac{K_D(-)}{K_C(-)} = \frac{K_C(+)}{K_C(-)} \tag{45}$$

assuming the dimerization constant to be independent of the presence of the protein. This is very plausible when the binding domain is far away from the reacting ends. Adopting the same statistical thermodynamic model

$$\frac{J(+)}{J(-)} = \frac{Q_{CP}Q_L}{Q_{LP}Q_C}\exp\left[\frac{E_{LP}^0 + E_C^0 - E_{CP}^0 - E_L^0}{RT}\right] \tag{46}$$

that is equal to the equilibrium constant of the reaction 44, as suggested before. Alternatively, the same expression can be read as the ratio between the protein binding constants to the cyclic and the linear form, KCb/KLb, as obtained by permuting the energy terms and the partition functions in eq 46. Thus, the form of the J ratio allows us to write the equivalence between the ratio of protein binding constants to the circular and the linear forms and that of the circularization constants in the presence and absence of the protein. Furthermore, recalling eq 39

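$$\frac{J(+)}{J(-)} = \frac{K_C(+)}{K_C(-)} = \frac{K_{Cb}}{K_{Lb}} = \left(\frac{\langle r^2\rangle_-}{\langle r^2\rangle_+}\right)^{3/2} \exp\left[-\frac{\Delta E^{\circ}(+) - \Delta E^{\circ}(-)}{RT}\right] \tag{47}$$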

where ∆E°(+) and ∆E°(-) represent the minimum circularization energy in the presence and in the absence of the CAP protein, respectively. This agrees very satisfactorily with the results of Crothers' laboratory.13 Figure 12 shows the experimental data for the ratios of both the binding and the circularization constants in the presence and absence of CAP, against the theoretical changes of the minimum circularization energy due to the curvature distortion induced by protein binding. The straight line represents the above equation. The value of $\langle r^2\rangle_+$ was evaluated according to the Kratky-Porod formulation: the CAP bent site was adopted as the origin of both the DNA tracts, which move in two different directions. The maximum deviation from $\langle r^2\rangle_-$ occurs when the


Figure 12. Kahn and Crothers experimental data13 of both the ratios of circularization and binding constants in presence and absence of CAP, against the theoretical circularization minimum energy changes after the curvature distortion induced by protein binding. The straight line represents the theoretical trend.

binding domain is localized in the middle of the DNA tract. In the case of a 90° bend, $\langle r^2(n)\rangle$ changes to $2\langle r^2(n/2)\rangle$ for a chain length of n bp. This is clearly without consequence for long chains, where $\langle r^2(n)\rangle$ converges to $2Pln$; however, in the case of shorter DNA tracts it reduces the value of $\langle r^2(n)\rangle$ by a factor of up to $2^{1/2}$, increasing W(0) by a factor of 1.7. This factor becomes equal to 2.8 for a bend inducing protein of 120°. The deviations of the high energy points (the same as in Figure 11) are plausibly due to anharmonicity effects, as suggested before.

The Influence of the Position of the CAP Binding Domain on the Circularization Reaction

The influence of the position of the protein binding domain on the circularization J factor was investigated by Buc and co-workers (Kotlarz et al.10). They showed that the localized DNA bend induced by the cyclic AMP receptor protein (when it interacts with its binding site at the lactose control region) influences the circularization propensity by a factor ranging from a slight 1.3, found with the eccentric CAP site, up to 4, found with the middle site. This was obtained by cyclic permutation of a 273 bp DNA tract containing the CAP site. We can formulate the ratio between the corresponding J factors, considering the identity of the ground state circularization energy ∆E°, as

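$$\frac{J_m(+)}{J_p(+)} = \frac{Q_{Cm+}\,Q_{Lp+}}{Q_{Lm+}\,Q_{Cp+}} = \frac{Q_{Lp+}}{Q_{Lm+}} = \frac{Q_{Lp+}\,Q_{Lm-}}{Q_{Lp-}\,Q_{Lm+}} = \left(\frac{\langle r^2\rangle_p}{\langle r^2\rangle_m}\right)^{3/2} = \frac{K_{bp}}{K_{bm}} \tag{48}$$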

where p and m indicate the peripheral and the middle binding domain; the cyclic forms are identical and cancel, as do QLm- and QLp-, because of the negligible intrinsic curvature of the CAP site in the absence of the binding protein. In this case the ratio of the J factors, which corresponds to the ratio of the circularization constants, becomes practically equal to the equilibrium constant of the exchange reaction

$$\mathrm{L_m} + \mathrm{L_pP} \rightleftharpoons \mathrm{L_mP} + \mathrm{L_p} \tag{49}$$

where Lp, LpP, and Lm, LmP represent the linear forms of free DNA and the protein complex, with the peripheral and middle binding domain; the cyclic forms are identical and cancel.
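The entropic factor in eqs 47 and 48 can be illustrated with a short numerical sketch. It treats the bend site as the common origin of two Kratky-Porod arms, as described above, so that the two arm contributions add and a cross term depends on the bend angle. The function names, the persistence length of 150 bp, the treatment of the bend as a rigid fixed angle, and the energy units (kJ/mol) are assumptions made for illustration only; they are not taken from the paper.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1 (energy units are an assumption)


def kratky_porod_r2(length, persistence):
    """Mean square end-to-end distance of an unbent wormlike chain."""
    return 2.0 * persistence * length - 2.0 * persistence ** 2 * (
        1.0 - math.exp(-length / persistence)
    )


def bent_chain_r2(n_bp, site_bp, bend_deg, persistence=150.0):
    """<r^2> of a chain of n_bp carrying a rigid bend at position site_bp.

    Lengths are in bp (assumed persistence length: 150 bp).
    bend_deg is the deflection from the straight path (0 means no bend).
    """
    arm1, arm2 = site_bp, n_bp - site_bp
    # Mean end vector of each arm, directed along its starting tangent.
    m1 = persistence * (1.0 - math.exp(-arm1 / persistence))
    m2 = persistence * (1.0 - math.exp(-arm2 / persistence))
    # Cross term between the two arms; it vanishes for a 90 degree bend.
    cross = 2.0 * math.cos(math.radians(bend_deg)) * m1 * m2
    return (
        kratky_porod_r2(arm1, persistence)
        + kratky_porod_r2(arm2, persistence)
        + cross
    )


def j_ratio_middle_vs_peripheral(n_bp, bend_deg, peripheral_bp=0.0, persistence=150.0):
    """Entropic ratio of eq 48: (<r^2>_p / <r^2>_m)^(3/2)."""
    r2_p = bent_chain_r2(n_bp, peripheral_bp, bend_deg, persistence)
    r2_m = bent_chain_r2(n_bp, n_bp / 2.0, bend_deg, persistence)
    return (r2_p / r2_m) ** 1.5


def j_ratio_with_protein(r2_minus, r2_plus, de_plus, de_minus, temperature=298.15):
    """J(+)/J(-) of eq 47 from <r^2> values and circularization energies (kJ/mol)."""
    boltzmann = math.exp(-(de_plus - de_minus) / (R_GAS * temperature))
    return (r2_minus / r2_plus) ** 1.5 * boltzmann


if __name__ == "__main__":
    for bend in (90.0, 120.0):
        print(bend, j_ratio_middle_vs_peripheral(273, bend))
```

Under these assumptions a 273 bp tract gives ratios of roughly 2 and 3 for bends of 90° and 120°. With zero bend, `bent_chain_r2` reduces to the ordinary Kratky-Porod value for the whole chain, and with a 90° bend in the middle it reduces to twice the value for a half-length chain.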

Figure 13. Comparison between theoretical and experimental J values for a large number of DNA fragments differing in sequence and length (n, ranging from 105 to more than 4300 bp). The experimental data are from the following: ● Shore et al.; ◆ (lightface) Shore and Baldwin; ○ Levene and Crothers; ▲ (lightface) Koo et al.; ◆ and ■ Kahn and Crothers; ▼ our laboratory.

Finally, the last expression in eq 48 represents the ratio between the binding constants of the protein to the two cyclically permuted forms of linear DNA, which should be equal to the ratio of the J factors. This was evaluated, as discussed in the previous section, from the ratio of the end-to-end distances, which gives values of 2 and 3 for a CAP bend of 90° and 120°, respectively, in agreement with the experimental value.10

Conclusions

Figure 13 illustrates the direct comparison between theoretical and experimental J values for a large number of DNA fragments that differ in sequence and length (n, ranging from 105 to more than 4300 bp). Here, not only the dependence on the circularization energy but also the entropy term is evaluated, over a J range of six orders of magnitude. Two theoretical values obtained by Levene and Crothers8 using Monte Carlo methods, for λ DNA (423 and 240 bp), are also represented, as well as the experimental data for pBR322 (4361 bp) and its fragment (2302 bp), and for the already reported (Figure 4) ΦX174 tracts (Shore and Baldwin5). The results are very satisfactory; the only significant deviations from the theoretical predictions are the pair of points corresponding to the Kahn and Crothers CAP-DNA (17A11 and 15A9), already discussed, and the 105 bp multimer of Koo et al.,11 perhaps because of its very short length, which requires a relatively large deviation from the canonical B-DNA structure and plausibly different bending and twisting elastic constants. Calculations of the theoretical J values reported in the figures of this paper require only a few minutes on a personal computer.

The model of sequence dependent circularization is now being extended to supercoiled, 8-shaped DNA. It can also be adopted to evaluate the sequence dependent energy cost of looping a DNA tract, also in the presence of CAP and regulatory proteins, repressors, and operators, as in the first step of the transcription mechanism, as well as to evaluate “allosteric” effects in protein binding on topologically constrained DNAs, namely cooperativity in protein binding.

We are also attempting to extend such methods to the protein folding problem, where a hypothetical starting secondary structure is perturbed by long range forces to reach its final globular structure.

DNA curvature and its experimental manifestations in gel-electrophoretic retardation assays or in circularization reactions, as well as in association with proteins (histones, CAP), are a complex aspect of DNA structure and dynamics. They require the adoption of reduced models of the physical reality. The physical model we advanced several years ago14-18 certainly contains approximations and assumptions. Nevertheless, it explains consistently and with surprising accuracy all the experimental data of different physical nature, on the basis of the same local deviation matrix for the 16 different dinucleotide steps (derived initially from the theoretical evaluation of the differential conformational energy, several years ago), and by adopting first order elasticity to evaluate the energy cost of DNA deformations, in circularization as well as in nucleosome formation. DNA dynamics certainly influence the structure and superstructure of single molecules in the case of gel electrophoresis18 as well as in the circularization reactions, but they do not appear to determine the behavior of their ensemble.

Acknowledgment. This research was supported in part by CNR “Progetto Finalizzato Chimica Fine” and by Progetto Strategico “Biologia Strutturale”.

References and Notes

(1) Jacobson, H.; Stockmayer, W. H. J. Chem. Phys. 1950, 18, 1600.
(2) Flory, P. J.; Suter, U. W.; Mutter, M. J. J. Am. Chem. Soc. 1976, 98, 5733.
(3) Olson, W. K. Biopolymers 1979, 18, 1213.
(4) Shore, D.; Langowski, J.; Baldwin, R. L. Proc. Natl. Acad. Sci. U.S.A. 1981, 78, 4833.
(5) Shore, D.; Baldwin, R. L. J. Mol. Biol. 1983, 170, 957.
(6) Shimada, J.; Yamakawa, H. Macromolecules 1984, 17, 689.
(7) Hagerman, P. J. Biopolymers 1985, 24, 1881.
(8) Levene, S. D.; Crothers, D. M. J. Mol. Biol. 1986a, 189, 61.
(9) Levene, S. D.; Crothers, D. M. J. Mol. Biol. 1986b, 189, 73.
(10) Kotlarz, D.; Fritsch, A.; Buc, H. EMBO J. 1986, 5, 799.
(11) Koo, H. S.; Drak, J.; Rice, J. A.; Crothers, D. M. Biochemistry 1990, 29, 4227-4234.
(12) Lavigne, M.; Herbert, M.; Kolb, A.; Buc, H. J. Mol. Biol. 1992, 224, 293-306.
(13) Kahn, J. D.; Crothers, D. M. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 6343.
(14) De Santis, P.; Morosetti, S.; Palleschi, A.; Savino, M. In Structure and Dynamics of Nucleic Acids, Proteins and Membranes; Clementi, E., Chin, S., Eds.; Plenum Publishing: New York, 1986; pp 3149.
(15) De Santis, P.; Palleschi, A.; Savino, M.; Scipioni, A. Biochemistry 1990, 29, 9269.
(16) Boffelli, D.; De Santis, P.; Palleschi, A.; Scipioni, A.; Savino, M. Int. J. Quantum Chem. 1992, 42, 1409.
(17) De Santis, P.; Fuà, M.; Palleschi, A.; Savino, M. Biophys. Chem. 1993, 46, 193.
(18) De Santis, P.; Fuà, M.; Palleschi, A.; Savino, M. Biophys. Chem. 1995, 55, 261.
(19) Kratky, O.; Porod, G. Recl. Trav. Chim. Pays-Bas 1949, 68, 1106.
(20) Spiegel, M. R. Fourier Analysis; McGraw-Hill Book Company: New York, 1965.
(21) Schultz, S. C.; Shield, G. C.; Steitz, T. A. Science 1991, 253, 1001.

JP9526096