Sequence Selection during Copolymerization - The Journal of

Jul 21, 2007 - Jonathan A. D. Wattis*, and Peter V. Coveney*. Theoretical Mechanics, School of Mathematical Sciences, University of Nottingham, Univer...
0 downloads 0 Views 373KB Size
9546

J. Phys. Chem. B 2007, 111, 9546-9562

Sequence Selection during Copolymerization Jonathan A. D. Wattis*,† and Peter V. Coveney*,‡ Theoretical Mechanics, School of Mathematical Sciences, UniVersity of Nottingham, UniVersity Park, Nottingham NG7 2RD, United Kingdom, and Centre for Computational Science, Department of Chemistry, UniVersity College London, Gower Street, London WC1E 6BT, United Kingdom ReceiVed: March 5, 2007; In Final Form: May 3, 2007

We analyze a simple model for the copolymerization of two chemically distinct monomers that displays a wide variety of product sequence compositions depending on the rate coefficients for the stepwise addition of each monomer to a growing polymer chain. We show that in certain regions of parameter space the copolymer sequence composition depends sensitively on these parameter values, and we compute the information content of the resultant sequences. Finally, we show how the model may be used to interpret experimental data on copolymerization reactions.

1. Introduction When two monomers, each of which can individually selfpolymerize, are mixed and react to form a copolymer, what is the resulting copolymer sequence? This is the question we set ourselves to answer in the present paper. Such copolymerization reactions arise inter alia in industrial and biological processes, and the resulting sequences are often critical in determining the properties of the product polymers. While there is significant literature on certain aspects of pure and applied copolymerization,2,13,27 remarkably little insight is available on sequence selection in such systems. Flory14 discusses initiation and termination of the polymerization process. For the main problem concerned with the kinetics at intermediate times, he considers the evolution of a system in which as one monomer is depleted the other dominates. He also discusses the semiempirical Alfrey-Price Q,e scheme for approximating the rates of monomer addition and briefly summarizes models of copolymerization kinetics. Carraher8 reprises the basic kinetics of copolymerization before discussing block copolymers and their physical properties, applications, and commercial uses. Experimentally, the determination of copolymer composition can be achieved via a variety of techniques. UV/vis spectroscopy can be used in situations where the monomers have different characteristic adsorption frequencies. In certain systems transmission electron microscopy (TEM) can be used to assess the periodicity of block copolymers. Campbell et al.7 discuss these techniques together with a detailed account of the use of NMR techniques, which are suited to the investigation of copolymer composition, monomer sequences, configurational sequences, and reaction mechanisms. As well as determining the large-scale ratios of monomers composing copolymers, NMR can be used to determine the detailed sequence structure of copolymers. NMR also permits the determination of stereochemical configurations, allowing meso and racemic (or isotactic, heterotactic, and syndiotactic) sequences to be differentiated. This is possible because while the 1H spectra of homopolymers may be fairly * Authors to whom correspondence should be addressed. E-mail: [email protected]; [email protected]. † University of Nottingham. ‡ University College London.

simple that of the copolymer is more complex due to interactions between the different monomers in the chain’s sequence. One further advantage of NMR is that polymerization rates can be found from spin-spin relaxation times, and hence kinetic properties can be investigated. Nevertheless, the interpretation of NMR data can be complicated due to the effects of uncertain orientation, crystallinity, and/or cross-linking of polymers. Recently, experimental results on systems in which left- and right-handed N-carboxyanhydride (NCA) amino acids copolymerize have been reported by Blocher et al.4 and Hitz et al.16 The copolymerization products show a distinct bias away from the binomial distribution, which one would naively expect if the two monomers had equal attachment probabilities. Although these authors still observe a dominance of polymers containing approximately equal numbers of left- and right-handed monomers, there are considerably more homopolymers (comprised of both pure left- and pure right-handed forms) than are predicted by a straightforward binomial distribution. We shall show that this can be accounted for by a model where the probabilities of attachment depend both on the attaching monomer and on the unit that currently terminates a growing polymer. Our central assumption is that polymers grow by the stepwise addition of a single monomer at a time with rates dependent on the identity of the currently terminating monomer unit within the growing polymer chain and on the nature of the monomer being added. This assumption enables us to exploit basic properties of the Becker-Do¨ring equations.3 Becker-Doring theory describes the general stepwise growth of chains and clusters, being valid for the formation of crystal nuclei as well as polymers.3,20 The latter case is simpler to study as it typically has growth rates that are independent of cluster size (equivalent to polymer length). In systems with continuous input of monomers, instead of the equilibrium state, a steady state may be approached in the large-time limit. Since this would imply a fixed monomer concentration, a common approach is to study models in which the monomer concentration is held at some fixed value; this is often referred to as the “pool chemical approximation”. While the steady-state solution is straightforward to write down for Becker-Do¨ring polymerization systems involving only a single species, the steady state is vastly more complex in the multicomponent model formulated below. To

10.1021/jp071767h CCC: $37.00 © 2007 American Chemical Society Published on Web 07/21/2007

Sequence Selection during Copolymerization model the formation of copolymers of the form ...AABAABBBA..., we propose a model that has two sets of concentration variables: one for the polymers ending in an “A” and a second set for the concentrations of those terminated by a “B”. In addition each set of variables has two subscripts, denoting the total number of A and B units in the chain. Since the equtions for these two concentration distributions are coupled together, the steady state for the system as a whole is extremely complex. The Becker-Do¨ring equations previously have been generalized in various other ways to deal with multicomponent systems. For example, Wu28 determined the nucleation rate of a multicomponent system by proposing a multicomponent Becker-Do¨ring model and analyzing the saddle point of an equilibrium distribution. Carr and Dunwell9 used a multicomponent Becker-Do¨ring model to analyze cell surface capping. In previous papers, we have used generalized multicomponent Becker-Do¨ring systems of equations to describe the formation of vesicles in systems that exhibit an induction time and in systems that display a curious size-templating phenomenon5,10,11 and cluster growth in which shape changes occur simultaneously with aggregation.22 In all of these cases a scalar system of ordinary differential equations for the distribution of clusters as a function of size and composition is produced; in this paper we obtain a more complicated vector system in which we have to solve for two such distributions simultaneously. One of the purposes of analyzing the model proposed here is to show that copolymerization can give rise to long chains and, more importantly, that chains with a possibly complex internal sequence structure can be formed, namely, a structure of the type which could be used to store information as is seen in RNA and DNA. However, as we have described in other papers,23-25 the passing on of such information requires some form of feedback mechanism. This could be by way of an autocatalytic process, as in our model for the emergence of an “RNA world”,23 or in a more indirect way in which products (long chains) influence the production of more basic species (e.g., monomers), as observed in Sandars’ model of chiral polymerization.19,24 The model proposed here has no such feedback mechanisms, and so the sequence of monomer units comprising any polymer depends only on the attachment rate constants and on the concentrations of monomers. In the present paper, we analyze a system in which the rates are small perturbations away from a model system in which both monomers have the same rate of attachment as well as systems in which the rate constants are strongly biased in favor of producing either homopolymers or heteropolymers with alternating sequences of monomers. Some of our mathematical methods of solution are based on a simple generalization of the asymptotic methods used in our earlier papers on the Becker-Do¨ring system17,26 to determine the evolution of clusters of large sizes at large times. We also use exact methods to determine the time evolution of quantities of macroscopic importance, such as the total number of polymer chains present and the average length of the chains. Although such exact methods lead to complicated expressions, largetime asymptotic approximations provide useful descriptions of behavior. In section 2 the copolymerization model is formulated, and the kinetic behavior of some macroscopic quantities is derived for general systems. The detailed mathematical analysis leading to these results is summarized in Appendix A.

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9547 There then follow three sections in which specific systems are studied in more detail. The full mathematical analysis is reserved for Appendix B, and the results for three example systems are illustrated in sections 3, 4, and 5. In section 3 we examine a system in which the rate of attachment of both A and B monomers is the same regardless of whether the chain currently ends in an A or a B unit. We give details of the kinetics of the macroscopic quantities such as average chain length, number of chains, and mass of chains as well as describing the steady-state solution. Similar details are given for the heteropolymer-dominated and homopolymer-dominated systems in sections 4 and 5. The latter is of particular interest, since homopolymers only predominate at shorter lengths, while longer polymers are dominated by heteropolymers, albeit with chains that include long homopolymer subsequences. The analysis of the expected lengths of homopolymer subsequences and information content for the chains is given in section 6. Section 7 illustrates an application of our theory to experimental results on the polymerization of amino acids, which occur in left- and right-handed monomeric forms, and so upon polymerization form a mixture of homochiral and heterochiral chains. Using liquid chromatography/mass spectrometry, Luisi’s group has managed to obtain the relative abundances of each type of polymer,4,16 finding abnormally high concentrations of homopolymers. While they argue that this is due to a high-order kinetic Markov process, we show in this paper that the results can be explained by a model that uses only first-order Markov kinetics. The conclusions from this study are presented in section 8. 2. Copolymerization Model We model the formation of copolymers via the following four reactions ...XXXA + A f ...XXXAA with rate coefficient ) kAA ...XXXA + B f ...XXXAB with rate coefficient ) kAB ...XXXB + A f ...XXXBA with rate coefficient ) kBA ...XXXB + B f ...XXXBB with rate coefficient ) kBB (2.1)

where X represents either an A or a B monomeric unit. This is the starting point for Flory’s analysis of copolymerization14 (eq 1 of chapter 5). We assume that polymerization occurs by only one monomeric unit being added at a time (stepwise growth), the key physically realistic assumption that enables us to use Becker-Do¨ring equations to model the process. We also assume that the rate of addition of monomers depends only on the concentration of the monomer being added and the final monomer on the chain; that is, there are no longer-range interactions involving monomer units further down the chain. To model the full system in all its complexity we would need γ to introduce concentration variables cr,s , where γ is a string of A’s and B’s encoding the exact order in which monomers have been added to make the polymer. Such a problem is NP (nonpolynomial) hard.15 Therefore, instead of the full problem, we shall analyze a simplification. We shall classify a polymer chain by just three quantities: the number of A monomers of which it is composed, which we denote by a subscript r; the number of B monomers it contains, denoted by the subscript s; the final monomer in the chain, which we shall denote with a superscript. Thus we shall denote the copolymer chains by A ∞,∞ B ∞,∞ {Cr,s }r)1,s)0 and {Cr,s }r)0,s)1 and their concentrations by lowerA ∞,∞ B ∞,∞ and {cr,s (t)}r)0,s)1 , respectively. case variables {cr,s(t)}r)1,s)0

9548 J. Phys. Chem. B, Vol. 111, No. 32, 2007

Wattis and Coveney B A cq,0 ) 0 and c0,q ) 0 for q g 2

(2.12)

Then eq 2.6 is valid for all (r,s) except (1,0) and (0,0), and eq 2.7 is valid for all (r,s) except (0,1) and (0,0). There are two possibilities for the monomer concentration; first, assuming the pool chemical approximation, we will consider constant monomer concentrations cA1,0 and cB0,1 independent of time; alternatively, we could consider a constant density condition for each component, namely, that Figure 1. Illustration of (r,s)-space showing domains of definition of A B the concentrations cr,s and cr,s .

Note that in this scheme A ) CA1,0 and B ) CB0,1; pure A B and C0,s ; all copolymer homopolymers are denoted by Cr,0 A B chains Cr,s or Cr,s with r, s g 1 are heteropolymers. Note also B A that chains of the form cr,0 and c0,s are forbidden. Thus we have the following reactions to consider A A Cr,s + A f Cr+1,s with rate coefficient kAA

(2.2)

A B + B f Cr,s+1 with rate coefficient kAB Cr,s

(2.3)

B A + A f Cr+1,s with rate coefficient kBA Cr,s

(2.4)

B B + B f Cr,s+1 with rate coefficient kBB Cr,s

(2.5)

Instead of a simple one-dimensional distribution, we now have to picture distributions in three dimensions, the axes being r, s, and concentration. Figure 1 illustrates the areas in (r,s) parameter A B space where cr,s and cr,s are defined. The monomer reactivity ratios are defined by rA ) kAA/kAB and rB ) kBB/kBA. Flory14 calculates how the evolution of copolymer composition alters with rA, rB, and fA ) cA1,0/(cA1,0 + cB0,1), noting that all systems that had been subject to experimental determination of rates showed rArB e 1. The case rArB ) 1 is classified as “ideal” and leads to the sequence of monomers in the resultant polymers being random. For a system in which the monomers are not replenished, it is necessary to trace the ratio of monomers in the unreacted part of the system, that is, fA, if one is to predict the composition of copolymers. In the pure irreversible polymerization case (that is, neglecting any possible breakdown of the growing chains) the kinetic equations are A A A A B c˘ r,s ) kAAcr-1,s cA1,0 - kAAcr,s c1,0 + kBAcr-1,s cA1,0 A B c0,1 r g 2, s g 1 (2.6) kABcr,s B B B B A ) kBBcr,s-1 cB0,1 - kBBcr,s c0,1 + kABcr,s-1 cB0,1 c˘ r,s B A kBAcr,s c1,0 r g 1, s g 2 (2.7) A A A B A A B ) -kAAc1,s c1,0 + kBAc0,s c1,0 - kABc1,s c0,1 s g 1 c˘ 1,s

(2.8)

B B B A B B A ) -kBBcr,1 c0,1 + kABcr,0 c0,1 - kBAcr,1 c1,0 r g 1 c˘ r,1

(2.9)

A A A A A B ) kAAcr-1,0 cA1,0 - kAAcr,0 c1,0 - kABcr,0 c0,1 r g 2 (2.10) c˘ r,0 B B B A B B ) kBBc0,s-1 cB0,1 - kBAc0,s c1,0 - kBBc0,s c0,1 s g 2 c˘ 0,s

(2.11)

together with equations for the monomer concentrations. The last four equations can be ignored if one imposes the conditions

FA ) FB )









A B r(cr,s (t) + cr,s (t)) ∑ ∑ r)1 s)0 A B s(cr,s (t) + cr,s (t)) ∑ ∑ r)0 s)1

(2.13)

are independent of time. Since the only process occurring is irreversible polymerization, the mass will increase without bound in the former case, whereas in the latter (eq 2.13), an equilibrium configuration will be reached as all of the monomer is converted into polymeric form and the monomer concentrations drop to zero. In this case the equilibrium solution is degenerate in that there are infinitely many equilibria, and the one approached will depend upon the initial conditions of the system. For the remainder of this paper we make use of the pool chemical approximation; that is, we study the constant monomer formulation, where we assume that mass is continuously inserted into the system as polymers form so as to keep the concentrations cA1,0 and cB0,1 constant. The total number of polymers and total mass of the system thus increase without bound. 2.1. Temporal Behavior. Here we describe the evolution of macroscopically observable quantities, such as the total number of polymers, the mass of polymers, and average chain length. Since the model is based on two distributions, one for polymers A B ) and the other for those terminated by B (cr,s ), ending in A (cr,s we define corresponding quantities for the total number of polymers ∞







NA(t) )

∑ ∑ cr,sA (t) r)1 s)0

NB(t) )

∑ ∑ cr,sB (t) r)0 s)1

N(t) ) NA(t) + NB(t)

(2.14)

the total (mass-weighted) concentration of polymeric material being given by eq 2.13 together with F(t) ) FA(t) + FB(t). This definition of F(t) assumes that the masses of the A and B monomers are equal. In Appendix A a more detailed analysis is given that is applicable to cases where the monomers have differing masses. Appendix B describes the process by which the definitions of macroscopic quantities in terms of microscopic variables (eqs 2.13 and 2.14) and the kinetic equations for the evolution of the microscopic variables (eqs 2.6-2.11) are combined to form a solvable system of equations for the macroscopic variables. Appendix B also describes the methods of solution of the kinetic equations. The resulting kinetic equations for the macroscopic quantities NA and NB are

Sequence Selection during Copolymerization

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9549

N˙ A ) - γNA + δNB + (R + γ)cA1,0 N˙ B ) γNA - δNB + (β + δ)cB0,1

(2.15)

where, to simplify the analysis of these equations, we define the following parameters

R ) kAAcA1,0

β ) kBBcB0,1

γ ) kABcB0,1

δ ) kBAcA1,0

(2.16)

β β+δ

(2.17)

Rˆ )

R R+γ

βˆ )

Since we are analyzing the system using the pool chemical approximation, the concentrations cA1,0 and cB0,1 are constants. The system (eq 2.15) is solved by noting that the total number of chains N(t) ) NA(t) + NB(t) grows linearly with time so that

N(t) ) N(0) + [(R + γ)cA1,0 + (β + δ)cB0,1]t

(2.18)

Full details of the solution are given in Appendix A. To summarize, in general, at large times we find linear growth in the number of chains according to the asymptotic formulas

NA(t) ≈ NB(t) ≈

δ[(R + γ)cA1,0 + (β + δ)cB0,1]t (γ + δ) γ[(R + γ)cA1,0 + (β + δ)cB0,1]t (γ + δ)

(2.19)

Note that the quantity NB/NA ) γ/δ is independent of time and so the numbers of the two types of chains grow while maintaining the same constant of proportionality. The masses in the system (eq 2.13) grow quadratically with time, the large-time asymptotic rate being given by

FA ≈ FB ≈

δ(R + γ)[(R + γ)cA1,0 + (β + δ)cB0,1]t2 2(γ + δ) γ(β + δ)[(R +

γ)cA1,0

+ (β +

δ)cB0,1]t2

2(γ + δ)

(2.20)

Note that FB/FA ) (1 - Rˆ )/(1 - βˆ ) is independent of t. The average chain length can be determined from the quantities N, FA, and FB. Since the total mass of polymer chains is given by FA + FB, the average length, L, is given by

L)

FA + FB [δ(R + γ) + γ(β + δ)]t ) N A + NB 2(γ + δ)

states whose shape is described by the complementary error function (erfc). Thus at large times, the total number of clusters t t 1 ) t, and the total mass of clusters F ≈ ∑r)1 r≈ N ≈ ∑r)1 1/ t2. Hence the mass grows quadratically in time while the 2 number of polymers grows linearly, and this is entirely consistent with the individual concentration variables approaching a steady state. In the next few sections we illustrate the form of the steady-state solution for some archetypal situations. The general form of the steady-state solution is given in Appendix B (along with solutions for the dynamics away from steady state). 3. Random Polymerization In this section we study the situation in which the rates at which monomers attach to polymers depend neither on the terminating unit of the polymer nor on the nature of the monomer being attached. Hence we use the term “random polymerization”. In later sections we will analyze the two cases where the terminating polymeric residue displays a strong preference to attach either to a different monomer or to a monomer of the same species, leading to polymers with sequences that are predominantly ...ABABAB... or ...AAAAA..., respectively. In terms of the rates (eq 2.16), the random polymerization scheme corresponds to R ) β ) γ ) δ; hence Rˆ ) βˆ ) 1/2 (eq 2.17). From eqs 2.19 and 2.20, the large-time asymptotes of the macroscopic quantities are given by NA ) NB ) t and FA ) FB ) (cA1,0 + cB0,1)t2. The average polymer length, L ) F/N, is thus given by L ) t. The microscopic quantities, namely, the individual concentration variables, are obtained from eqs B23, B24, and B31

[

r-2 1 1 A cr,s ≈ cB0,1dr-2,s-1 + cA1,0 2p+2-rdp,s-1 + 4 4 p)0 1 B s-2 q+2-s 2 dr-2,q (3.1) c0,1 2 q)0

]



[

1 1 1 A r-2 p+2-r B ≈ cA0,1dr-1,s-2 + 2 dp,s-2 + cr,s c1,0 4 42 p)0



s-2

cB0,1

2q+2-sdr-1,q ∑ q)0

]

(3.2)

where dp,q is defined by eq B22. In this case, since Rˆ + βˆ ) 1, dp,q takes on a particularly simple form, namely, dp,q ) 2-p-q(p + q)!/p!q!. This enables eqs 3.1 and 3.2 to be simplified to

(2.21)

Thus the average length of the polymer chains grows linearly in time as does the total number of chains. Since the number of polymers is growing and the mass of polymers is growing faster, it is tempting to assume that the A (t) and c B (t) are individual concentrations of each species cr,s r,s also increasing without bound. However, the individual concentrations approach a steady state. This can be seen from a simple one-component Becker-Do¨ring system, c˘ r ) c1(cr-1 cr) with c1 ) 1 as the asymptotic solution and cr ) 1/2 erfc((r - t)/x2t).26 This solution has three regions: For r/t < 1, the concentrations are at the steady state given by cr ≈ 1; for r/t > 1, the concentrations are basically zero; in the narrow region r - t ) O(xt), there is a smooth transition between these two



A cr,s

)

B cr,s )

2r(rcA1,0 + scB0,1) (r + s)2 2s(rcA1,0 + scB0,1) (r + s)2

( (

exp -

(r - s)2 2(r + s)

exp -

(r - s)2 2(r + s)

) )

(3.3)

A B Illustrative examples of distributions cr,s and cr,s are given in Figure 2. Note that the system is dominated by heteropolymers but that the distribution is widely spread around the line r ) s.

4. Heteropolymer-Dominated System In this case, the system’s rate parameters are such that R ≈ β , γ ≈ δ; that is, a B monomer is much more likely to attach to a polymer chain ending in A than a polymer that terminates

9550 J. Phys. Chem. B, Vol. 111, No. 32, 2007

Wattis and Coveney

A B Figure 2. Top panels: Illustrations of cr,s (left) and cr,s (right) distributions, plotted against aggregation numbers r,s in the case of random A B 1 polymerization, that is, where R ) β ) γ ) δ ) /2; thus also Rˆ ) βˆ ) 1/2. Bottom panels: Similar plots of log(cr,s ) (left) and log(cr,s ) (right).

with B. Thus we expect to find long ABABAB sequences and very few instances of AA or BB. 4.1. Temporal Behavior. The large-time asymptotes for the total number of polymer chains ending in an A or a B reduce to

NA(t) ∼

δ(γcA1,0 + δcB0,1)t (γ + δ) γ(γcA1,0 + δcB0,1)t

NB(t) ∼

(γ + δ)

(4.1)

when the approximations of this section (namely R,β , γ,δ) are applied to the equations of subsection 2.1. The corresponding results for the mass of material in polymeric form are

FA ≈ FB ≈

γδ(γcA1,0 + δcB0,1)t2 2(γ + δ)

(4.2)

in the limit R,β f 0. Together, these results imply that the average chain length grows linearly in time and is given by

L≈

γδt γ+δ

(4.3)

If the monomers are present in equal concentrations, that is cA1,0 ) cB0,1 ) c1, then these results simplify further to NA ≈ δc1t, NB ≈ γc1t, and FA ≈ FB ≈ γδc1t2/2. 4.2. Microscopic Steady-State Solution. From subsection B4 and eqs B23 and B24 of Appendix B we have

the distributions have the same qualitative shape as for the randomly polymerizing system of section 3, the distribution is much more sharply localized along the line r ) s. 5. Homopolymer-Dominated System In homopolymer-dominated systems, the rate parameters are such that R ≈ β . γ ≈ δ; that is, a B monomer is much more likely to attach to a polymer terminating in another B than one that terminates with an A. Thus we expect to see long sequences of A’s and B’s but only rarely to observe an AB or BA step. Shorter polymers will thus be dominated by pure homopolymer sequences. However, as polymer length increases, so does the likelihood of finding one or a few such changes of this type. Thus, for long polymers, we expect that the distribution of polymers will make a transition away from being dominated by pure homopolymers to a situation dominated by mixed polymers. 5.1. Temporal Behavior. While eqs A1, A2, A5, A6, 2.19, and 2.20 are valid for γ,δ > 0, in the case R,β . γ,δ > 0, there is a simpler asymptotic solution that is valid at intermediate times. In this limit, the eq 2.15 reduces to N˙ A ) RcA1,0 and N˙ B ) βcB0,1, with solutions

NA(t) ) NA(0) + RcA1,0t NB(t) ) NB(0) + βcB0,1t (5.1) The mass is then determined by F˘A ) RNA + RcA1,0, F˘B ) βNB + βcB1,0, implying

1 FA ) FA(0) + R(cA1,0 + NA(0))t + R2cA1,0t2 2

A ≈ cr,r

kBAcA1,0 δ B c0,1 ) γ kAB

1 FB ) FB(0) + R(cB0,1 + NB(0))t + β2cB0,1t2 2

B cs,s ≈

kABcB0,1 γ A c1,0 ) δ kBA

Assuming γ,δ > 0 these expressions are valid for times less than O(1/(γ + δ)), over which the average chain length is given by

A B cr+1,r ≈ cA1,0 cs,s+1 ≈ cB0,1

(4.4)

Example distributions are given in Figure 3. Note that, although

FA + FB (R2cA1,0 + β2cB0,1)t ≈ L) N A + NB 2(RcA + βcB ) 1,0

0,1

(5.2)

(5.3)

Sequence Selection during Copolymerization

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9551

A B Figure 3. Top panels: Illustrations of cr,s (left) and cr,s (right) distributions plotted against r,s in the heteropolymer-dominated case where R ) β A B ) 0.1 and γ ) δ ) 1. Bottom panels: Plots of log(cr,s ) (left) and log(cr,s ) (right). Note the vertical scales here are different from those in Figure 2.

Over longer time scales alternative formulas described in Appendix A, namely, eqs A1 and A2, should be used. When t . 1/(γ + δ), eqs A1 and A2 simplify to

δ(RcA1,0 + βcB0,1)t NA ≈ γ+δ

FB ≈

(5.4) A ) cr,s

δR(RcA1,0 + βcB0,1)t2 2(γ + δ) γβ(RcA1,0 + βcB0,1)t2

(5.5)

2(γ + δ)

hence L is given by

L)

A cr,s ≈

FA + FB (Rδ + βγ)t ≈ N A + NB 2(γ + δ)

(5.6)

Thus we observe that in both temporal regimes NA, NB, and L all grow linearly, and FA and FB grow quadratically in time, though with different rates when (γ + δ)t is small or large. The transition region occurs when t ≈ 1/(γ + δ), and the kinetics in this region are governed by the more complex expressions or eqs A1 and A2 of Appendix A. 5.2. Steady-State Solution. We consider the limit in which A-A and B-B growth rates are significantly faster than the A-B and B-A rates of formation. In this case we assume R ≈ β ≈ O(1) , γ ≈ δ ≈  so that Rˆ ,βˆ ) 1 + O(). For polymers of length O(1), we expect the system to be dominated by pure polymers with concentrations given by

( (

δcB0,1 γcA0,1 B ≈ + O(2) cr,s + O(2) R+γ β+δ

(5.8)

and

γ(RcA1,0 + βcB0,1)t NB ≈ γ+δ FA ≈

For other concentrations, we have

) )

A ) Rˆ r-1cA1,0 ≈ cA1,0 1 cr,0

γ(r - 1) R+γ

B c0,s ) βˆ s-1cB0,1 ≈ cB0,1 1 -

δ(s - 1) β+δ

B ) cr,s

δcB0,1 [1 + (r - 3)R + (s - 1)β] + R+γ Rβ(r - 2)cA1,0 (5.9) γcA1,0 [1 + (r - 1)R + (s - 3)β] + β+δ Rβ(s - 2)cB0,1 (5.10)

Example distributions are given in Figure 4. At shorter polymer lengths, the distribution is dominated A by homopolymers; that is, the cr,s distribution is dominated by A B B A B and cr,s in cr,0 and cr,s by c0,s. This is evident in the plots of cr,s the top part of Figure 4. As longer polymers are formed, it becomes increasingly likely that a heteropolymer step will be incorporated into the sequence of monomer units. Hence at A B and cr,s larger r,s, the distributions become dominated by cr,s with both r and s nonzero. However, since the polymers so formed have extremely long homopolymeric subsequences, the distributions have a large variance. This effect is more evident A B in the plots of log(cr,s ) and log(cr,s ) shown in the lower part of Figure 4. 6. Average Run Length and Information Content

(5.7)

To determine the average length of a sequence of identical units in a given polymer chain as well as other quantities that relate to the information stored in the chain, we define the probabilities pAA, pAB, pBA, and pBB, pXY being the probability

9552 J. Phys. Chem. B, Vol. 111, No. 32, 2007

Wattis and Coveney

A B Figure 4. Top panels: Illustrations of the cr,s (left) and cr,s (right) distributions plotted against r and s in the homopolymer-dominated case where A B A R ) β ) 1 and γ ) δ ) 1/4. Bottom panels: Plots of log(cr,s ) and log(cr,s ) against r,s. In the top left, the dominance of homopolymers cr,0 is evident B in the distribution of shorter chains, similarly for c0,s in the top right. The lower graphs show the dominance of heteropolymers in longer chains.

of occurrence of an X monomer followed by a Y. Thus we have

pAA )

R ) Rˆ R+γ

pBA )

δ ) 1 - βˆ β+δ

pAB )

γ ) 1 - Rˆ R+γ

(6.1)

β ) βˆ β+δ

(6.2)

pBB )

Note that pAA + pAB ) 1 ) pBB + pBA. We use the variable A to refer to the length of an A sequence, and B for the length of a B sequence. We find that the probabilities of finding a sequence of precisely n repeating A units or n B units are n-1 P[A ) n] ) pBApn-1 AA pAB P[B ) n] ) pABpBB pBA

E[A] )

∑n

)

pBApn-1 AA pAB

1 1 - pAA

)

1 1 - Rˆ

E[B] )

∑n

pABpn-1 BB pBA

)

1 1 - pBB

)

1 1 - βˆ

∑n n2pBApn-1 AA pAB ∑n

- E[A]2 )

pBApn-1 AA pAB

pAA

γ

V[B] )

∑n n2pABpn-1 BB pBA ∑n

pABpn-1 BB pBA

- E[B]2 )

γ

)

1+

β δ

(6.5)

These quantities give an indication of the typical lengths of chain to be expected in the system. This calculation ignores end effects, which are negligible in extremely long chains; formally

R γ

(6.6)

)

(1 - pBB)2

δ (6.4)

( ) 1+

pBB

β

R

)

(1 - pAA)2 R

)

1+

∑n npBApn-1 BB pBA

V[A] )

(6.3)

The expected lengths of such homopolymeric sequences are then defined by

∑n npBApn-1 AA pAB

in the homopolymer-dominated case, the total chain length should exceed O(R/γ,β/δ) before such an assumption is valid. Note that our model allows the lengths of A tracts and those of B tracts to differ, since they are determined by completely independent parameters. The variances of the length distributions are also calculable

( ) 1+

β

δ

(6.7)

These quantities show how likely it is to observe chains far from the expected lengths (eqs 6.4 and 6.5). A large variance indicates that a wide distribution of chain lengths will be observed, while a small variance indicates a close to monodisperse distribution. The standard deviations are defined by σA2 ) V[A] and σB2 ) V[B]. 6.1. Run Length. The run length is the average number of sequences of either type per 100 monomer units. To determine this we follow Allcock and Lampe2 and define S(t) as the number of sequences in the system. This increases whenever a homopolymeric subsequence is completed, that is, whenever an A-B or a B-A attachment is made; thus its rate of change is given by

S˙ ) γNA + δNB

(6.8)

Sequence Selection during Copolymerization

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9553

We also define the flux of monomers into the system (J), which is the total rate of monomer addition to chains

J ) (R + γ)NA + (β + δ)NB

n

pi log2 pi ∑ i)1

(6.10)

and pi is the probability of event Ei. (Note that the logarithm is taken to the base two, since in most applications the events are encoded as zeros or ones.) In our case, to make a simple application of this theory we assume that the information is stored in the sequence of monomers, that is, whether an A is followed by another A or a B rather than in the copolymer blocks themselves (that is, in whether any given monomer in the chain is an A or a B). So our system has four basic units of information, namely, the AA, AB, BA, and BB bonds. We use conditional probabilities to determine the probability associated with each event. The probability of finding an AB bond is given by the product of the probability of finding an A monomer and the probability of it being followed by a B. Thus

P[AA] ) P[AA|A]P[A] P[AB] ) P[AB|A]P[A]

(6.11)

P[BA] ) P[BA|B]P[B] P[BB] ) P[BB|B]P[B]

(6.12)

The conditional probabilities P[XY|X] refer to the probability of a monomer X being followed by Y; these quantities are determined by the relative rates of the processes by which A monomers are added to sequences that end in an A or B, etc. Since

P[AA|A] + P[AB|A] ) 1 ) P[BB|B] + P[BA|B]

(6.13)

and the rates for AA, AB, BA, and BB formation are R, γ, δ, and β respectively, we have

P[AA|A] ) Rˆ )

R P[AB|A] ) R+γ 1 - Rˆ )

P[BA|B] ) 1 - βˆ )

γ (6.14) R+γ

β δ P[BB|B] ) βˆ ) (6.15) β+δ β+δ

Finally, we need the probability of finding an A or B in a chain. This is given by the relative numbers of each type of monomer in the system; hence

P[A] )

FB γ(β + δ) ) ) FA + FB δ(R + γ) + γ(β + δ) 1 - Rˆ (6.17) 2 - Rˆ - βˆ

(6.9)

The quantity J/S˙ is then the average sequence length of either type of chain, and so 100S˙ /J is the run length. 6.2. Shannon Entropy. If we are interested in the ability of copolymers to serve as a primordial genetic code, then it is useful to have a measure of the information contained within a polymeric sequence. Here, we shall define “information” as “negative entropy”, specifically the Shannon entropy, H.12,18 This is a measure of the uncertainty or entropy associated with the sample space of a sequence. Given a sequence X composed n of a sequence of n events {Ei}i)1 , the Shannon entropy is defined by H, where

H(X) ) -

P[B] )

FA δ(R + γ) ) ) FA + FB δ(R + γ) + δ(β + δ) 1 - βˆ (6.16) 2 - Rˆ - βˆ

Putting all of these probabilities together using eq 6.10 we find

H)

-1 (Rδ log2(Rδ) + βγ log2(βγ) + σ 2γδ log2(γδ) - σ log2σ) (6.18)

where σ ) Rδ + βγ + 2γδ ) (R + γ)(β + δ)(2 - Rˆ - βˆ ). 6.3. Randomly Polymerizing System. In this case we have, for some constant k, R ) k(1 + R), β ) k(1 + β), γ ) k(1 + γ), δ ) k(1 + δ), with X , 1 for all X, so that Rˆ ) 1/2(1 + 1/  - 1/  ) and β ˆ ) 1/2(1 + 1/2β - 1/2δ). The quantities A 2 R 2 γ and B describe the number of consecutive A and B units in a chain. We find

E[A] ≈ 2 + (R - γ) E[B] ≈ 2 + (β - δ)

(6.19)

V[A] ≈ 2 + 3(R - γ) V[B] ≈ 2 + 3(β - δ)

(6.20)

so the average run length is approximately two, with the perturbations R and γ affecting the average length of A tracts and β and δ affecting the length of B tracts. These quantities also affect the variance of the sequence length distributions; in this case the standard deviations (which are approximately x2) are large in comparison with the mean, and so we expect a much broader distribution of sequence lengths. The run length is calculated from 100S˙ /J, which is

1 50 1 + (γ + δ - R - β) 4

(

)

(6.21)

It is thus affected by all perturbations, positive R and β reducing the run number and positive γ and δ increasing it as one would expect. In this case, applying eq 6.18, we find σ ) 4 and H ) 2. Any deviation of the rate parameters R, β, γ, and δ from unity causes a reduction in information content, according to the relationship

(

)

2(R - β - γ + δ)2 + (R + β - γ - δ)2 + O(3) (6.22) H)232 loge 2 Note that the correction to H ) 2 is O(2) and not O(); that is, small variations in the rate constants only make very small alterations to the information content of the resulting sequences. Even at O(2) only certain perturbations to the rate constants alter the Shannon entropy; for example, if R ) γ and β ) δ, then corrections to H are even smaller. We will comment further on the information content in subsection 6.6, when similar calculations have been presented for the other systems studied and the results can be compared. 6.4. Heteropolymer-Dominated System. In this case, we assume that the system has a strong preference for alternating sequences; that is, B attaches to A and A to B with a much higher probability than A to A or B to B; mathematically, this corresponds to the rate parameters satisfying γ,δ . R,β so that Rˆ ,βˆ , 1. Equations 6.4 and 6.5 show that the expected run lengths for uninterrupted sequences of A (and B), E[A] (and E[B]), are both very slightly above unity, since in this case R,β , γ,δ. The variances are V[A] ≈ R/γ and V[B] ≈ β/δ

9554 J. Phys. Chem. B, Vol. 111, No. 32, 2007

Wattis and Coveney

with corresponding standard deviations σA ≈ xR/γ and σB ≈ xβ/δ, so there is no significant variation in chain lengthssall are tightly clustered around unity, as one would expect in a system in which polymers of the form ...ABABABA... dominate. We thus expect the run length to be approximately 100. We use eq 4.1 to obtain

R 2γδ β S˙ ≈1≈ J (R + γ)δ + γ(β + δ) 2γ 2δ

(6.23)

and thus we have the first correction term to the run number (given by 100S˙ /J). Calculating the Shannon entropy in this case, we find H ) 1 from eq 6.18, since σ ≈ 2γδ. We will comment further on this result in section 6.6. 6.5. Homopolymer-Dominated System. In this case the monomers have a strong self-affinity, so the formation of bonds between similar monomeric units dominates that of differing units; hence polymeric strands are primarily formed through the creation of A-A and B-B bonds with very few A-B or B-A bonds being formed. Mathematically, this corresponds to the parameters satisfying R,β . γ,δ so that both Rˆ and βˆ are close to, but just below, unity; we write

Rˆ ) 1 - R˜ βˆ ) 1 - β˜ with R˜ ,β˜ , 1

(6.24)

The quantities A and B describe the run lengths of A and B monomers, respectively, as defined by eq 6.3. Ignoring end effects arising in short chains, we find

E[A] ) 1 +

β R . 1 E[B] ) 1 + . 1 γ δ

(6.25)

Here we are assuming that the length of chains is typically much greater than R/γ and β/δ (otherwise end effects will be significant). The variances V[A] and V[B] are also large, giving standard deviations σA ≈ R/γ and σB ≈ β/δ. There are thus significant variations about the mean and substantial degrees of polydispersity. Calculating S˙ and J we find

S˙ ≈ (RγcA1,0 + βδcB0,1)t J ≈ [(R + γ)RcA1,0 +(β + δ)βcB0,1]t (6.26) at larger times; thus the run length can be approximated by A B 100S˙ Rγc1,0 + βδc0,1 ≈ 2 A ,1 J R c1,0 + β2cB0,1

(6.27)

A small run length is to be expected in a system that is dominated by the homopolymer reaction mechanism. Such a system has a low probability of forming an A-B or a B-A step, so homopolymeric subsequences are long, and in a sequence of 100 units, there will be very few such homopolymeric subsequences. In homopolymer-dominated systems, using eq 6.24, the Shannon entropy H simplifies to

H)

-1 (R˜ log2 R˜ + β˜ log2 β˜ R˜ + β˜ (R˜ + β˜ ) log2(R˜ + β˜ )) (6.28)

Thus we observe that H depends quite delicately on the sizes of the smaller quantities R˜ ≈ γ/R and β˜ ) δ/β. The Shannon entropy has a maximum value of unity when R˜ ) β˜ , and a minimum value of zero along the lines R˜ ) 0 and β˜ ) 0.

6.6. Entropy and Information. While the interpretation of run length is intuitively clear, the meaning of informational entropy requires further discussion. For the randomly polymerizing system we found H ) 2, for the system dominated by heteropolymers H ) 1, and for the homopolymeric system, the more complex result eq 6.28, which satisfies H e 1. Thus the randomly polymerizing system has the potential of containing twice as much information as either of the other two systems. In general the homopolymeric system carries the least information; however, if carefully balanced, the homopolymeric system can carry as much information as the heteropolymeric system, but if the two components have differing polymeric growth rates, the system’s capacity for information storage is vastly reduced. 7. Comparison with Experimental Data The theory derived above applies to general copolymerizing systems composed of two types of monomeric units. It is straightforward to see that the theory could be generalized to systems involving more types of monomeric units, and while the complexity of the system increases, no new ideas are required in the construction and derivation of the governing kinetic equations. We have introduced all of the techniques required for the progression from single-component, classical, Becker-Do¨ring theory to multicomponent copolymerization. There are a number of interesting experimental systems in which two monomers interact. In generic systems, where the A and B monomers are quite different, there is no a priori reason for any constraints on the rates of addition R, β, γ, and δ. However, one of the more intriguing examples in which surprising results occur is chiral polymerization where the monomeric units are identical except for their chiral structure. This difference influences the growth of polymeric chains. A variety of copolymeric chains grow with chains typically comprising a mixture of right-handed (D) and left-handed (L) monomers. Defining the left-handed monomer as A ) L and the right-handed as B (or R or D), we then expect the A-A interactions to be identical to the B-B interactions and the A-B monomer addition rate to be the same as the B-A rate. Hence we expect R ) β and γ ) δ, though γ * R is possible and in general to be expected. This restriction on the expected rates actually simplifies the model and makes it easier to fit to experimental data since there are fewer parameters that one can vary to obtain the best fit. We assume that the polymerization process is a first-order Markov process; that is, the rates are independent of the sequence structure further along the existing polymer chain. Luisi’s group has reported on a system involving a racemic mixture of amino acids, namely, penta-deuterated NCA-L Trp and normal NCA-D Trp (the L and D tags refer to left- and righthanded forms of the monomer, respectively); this combination enables the various copolymers to be distinguished according to the amount of L monomer units in each copolymer. Typically, copolymer lengths, N, are between 3 and 10, though polymers of lengths up to 29 are also observed. Some of the experimental systems include an added lipid, 1-palmitoyl-2-oleoyl-sn-glycero3-phosphorylcholine (POPC), as well as the amino acid Trp. The effect of this is to favor the formation of n-mer with specific n values. Using liquid chromatography/mass spectrometry ( selected ion monitoring method) they obtained plots of the relative frequency of polymers of the form DiLN-i against i for i ) 0, 1, ..., N. Blocher et al.4 report the results for 7-mers and 10-mers. Taking into account the order of monomers in a sequence of length N ) 10, there are 210 ) 1024 different

Sequence Selection during Copolymerization

Figure 5. Stereoisomers of length n are categorized by the number of left- (L) and right-handed (D) monomeric units of which they consist and are denoted by DpLq with p + q ) n. The graphs show the relative abundances of DpLq stereoisomer groups of the oligo-Trp n-mers (n ) 7 and 10, respectively) obtained after two racemic NCA-Trp feedings (a) in the absence (n ) 7) and (b) in the presence of POPC liposomes (n ) 10). The relative abundances of the DpLq stereoisomer groups (light gray columns) are mean values of three measurements. Standard deviations are given as error bars. The white columns correspond to a theoretical distribution, based on the assumption of purely statistical oligomerization. Reprinted from ref 4. (Reproduced from HelVetica Chimica Acta with permission.)

Figure 6. Fit of our model to the experimental data of Figure 5. The numbers on the horizontal axis correspond to the number of L monomers in the polymer; thus 0 f D10, 1 f D9L1, 2 f D8L2, ..., 10 f L10. The relative abundance is plotted on the vertical axis as a percentage of the total number of decamers. (See text for details of parameter values.)

sequences; however, neither the experimental technique nor our model is detailed enough to distinguish all possible chain sequences. Instead we use a categorization into the 11 different sequence classes DiLN-i (for 0 e i e 10). The experimental results of Blocher et al.4 are displayed in Figure 5, and our fit of the distribution of decamers is illustrated in Figure 6. In our fit of the experimental data, for each type of polymer sequence, five vertical bars are shown: The leftmost and rightmost correspond to the experimental results. The second bar indicates the results from a simple binomial fit, which, in our model, corresponds to the random case. The fourth bar is a fit to the general case with R, β, γ, and δ chosen to provide the best possible fit to the experimental data. The third bar shows the “symmetric fit” where β and δ are chosen such that β ) R

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9555 and δ ) γ so as to satisfy the symmetry conditions of the chiral monomers, and then R and γ are chosen to give the best possible fit. This case assumes that the rate of addition of an L monomer to a chain ending in an L is the same as a D monomer being added to a chain currently ending in an D and that the rate of addition of a D monomer to a chain terminated with an L is identical to the rate of addition of L to a D-terminated chain. Since all of our parameters (R, β, γ, and δ) correspond to rates of monomer addition and only the ratio of rates is relevant to the steady-state concentrations, without loss of generality, we can choose R + β ) 2. In the symmetric fit this implies that R ) β ) 1, and fitting γ ) δ simply involves varying one parameter. A fully general fit involves varying three parameters, γ, δ, and the difference R - β. The symmetric fit to the decamer results shown in Figure 5 yields γ ) δ ) 0.351 and is displayed by the central bar in each group in Figure 6. To help assess the relative accuracy of our predictions, we define the root-mean-square error rms ) expt th expt - ci,N-i )2 where ci,N-i is the experimentally measured ∑i(ci,N-i th ) value of the concentration of the N-mer DiLN-i and ci,N-i A B ci,N-i + ci,N-i is the predicted value of ci,N-i from our theory, in which cA1,0 and cB0,1 are the concentrations of the D and L monomers, respectively. For the symmetric fit, we have rms ) 10.41. A fully general fit of the experimental data yields R ) 1.1266, β ) 0.8734, γ ) 0.421, and δ ) 0.289. Note that the average value of γ,δ under this more general fitting is 0.355, which differs by only 1% from the constrained, symmetric fit in which R ) β and γ ) δ. The difference in R,β is more significant. This general fit gives an error of rms ) 6.98, which is only a moderate improvement over the symmetric fit of rms ) 10.41. We should expect some improvement since in the symmetric fit only one parameter is being altered, whereas three are fitted for the fully general model. An alternative measure of the fit is th expt 2 th the χ2 statistic, defined by χ2 ) ∑i(ci,N-i - ci,N-i ) /ci,N-i; the 2 lower the value of χ , the better the fit to the data. The χ2 statistic assumes that deviations from the expected results are normally distributed. For the symmetric fit we find χsym2 ) 1.498, and for the general fit we have χgen2 ) 1.615. The 95% confidence interval allows values of χ2 up to χ2 ) 18.3 for a symmetric fit and up to χ2 ) 15.5 for the general fit;1 if we found χ2 values above these limits, then the model would be rejected as not a good fit to the data. The 1% confidence interval allows χ2 values up to 1.65 for the general fit and up to 2.56 for the symmetric fit. We should be suspicious of values of χ2 much smaller than this since it is rare that agreement with experimental measurements is so precise. The symmetric fit is a 9 degrees of freedom (dof’s) problem since we have 11 items of data and have fitted only one parameter and scaled the data so that its total is 100%; the general fit is a 7 degrees of freedom problem (subtract 3 fitted parameters and the scaling to 100% from 11 items of data). Using the fitted values of R, β, γ, and δ, we have calculated the average sequence length and information content (Shannon entropy) of the system using eqs 6.8, 6.9, and 6.18. These data are given in Table 1 wherein we see that the change from a symmetric fit to a general fit makes no difference to the expected sequence length. The general fit, however, has a slightly lower information content than the symmetric fit. A more detailed analysis of liposome assisted polycondensation of NCA amino acid systems is given by Hitz et al.16 from which the experimental results of Figure 7 are reproduced. They find that homochiral copolymers are over-represented when compared to a binomial distribution and that the overrepresentation is more extreme at longer polymer lengths.

9556 J. Phys. Chem. B, Vol. 111, No. 32, 2007

Wattis and Coveney

TABLE 1: Summary of the Data Arising from a Numerical Fit of the Experimental Results to the Models Proposed and Analyzed Hereina system, fit

parameters

decamer (Figure 6) symmetric fit

R)β)1

decamer (Figure 6) general fit

R ) 1.1266, β ) 0.8734 γ ) 0.421, δ ) 0.289

χ2 data

other data

χ2 ) 1.498 rms ) 10.41

γ ) δ ) 0.351 χ95%,9dof2 ) 18.3 χ1%,9dof2 ) 2.56

SL ) 3.85 H ) 1.8265

χ2 ) 1.615 rms ) 6.98 χ95%,7dof2 ) 15.5 χ1%,7dof2 ) 1.65

SL ) 3.85 H ) 1.8245

trimer (Figure 8) R ) β ) 1 χ2 ) 0.818 rms ) 18. symmetric fit γ ) δ ) 0.507 χ95%,3dof2 ) 7.81 SL ) 2.97 χ5%,3dof2 ) 0.35 H ) 1.9214 trimer (Figure 8) R ) 0.901, β ) 1.099 general fit γ ) 0.367, δ ) 0.619 hexamer (Figure 9) symmetric fit

R)β)1

hexamer (Figure 9) general fit

R ) 0.898, β ) 1.102 γ ) 0.439, γ ) 0.439

χ2

) 0.15

χ95%,1dof2 ) 3.84 χ5%,1dof2 ) 0.004 χ2

rms ) 3.859

Figure 8. Fit of our model to the experimental data of Figure 7. The relative abundance is plotted on the vertical axis as a percentage of the total number of trimers. See text for details of parameter values.

SL ) 3.12 H ) 1.8928

) 1.398 rms ) 18.94

γ ) δ ) 0.535 χ95%,6dof2 ) 12.59 χ5%,6dof2 ) 1.635

SL ) 2.87 H ) 1.9328

χ2 ) 1.109 rms ) 12.39 χ95%,4dof2 ) 9.49 χ5%,4dof2 ) 0.711

SL ) 2.89 H ) 1.9272

In addition to quoting the best fit values for the parameters R, β, γ, and δ, we summarize the χ2 results, the root-mean-square error (rms), the average sequence length (SL), and the information content (Shannon entropy, H). a

Figure 7. Stereoisomers of length n are categorized by the number of left- (L) and right-handed (D) monomeric units of which they consist and are denoted by DpLq with p + q ) n. Relative abundances for the DpLq stereoisomer groups of the oligo-Leu n-mers are shown for one racemic NCA-Leu feeding in the absence of POPC liposomes (eqs A1A4). The gray columns represent the relative abundances of the DpLq stereoisomer groups and show the mean values of three measurements. Standard deviations are given as error bars. The black columns correspond to a theoretical distribution, based on the assumption of purely statistical (Bernoullian) oligomerization. Reprinted from ref 16. Copyright 2001 American Chemical Society.

Although their main study is on NCA-Trp, they also report similar results from experiments on NCA-Ile and NCA-Leu systems. By repeating experiments at a range of temperatures between 5 and 50 °C, they also show that the effect is independent of the state of the added (POPC) lipid. Fits to distributions of trimers and hexamers are illustrated in Figures

Figure 9. Fit of our model to the experimental data of Figure 7. The relative abundance is plotted on the vertical axis as a percentage of the total number of hexamers. See text for details of parameter values.

8 and 9. As with Figure 6, the first and fifth bars correspond to the experimental results, the second to the “random”, or binomial expectation, the third to a “symmetric” fit of our model (allowing γ ) δ to be varied and β ) R ) 1 held fixed), and the fourth bar to a general fit (allowing γ, δ, and R - β to be varied). For the trimer results quoted in Figure 7 (panel A1), we provide a best fit using our model in Figure 8. The symmetric fit gives γ ) δ ) 0.507 (in addition to R ) β ) 1), giving a residual error of rms ) 18. The fully general fit gives γ ) 0.367 and δ ) 0.619 together with R ) 0.901 and β ) 1.099, (rates scaled so that R + β ) 2). The average value of γ and δ is 0.493, which differs from the symmetric fit of 0.507 by only 3%. This fully general fit gives a residue of rms ) 3.859, which is a considerable improvement on the symmetric fit. This improvement is also demonstrated by the χ2 values of χsym2 ) 0.818 and χgen2 ) 0.150. The 95% confidence intervals for these quantities are χsym2 ) 7.81 (3 dof’s) and χgen2 ) 3.84 (1 dof), and the 5% confidence intervals are 0.35 (3 dof’s) for the symmetric case and 0.004 (1 dof) for the asymmetric case. As with our results on the decamers, both fits indicate that the homopolymeric addition rates (A + A and B + B, namely, R and β) are about twice that of the heteropolymer rates (γ and δ for A + B and B + A, respectively). As in the previously analyzed experimental data, the general fit leads to a lower Shannon Entropy (H) than the symmetric fit. In this case the results of the average sequence length (SL) are higher for the general fit than the symmetric fit; however, since this data is from trimers, we expect end effects to dominate

Sequence Selection during Copolymerization the experimental data and the theoretical predictions to be inaccurate since they are based on long polymers. A symmetric fit of the hexamer data in Figure 7 (panel A4) leads to γ ) δ ) 0.535 together with R ) β ) 1 and rms ) 18.94. A fully general fit yields R ) 0.898, β ) 1.102, γ ) 0.439, and δ ) 0.635. This gives an average heteropolymer addition rate of 1/2(γ + δ) ) 0.537, which is extremely close to the symmetric fit of 0.535. However, in this fit, all parameters differ from the symmetric fit by about 0.1, which is a relative change of 10% for the homopolymer addition rates and 19% for the heteropolymer addition rates. Although these changes are quite high, the more general fit only gives a marginal improvement in root-mean-square error, down from rms ) 18.94 to rms ) 12.39; this behavior is reflected in the χ2 statistics, since χsym2 ) 1.398 and χgen2 ) 1.109. Such a reduction is not so great as the reduction in the 95% confidence limits from χsym2 ) 12.59 (6 dof’s) to χgen2 ) 9.49 (4 dof’s); the 5% confidence intervals are 1.635 (6 dof’s) and 0.711 (4 dof’s). Hence, overall, there is no great motivation here to relax our assumption that the system should be symmetric in R ) β and γ ) δ. The average sequence length is almost identical for both fits, and the information content varies very little, again being slightly lower for the general fit. The terminology, as used by Hitz et al.,16 is that a zerothorder Markov process is one in which the rate of attachment of monomers to the end of a growing polymer does not depend on the terminating unit of the polymer. A first-order Markov process is one in which the rate of addition depends upon the terminating unit of the polymer, and a second-order Markov process in one in which the rate of addition depends upon the penultimate as well as the terminating units of the polymer. It is not clear whether these authors allow the rates to depend, in addition, on the monomeric unit to be added, as we do in our model. Hitz et al.16 claim that their results show that the copolymerization process is at least a second-order Markov process. This argument is entirely qualitative with no supporting calculations quoted. However, we believe our model, which describes the copolymerization process strictly as a first-order Markov process, provides a perfectly acceptable fit of the data; there is no need to invoke higher-order Markov processes. This may be due to our model allowing the growth rate of the chains to depend upon the monomer being added as well as the currently terminating unit of the chain. 8. Conclusions The theoretical analysis of copolymerization presented herein gives a more detailed description than previous analyses of copolymerization. We have constructed a model for copolymerization that retains some important kinetic aspects of chain sequence and composition (section 2). The mathematical model is based on the Becker-Do¨ring system of equations for cluster growth. In particular, by considering the terminating unit and overall composition of each polymer sequence, we are able to investigate the effects that each of the four rate constants and the two monomer concentrations have on: (i) the overall dynamics of the total number of polymer molecules, which grows linearly in time, (ii) the total mass of material in polymeric form, which grows quadratically in time, and (iii) the average polymer length, which grows linearly in time. Three special cases are considered in more detail. For the system in which monomers are identical and a chain’s growth rate does not depend on its terminating unit, there is no preference for monomer attachment built into the model and we observe a distribution with a broad spread of compositions.

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9557 In the case where A and B monomers are more likely to attach to polymers ending with the opposite unit, a strongly peaked narrow distribution of polymers is produced. In the opposite extreme, where monomers are most likely to attach to polymers with the same terminating unit we find that homopolymers dominate at shorter polymer lengths, but as the length of polymer increases, so does the likelihood of a heteropolymeric step, and so for extremely long polymers, the distribution is dominated by heteropolymers, though the distribution is spread extremely diffusely. Variations in the rates of monomer attachment to polymers terminating with each species of monomer can be cumulative and collectively have a larger effect than any one perturbation would suggest. The most obvious examples are the strongly homopolymer- and strongly heteropolymer-dominated systems. We also calculated the Shannon entropy or information content (section 6) for each of our example scenarios. We find that the amount of uncertainty, or information content, in the homopolymer-dominated case varies with the relative strength of the rate coefficients. If β > R and γ > δ, then the rate at which B’s are added to polymer chains outstrips the A addition rate, and the system becomes swamped with B’s. In such a system there would be little uncertainty. At best the uncertainty is H ) 1, which matches the uncertainty in the heteropolymerdominated system. When all additions are equally likely (the “random” case where R ) β ) γ ) δ), the information content is maximized at H ) 2. Thus with “random addition” of monomers, the information storage capacity of the sequences is maximized; this, however, corresponds to a low run length (with a mean of 2 (eq 6.19) but a relatively large standard deviation of σ ) x2 (eq 6.20)); hence little effect of homochirality would be observed in such systems. These results should be compared with the run lengths of the homopolymerdominated system where the run length is large and the standard deviation is also large and of a similar magnitude (see eq 6.25 and the following text) and the heteropolymer-dominated system in which the average run length is very slightly above unity and the standard deviation is very small (see subsection 6.4). The experimental scenarios that we have interpreted using our model show average sequence lengths in the range of 3-4, which is greater than those for the purely random system, but not really large enough to be classified as homopolymerdominated. Similarly, the information content of the experimentally observed polymers is, uniformly across all systems considered, about H ) 1.9, which is very high and close the “randomly polymerizing system” analyzed in sections 3 and 6.3 where H ) 2, the highest content of all special cases that we have considered here. Finally, we have been able to successfully fit our models to the experimental results from Luisi’s group4,16 on chiral copolymerization, whether this be in a system containing chiral monomers or achiral monomers which form a chiral structure on polymerizing. We show that the over-representation of homopolymers found experimentally can be explained by our model, which relies only on first-order Markov processes. Appendix A: Solution of the Full Temporal Copolymerization Rate Equations In subsection 2.1, our aim is to solve the system of rate equations for the macroscopic quantities, namely, the total numbers of polymers of each type (those ending in an A or a B unit) and the total mass involved in each of polymer. The theory below makes some use of the generating functions that are used to solve the full microscopic description of the system

9558 J. Phys. Chem. B, Vol. 111, No. 32, 2007

Wattis and Coveney

and is detailed in Appendix B. Using eq 2.18, eq 2.15 can be solved yielding

NA(t) )

δ[(R + γ)cA1,0 + (β + δ)cB0,1]t

+ NA(0) e (γ + δ) [δ(γ + δ)N(0) + γ(R + γ)cA1,0 -

-(γ+δ)t

(1 - e-(γ+δ)t)

δ(β + δ)cB0,1] NB(t) )

(γ + δ)2

(A1)

+ NB(0) e-(γ+δ)t + (γ + δ) [γ(γ + δ)N(0) - γ(R + γ)cA1,0 + (1 - e-(γ+δ)t) (γ + δ)2

)( )

(

(R + γ)cA1,0 + RNA + δNB 0

+

γ[(R + γ)cA1,0 + (β + δ)cB0,1]t

δ(β + δ)cB0,1]

() (

F˘AA FAA -γ δ + B ) γ -δ FBA F˘A

() (

)( )

F˘AB FAB -γ δ + B ) γ -δ FBB F˘B

(

Φ)

F˘A ) -GA,x(0, 0, t) - GB,x(0, 0, t) ) RNA + δNB + (R + γ)cA1,0 (A3) F˘B ) -GA,y(0, 0, t) - GB,y(0, 0, t) ) γNA + βNB +

FB )

+

2(γ + δ) ((R + γ)cA1,0 + ΓA)t + FA(0) - [ΓA - (RNA(0) + (1 - e-(γ+δ)t) (A5) (γ + δ)

(A11)

(

)

(A12)

(

)

(A13)

[(β + δ)c0,1B + γNA + βNB] 1 γ + δ δ[(β + δ)c0,1B + βNB + γNA]e(γ+δ)t

A1.1. Random System. At the start of section 3, we quoted results describing the evolution of the mass of polymeric material at large times. This arises by solving eqs A9 and A10 using the theory above in the special case where R, β, γ, and δ are all equal. Setting R ) k(1 + R), β ) k(1 + β), γ ) k(1 + γ), δ ) k(1 + δ), with X , 1 for all X, we find

1 FAA ≈ ((2 + 2R + 2δ)cA1,0 + 4 (2 + R + β - γ + 3δ)cB0,1)t2 (A14)

γ(β + δ)[(R + γ)cA1,0 + (β + δ)cB0,1]t2

+ 2(γ + δ) ((β + δ)cB0,1 + ΓB)t + FB(0) - [ΓB - (γNA(0) + βNB(0))]

)

and the solution of eq A10 yields

we find

δNB(0))]

(

δ e-(γ+δ)t γ -e-(γ+δ)t

[(R + γ)cA1,0 + RNA + δNB] 1 v3 A ) γ + δ γ[(R + γ)cA1,0 + RNA + δNB]e(γ+δ)t

v3 B )

δ(R + γ)[(R + γ)cA1,0 + (β + δ)cB0,1]t2

(A10)

Solving eq A9 we find

(β + δ)cB0,1 (A4)

FA )

)

(A9)

Such systems (R4 ) MR + F) are solved by finding a fundamental matrix (Φ) that solves Φ˙ ) MΦ and then substituting R ) Φv. We thus find v3 ) Φ-1F, which, in our case, leads to

(A2)

These quantities enable us to solve for the masses in the system (eqs B3 and B4). From

0 (β + δ)cB0,1 + βNB + γNA

)

(1 - e-(γ+δ)t) (A6) (γ + δ)

1 FBA ≈ ((2 + 2R + 2γ)cA1,0 + 4 (2 + R + β + γ + δ)cB0,1)t2 (A15)

where

ΓA )

ΓB )

δ(R + γ)(γ + δ)N(0) + γ(R - δ)(R + γ)cA1,0 + δ(δ - R)(β + δ)cB0,1 (γ + δ)2 γ(β + δ)(γ + δ)N(0) + γ(γ - β)(R + γ)cA1,0 + δ(β - γ)(β + δ)cB0,1 (γ + δ)2

1 FAB ≈ ((2 + R + β + γ + δ)cA1,0 + 4 (2 + 2β + 2δ)cB0,1)t2 (A16) (A7) 1 FBB ≈ ((2 + R + β + 3γ - δ)cA1,0 + 4 (2 + 2β + 2γ)cB0,1)t2 (A17) (A8)

A1. Partial Densities. Using eqs B3-B6 and B10-B14, we can find equations governing the evolution of the densities FAA, FBA, FAB , and FBB. These are

A1.2. Heteropolymer-Dominated system. Subsection 4.1 summarizes the temporal evolution of macroscopic quantities in the case where the preferred mode of growth is for chains with alternating monomer units; mathematically this corresponds to R,β , γ,δ. At large times the partial densities grow according to

Sequence Selection during Copolymerization

FAA ≈ FAB ≈ FBA



FBB



J. Phys. Chem. B, Vol. 111, No. 32, 2007 9559

γδ2(γcA1,0 + δcB0,1)t2

which are generating functions describing the evolution of the pure homopolymers composed of only A or only B monomers;

2(γ + δ)2 γ2δ(γcA1,0 + δcB0,1)t2 2(γ + δ)2

(A18)

A1.3. Homopolymer-Dominated System. In subsection 5.1, the system dominated by like-like polymerization was analysed; this corresponds to R,β . γ,δ. At large times, the solution of eqs A9 and A10 using the method described above leads to the results that the partial densities grow according to

1 FAA(t) ≈ FAA(0) + R(cA1,0 + NA(0))t + R2cA1,0t2 (A19) 2 FBB(t)



FBB(0)

+

β(cB1,0

∑ ∑ cr,sA (t) e-rx-sy r)1 s)0

GB(x,y,t) )

∑ ∑ cr,sB (t) e-rx-sy r)0 s)1



(B1)

NA ) GA(0,0,t) NB ) GB(0,0,t) N ) NA + NB

(B2)

FAA ) -

∂GA ∂GA FA ) F ) FAA + FAB | | ∂x 0,0 B ∂y 0,0 A

(B3)

FBA ) -

∂GB ∂GB |0,0 FBB ) | F ) FAB + FBB ∂x ∂y 0,0 B

(B4)

These are all useful quantities for characterizing the behavior of the system. For the ensuing analysis, we decompose both of the generating functions GA and GB into three parts, one corresponding to each of the three equations listed in eqs 2.62.11

GA(x,y,t) ) HA(x,t) + e-xDA(y,t) + CA(x,y,t)

(B5)

GB(x,y,t) ) HB(y,t) + e-yDB(x,t) + CB(x,y,t)

(B6)

Here we have defined ∞

A cr,0 (t) e-rx ∑ r)1



(B8)









A cr,s (t) e-rx-sy ∑ ∑ r)2 s)1 B cr,s (t) e-rx-sy ∑ ∑ r)1 s)2

(B9)

Translating the kinetic eqs 2.6-2.11 into these generating functions we obtain

H˙ A(x,t) ) cA1,0 e-x(R + γ) - (R + γ - R e-x)HA(x,t) (B10) H˙ B(y,t) ) cB0,1 e-y(β + δ) - (β + δ - β e-y)HB(y,t)

(B11)

e-x D˙ A(y,t) ) δ e-x HB(y, t) - (R + γ)e-x DA(y,t)

(B12)

e-y D˙ B(x,t) ) γ e-y HA(x,t) - (β + δ) e-y DB(x,t)

(B13)



GA(x,y,t) )

the formulas that GA and GB satisfy depend on the conditions imposed on the monomer concentrations. Even if GA(x,y,t) and GB(x,y,t) cannot be explicitly found, equations for various moments of the distribution can be constructed from them. For example

HA(x,t) )

B cr,1 (t) e-rx-y ∑ r)1

CB(x,y,t) )

We make use of generating functions to analyse the kinetics of the copolymerization process. From any such analysis we can find equations for the moments of the distributions. The first stage of our analysis is to define the generating functions



e-y DB(x,t) )

CA(x,y,t) )

together with, at this order, FBA ) 0 ) FAB .



A c1,s (t) e-x-sy ∑ s)1

describe the polymers that are almost pure, being composed of sequences of the forms AnB or BnA; finally, generic polymers are described by

1 + NB(0))t + β2cB1,0t2 (A20) 2

Appendix B: General Solution of the Kinetic Eqs 2.6-2.11



e-x DA(y,t) )



HB(y, t) )

B c0,s (t) e-sy ∑ s)1

(B7)

( ) (

C˙ A R e-x - R - γ δ e-x ) C˙ B γ e-y β e-y - β - δ

(

)( )

CA CB +

R e-2xDA + δ e-x-yDB β e-2yDB + γ e-x-yDA

)

(B14)

Before analyzing the kinetics of the system in subsection B2, including the kinetics of the quantities N, FA, and FB (subsection 2.1), we shall analyze the form of the steady-state solution to which we expect the system to be attracted. B1. Steady-State Solution. We solve eqs 2.6-2.11 for their steady states by analyzing the latter equations first; eqs 2.10 and 2.11 imply A B ) Rˆ r-1cA1,0 c0,s ) βˆ s-1cB0,1 cr,0

(B15)

which in turn imply ∞

Hss A (x)

)

∑ r)1

Ass cr,0

-rx

e



Hss B (y)

)

∑ s)1

Bss c0,s

e

)

cA1,0 e-x 1 - Rˆ e-x

-sy

)

cB0,1 e-y 1 - βˆ e-y

(B16)

where we use the superscript ss to denote steady state. Turning B A now to eqs 2.8 and 2.9 we aim to solve for cr,1 and c1,s , obtaining

9560 J. Phys. Chem. B, Vol. 111, No. 32, 2007

(R +δ γ)βˆ γ )( Rˆ β + δ)

A c1,s )

s-1 B c0,1

B cr,1

r-1 A c1,0

Wattis and Coveney

(B17)

In terms of the generating functions, we have ∞

∑ s)1

Ass -x-sy c1,s e ) e-x Dss A (y) )



∑ r)1

Bss cr,1

-rx-y

)e

e

-y

Dss B (x)

)

δ e-x

Hss A (y)

R+γ γ e-y β+δ

(B18)

Hss A (x)

(B19)

We solve the rest of the system using generating functions directly. Equation B14 implies that the steady-state solution is given by

( )

Css 1 A ) ∆ Css B

(

δ e-x-y Dss B R+γ γ -2y ss -x-2y ss -x-y ss e βˆ e DB + (1 - Rˆ - βˆ )e DB + DA β+

Rˆ e-2x Dss ˆ - βˆ )e-2x-y Dss A + (1 - R A +

)

(B20)

Figure 10. Illustration of the propagation of the steady-state solution of eq B30 into two-dimensional aggregation space (r,s). Assuming the system is initiated with no polymers present, below the line the system is at steady state (ψ ) 1), and above the line the concentrations are zero (ψ ) 0).

treat the arguments of the functions ψA and ψB as continuous variables, which allows us to approximate discrete differences by Taylor series of the functions ψA and ψB; thus subscripts on ψ denote derivatives. The functions ψA and ψB tend to unity in the large-time limit, and following the results of ref 26, we expect this to be via a travelling diffusive wave that propagates from small (r,s) values to larger ones. Returning now to the general system in eqs 2.6 and 2.7 and noting that the steady state satisfies A A B (R + γ)cjr,s ) Rcjr-1,s + δcjr-1,s

where

∆(x,y) ) 1 - βˆ e-y - Rˆ e-x + (Rˆ + βˆ - 1)e-x-y (B21)



)





dp,q e-px-qy ∑ ∑ p)0 q)0

p+q

Rˆ m-qβˆ m-p(1 - Rˆ - βˆ )p+q-mm!

m)max{p,q}

(m - p)!(m - q)!(p + q - m)!



dp,q )

ψAt )

[

δ R+γ

A 2cr,s

(B22)

ψBt ) Rˆ cB0,1dr-2,s-1 + (1 - Rˆ )(1 - βˆ )×

r-2

cA1,0

B ) cr,s

∑ Rˆ

r-p-2

p)0

γ

dp,s-1 +

δ R+γ

s-2

cB0,1

∑ βˆ

s-q-2

]

βˆ cA1,0dr-1,s-2 + (1 - Rˆ )(1 - βˆ )×





]

B2. Kinetics of the Approach to Steady State. At large times, the system will approach its steady state via equations that can be constructed using a continuum limit. To find this we substitute A A A B B B (t) ) cr,s ψ (r,s,t) cr,s (t) ) cr,s ψ (r,s,t) cr,s

+

B δcr-1,s A 2cr,s

ψBrr (B27)

B 2cr,s

ψBss +

A γcr,s-1 B 2cr,s

ψAss (B28)

Assuming that, to the leading order, ψA ) ψB ) ψ, we obtain the equations

β+δ r-2 s-2 γ Rˆ r-p-2dp,s - 2 + cB0,1 βˆ s-q-2dr-1,q (B24) cA1,0 β+δ p)0 q)0

[

ψArr

A B A γcr,s-1 βcr,s-1 γcr,s-1 A B B (ψ ψ ) ψ ψAs + s B B B cr,s cr,s cr,s B βcr,s-1

dr-2,q (B23)

q)0

B A B δcr-1,s Rcr-1,s δcr-1,s B A A (ψ ψ ) ψ ψBr + r A A A cr,s cr,s cr,s A Rcr-1,s

which implies that dp,0 ) Rˆ p, d0,q ) βˆ q, and d0,0 ) 1. Ultimately, we find A ) cr,s

(B26)

we next assume that at the leading order the kinetics can be A A A B B B (t) ) cr,s ψ (r,s,t) and cr,s (t) ) cr,s ψ (r,s,t). We described by cr,s then find

Expanding the inverse of this we find

1

B B A (β + δ)cjr,s ) βcjr,s-1 + γcjr,s-1

(B25)

A B where cr,s and cr,s are the steady states for concentrations of chains ending in A and B, respectively (eqs B23 and B24). We

ψt ) -(R + γ)ψr ψt ) -(β + δ)ψs

(B29)

which together yield the solution

ψ ) Θ((R + γ)(β + δ)t - (β + δ)r - (R + γ)s)

(B30)

where Θ(‚‚‚) is the Heaviside function. So along the r-axis (s ) 0) the front moves with a velocity (R + γ) so that its position is given by r ≈ (R + γ)t, and along r ) 0 (the s-axis) its position is given by s ≈ (β + δ)t, as illustrated in Figure 10. The higher-order (second-derivative) terms, which determine the shape of the wavefront ψ(r,s,t), are not as straightforwardly solved for as in single-component systems,26 since at second

Sequence Selection during Copolymerization

J. Phys. Chem. B, Vol. 111, No. 32, 2007 9561

order there are differences between ψA and ψB that need to be accounted for. Assuming that the system starts with initial data in which there are no polymers present, the evolution proceeds via the motion of a diffusive wave as illustrated in Figure 10. Thus, to A with r,s . 1, for example, one form a polymer of the form cr,s has to wait a time t ) r/(R + γ) + s/(β + δ). B3. Asymptotes for a Random System. The “random” system corresponds to the case where we have R ) β ) γ ) δ ) 1. This implies that Det ≈ 1 - 1/2e-x - 1/2e-y, and somewhat more simply than eq B22, we find

dp,q )

(p + q)! p+q

2

p!q!

≈ exp

(

)

-(p - q)2 2(p + q)

(B31)

where the asymptotic approximation is valid for large p and q and is obtained by using Stirling’s approximation for the factorial function.1 B3.1. Temporal BehaVior of Macroscopic Quantities. Largetime asymptotics are given by NA ) NB ) t and FA ) FB ) (cA1,0 + cB0,1)t2. For the O() terms we have

1 1 NA(t) ≈ 1 + R + δ cA1,0 + 2 2 1 1 1 + β - γ + δ cB0,1 t (B32) 2 2

[(

) (

[(

) ( ) ] 1 1 F (t) ≈ [(1 +  +  +  )c + 2 2 (1 + 21  + 21  +  )c ]t 1 1 F (t) ≈ [(1 +  +  +  )c + 2 2 (1 +  + 21  + 21 )c ]t R

γ

δ

A 1,0

R

B

R

β

A cr,r (t) ≈

δ B c H(Vt - r - r0) γ 0,1

B cs,s (t) ≈

γ A c H(Vt - s - s0) δ 1,0

γ

β

δ

B 2 0,1

(B34)

δ

B 2 0,1

(B35)

A 1,0

β

γ

B (t) ≈ cB0,1H(Vt - s - s1) cs,s+1

(B36)

In the main body of the text (section 3), we simply consider the leading order terms, that is, with R ) β ) γ ) δ ) 0. B3.2. Large-Time Kinetics. Following subsection B2 we A assume that at large times the solution is given by cr,s (t) ) A A B B B cr,sψ (r,s,t) and cr,s(t) ) cr,sψ (r,s,t) with the time-independent A B and cr,s defined by eqs 3.1 and 3.2. Substituting R ) states cr,s β ) γ ) δ into eq B30 we obtain ψ ) Θ(2Rt - r - s); thus, at large times, the part of the system satisfying r + s < 2Rt is in the steady state. In this case ψA ) ψB ) ψ, so eqs B27 and B28 are identical and equal to

ψt ) -2ψr + ψrr ) -2ψs + ψss

(B39)

This leads to a system of four simultaneous equations (for r0, r1, s0, and s1), with the consistency condition giving

V)

γδ γ+δ

(B40)

This agrees with the general theory outlined in subsection B2 and eq B30 in particular. The result (eq B40) can be checked A A B by calculating NA(t) ) ∑r(cr,r + cr+1,r ), NB(t) ) ∑s(cs,s + B cs,s+1), FA(t), and FB(t) in two ways. From eqs B38-B40 we have NA ≈ Vt(cA1,0 + δcB0,1/γ), NB ≈ Vt(cB0,1 + γcA1,0/δ), FA ≈ V2t2(cA1,0 + δcB0,1/γ)/2, and FB ≈ FA. Comparing these results for NA, NB, FA, and FB with the eqs 4.1 and 4.2, we find perfect agreement. B5. Asymptotes for a Homopolymer-Dominated System. Here both Rˆ and βˆ are 1 + O() with  , 1. To the leading order, the form of dp,q is given by Det ≈ (1 - e-x)(1 - e-y), which implies dp,q ) 1. Including higher-order corrections we find

(

)

β e-y R e-x 1 1 1 ) 1 + Det 1 - e-x 1 - e-y 1 - e-y 1 - e-x

The average polymer length, L ) F/N, is thus given by

1 L ) t + (R + β + γ + δ)t 4

(B38)

A (t) ≈ cA1,0H(Vt - r - r1) cr+1,r

) ]

1 1 NB(t) ≈ 1 + R + γ - δ cA1,0 + 2 2 1 1 1 + β + γ cB0,1 t (B33) 2 2 A

B4. Asymptotes for a Heteropolymer-Dominated System. In this case monomers are most likely to bind to chains that terminate with the different species. Mathematically this corresponds to the rates satisfying γ,δ , R,β so that Rˆ ,βˆ , 1. The determinant and inverse (eq B22), which are vital to finding the steady-state solution, simplify considerably. We then have Det ≈ 1 - e-x-y to the leading order, and so dp,q ) δp,q where this δp,q is the Kronecker delta; this simplification leads to the steady-state concentrations (eq 4.4). Since there are only a few O(1) concentrations, we seek a large-time solution of the form

(B37)

which gives ψ ) Θ(2t - r - s). Thus the waiting time from A,B the trivial initial data for the polymer cr,s to reach steady state is 1/2(r + s).

(B41)

where β ) 1 - βˆ and R ) 1 - Rˆ , which implies dp,q ) 1 + pR + qβ. B5.1. Large-Time Kinetics. In this case, we can compare the results with known results,26 since to the leading order the problem decouples into two single-component equations. The single-component models predict a front propagation speed that depends on the product of the constant monomer concentration and the aggregation rate (in the absence of fragmentation). Thus A to progress at a in the present case we expect the front in cr,0 B rate equal to R and that in c0,s to progress at rate β. In the case γ,δ , R,β, eq B30 simplifies to ψ ) H(Rβt βr - Rs), so along the r-axis (s ) 0) the wave front moves according to r ≈ Rt as t f ∞, and along the s-axis (r ) 0), s ≈ βt. This agrees with our intuition that the system decouples into two single-component systems for r,s ) O(1). From our previous analysis of single-component systems we expect the large-time asymptotic solution for pure homopolymers to be of the form

9562 J. Phys. Chem. B, Vol. 111, No. 32, 2007

( ) ( )

Wattis and Coveney

r - Rt 1 cr,0 ) cA1,0 erfc 2 x2Rt 1 s - βt c0,s ) cB0,1 erfc 2 x2βt

(B42)

Note that these solutions agree with the results of the earlier section, since after time t they predict values of NA, NB, FA, and FB that match the results quoted in eqs 5.1 and 5.2. In the limit R,β . γ,δ in eqs B27 and B28, we have

1 ψAt ) -RψAr + RψArr 2 1 ψBt ) -βψBs + βψBss 2

(B43)

for s ) 0 and r ) 0 respectively, confirming the results of eq A B B42. However, in r,s g 1, we have cr,s ) δcB0,1/(R + γ) and cr,s A ) γc0,1/(β + δ). For larger r,s values with r,s g 1 we have the solution A cr,s

δcB0,1 Θ(Rβt - βr - Rs) ≈ R+γ

B ≈ cr,s

δcA1,0 Θ(Rβt - βr - Rs) β+δ

(B44)

Acknowledgment. We are grateful to Pier Luisi Luigi for originally motivating us to address this problem and for interesting discussions over many years. We also acknowledge Grant No. GR/R94916/01 from the Engineering and Physical Sciences Research Council (EPSRC) of the United Kingdom. P.V.C. is grateful to the EPSRC for support under EPSRC Reality Grid Grant No. GR/R67699. References and Notes (1) Handbook of Mathematical Functions; Abramowitz, M., Stegun, I. A., Eds., Dover: New York, 1972. (2) Allcock, H. R.; Lampe, F. W. Contemporary Polymer Chemistry, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 1990. (3) Becker, R.; Do¨ring, W. Kinetische behandlung der keimbildung in u¨bersa¨ttigten da¨mpfern. Ann. Phys. 1935, 24, 719-752. (4) Blocher, M.; Hitz, T.; Luisi, P. L. Stereoselectivity in the oligomerisation of racemic tryptophan N-carboxyanhydride (NCA-Trp) as determined by isotope labeling and mass spectrometry. HelV. Chim. Acta 2001, 84, 842-848.

(5) Bolton, C. D.; Wattis, J. A. D. Model of size-templating in vesicle formation I: A microscopic model and analysis. J. Phys. Chem. B 2003, 107, 7126-7134. (6) Brilliantov, N. V.; Krapivsky, P. L. Non-scaling and source-induced scaling behaviour in aggregation models of movable clusters. J. Phys. A: Math. Gen. 1991, 24, 4787-4803. (7) Campbell, D.; Pethrick, R. A.; White, J. R. Polymer Characterization, 2nd ed.; Stanley Thornes: Cheltenham, U. K., 2000. (8) Carraher, C. E., Jr. Introduction to Polymer Chemistry; Taylor & Francis: Boca Raton, FL, 2007. (9) Carr, J.; Dunwell, R. M. Kinetics of cell surface capping. Appl. Math. Lett. 1999, 12, 45-49. (10) Coveney, P. V.; Wattis, J. A. D. A Becker-Do¨ring model of selfreproducing vesicles. J. Chem Soc., Faraday Trans. 1998, 102, 233-246. (11) Coveney, P. V.; Wattis, J. A. D. Coarse-graining and renormalisation group methods for the elucidation of the kinetics of complex nucleation and growth processes. Mol. Phys. 2006, 104, 177-185. (12) Denbigh, K. G.; Denbigh, J. S. Entropy in Relation to Incomplete Knowledge; Cambridge University Press: Cambridge, U. K., 1985. (13) Elias, H.-G. Macromolecules; Wiley-VCH: Weinheim, Germany, 2005. (14) Flory, P. J. Principles of Polymer Chemistry; Cornell University Press, Ithaca, NY, 1953. (15) Garey, M. R.; Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W.H. Freeman: San Francisco, 1979. (16) Hitz, T.; Blocher, M.; Walde, P.; Luisi, P. L. Stereoselectivity aspects in the condensation of racemic NCA-amino acids in the presence and absence of liposomes. Macromolecules 2001, 34, 2443-2449. (17) King, J. R.; Wattis, J. A. D. Asymptotic solution of the BeckerDo¨ring equations with size-dependent rate coefficients. J. Phys. A: Math. Gen. 2002, 35, 1357-1380. (18) Reza, F. M. An Introduction to Information Theory; Dover: New York, 1994. (19) Sandars, P. G. H. A toy model for the generation of homochirality during polymerisation. Origins Life EVol. Biosphere 2003, 33, 575-587. (20) Wattis, J. A. D. An introduction to the fundamental mathematical models of coagulation-fragmentation processes. Physica D 2006, 222, 1-20. (21) Wattis, J. A. D. A Becker-Do¨ring model of competitive nucleation. J. Phys. A: Math. Gen. 1999, 32, 8755-8784. (22) Wattis, J. A. D. Exact solutions for multi-component coagulation: cluster-growth with evolving size and shape profiles. J. Phys. A: Math. Gen. 2006, 39, 7283-7298. (23) Wattis, J. A. D.; Coveney, P. V. The origin of the RNA world: A kinetic model. J. Phys. Chem. B 1999, 103, 4231-4250. (24) Wattis, J. A. D.; Coveney, P. V. Analysis of symmetry-breaking in chiral polymerisation kinetics. Origins Life EVol. Biosphere 2005, 35, 243-273. (25) Wattis, J. A. D.; Coveney, P. V. Chiral polymerisation and the RNA world. Int. J. Astrobiol. 2005, 4, 63-73. (26) Wattis, J. A. D.; King, J. R. Asymptotic solution of the BeckerDo¨ring equations. J. Phys. A: Math. Gen. 1998, 31, 7169-7189. (27) Wittcoff, H. A.; Reuben, B. G.; Plotkin, J. S. Industrial Organic Chemicals; Wiley: New York, 2004. (28) Wu, D. T. General approach to barrier crossing in multi-component nucleation. J. Chem. Phys. 1993, 99, 1990-2000.