
Biomacromolecules 2010, 11, 3539–3547


Molecular Weight Distributions of Starch Branches Reveal Genetic Constraints on Biosynthesis

Alex Chi Wu and Robert G. Gilbert*

University of Queensland, Centre for Nutrition and Food Sciences and LCAFS, Hartley Teakle Building, Brisbane, Qld 4072, Australia

Received August 31, 2010; Revised Manuscript Received October 18, 2010

Modeling the chain-length distributions (CLDs, the molecular weight distributions of individual branches) in a polymer system can be exploited to obtain information on the underlying (bio)synthesis mechanisms. Such a model is developed for starch (a highly branched glucose polymer), taking into account multiple isoforms of the three types of enzymatic mechanisms contributing directly to the CLD: propagation, branching, and debranching. The resulting CLD is given by two parameters and can thus be represented by a point in a two-dimensional phase diagram. The model implies that all native-starch amylopectin CLDs are confined to a line in this phase diagram, an inference supported by fitting data for a wide range of plants. This gives new ways to classify mutants and suggests useful directions for plant engineering (e.g., which isoforms could be targeted to give long branches, which are nutritionally desirable).

Introduction

The molecular weight distribution (MWD) in a sample of a linear polymer contains information about the processes of formation, growth, and termination that led to this distribution. In synthetic polymers, this has been exploited to uncover the mechanisms of these processes (e.g., in investigating free-radical polymerization; refs 1 and 2). The present paper applies this methodology to starch biosynthesis. Starch is a branched polymer of glucose, containing α-1,4 and α-1,6 glycosidic links (Chart 1); it is the long-term energy storage polymer in pulses, grains, and tubers (starch-accumulating organisms) and a short-term storage polymer (“transient starch”; ref 3) in stem, bark, and leaves. Starch comprises two glucans: amylose (molar mass ∼10⁵-10⁶ g mol⁻¹, with a small number of long-chain branches) and amylopectin (molecular weight about 2 orders of magnitude higher and containing a vast number of short branches). In a plant, the amylopectin component of starch forms alternating crystalline and amorphous lamellae in water-insoluble granules (Figure 1, which is drawn to be consistent with inferences from the work of Thompson (ref 4) and Delatte et al. (ref 5)). The primary biological function of starch is efficient energy storage, which is due largely to the compact crystalline lamellae; these are formed by clusters of amylopectin branches having relatively small degrees of polymerization (DPs), typically below ∼35, and mainly confined to a single crystalline lamella. The less organized amorphous regions contain portions of amylopectin chains and amylose chains (if present). Starch is produced by the concerted actions of numerous enzymes, of several types (including isoforms of the same type) in complex pathways, which vary from species to species (ref 6). The distribution of the DPs of individual branches in starch, the chain-length distribution (CLD), is the result of these biosynthetic pathways.
There are three types of enzymatic processes contributing directly to the formation of the CLD (Figure 2): propagation (the enzymes for which include starch synthase and granule-bound starch synthase), branching (the enzymes being various isoforms of starch branching enzyme), and debranching (usually the debranching enzymes isoamylase and pullulanase) (refs 6-8); enzymes catalyzing these three processes are denoted here SS, SBE, and DBE, respectively, irrespective of any particular isoforms. There are many isoforms of each type of enzyme: SSI, SSII, SSIIa, SSIIb, SSIII, SSIV, GBSSI, GBSSIIb, BEI, BEII, BEIIa, BEIIb, and various types of debranching enzymes (pullulanase PUI and isoamylases ISAI, ISAII, ISAIII). A CLD is denoted here Nde(X), the number of branches with DP X. This is obtained experimentally by treating starch with a debranching enzyme and then measuring the molecular weight distribution of the resulting linear chains by one of several alternative techniques: FACE (fluorophore-assisted capillary electrophoresis; ref 9), high-performance anion-exchange chromatography (HPAEC), or size-exclusion chromatography (SEC, also called gel permeation chromatography, GPC). These CLDs are presented here as log10 Nde(X), which brings out features that are often not apparent in the more conventional presentations as number or difference distributions (ref 10) (Figure 5, given later, includes an example comparing conventional and logarithmic representations). Considerable information on starch has been obtained empirically from CLDs (e.g., using difference plots in defining phenotypes of starch mutants and transgenics; ref 8). One application is seeing which mutants give longer chains; plants with this feature tend to have more “resistant” starch (ref 11), with beneficial nutritional properties. Typical CLDs from FACE and HPAEC are given in Figure 3.

Chart 1. Structure of Starch (Haworth Representation)

* To whom correspondence should be addressed. E-mail: b.gilbert@uq.edu.au. 10.1021/bm1010189 © 2010 American Chemical Society. Published on Web 11/08/2010


Biomacromolecules, Vol. 11, No. 12, 2010

Figure 1. Amylopectin structure giving rise to the crystalline and amorphous lamella structures in the semicrystalline growth rings of starch.

A kinetic model for the CLD has the potential for obtaining in-depth biosynthesis information from observed structure, analogous to what can be obtained from MWDs for synthetic polymers. However, any model for the CLD incorporating all enzymes involved in starch biosynthesis would require so many unknown parameters that comparison with experiment would be no more than a curve-fitting exercise. We here derive a “reductionist” approach to this modeling (greatly extending our earlier work in this regard; ref 12): it assumes that the number of monomer units in a given branch is determined by a single “isoform set” of one enzyme for each of the three contributing processes (propagation, branching, and debranching). An isoform set comprises one particular branching enzyme (e.g., BEIIa, and no other type of branching enzyme), one particular propagation enzyme, and one particular debranching enzyme. The CLD model is specified by just two rate parameters: the ratio of the rate at which a branching enzyme (generically denoted SBE) proceeds to that at which a propagation enzyme (generically denoted SS) proceeds, and the corresponding ratio of a debranching enzyme (generically denoted DBE) to that of SS. The model gives both quantitative fitting and qualitative understanding of the underlying rates in starch biosynthetic pathways. Knowledge gained from fitting experimental CLDs with this model will be exploited to justify mechanisms in starch biosynthesis and provide directions for plant engineering by manipulating known mechanisms. The model predicts that CLDs with a starch-like structure can only be obtained when the enzymatic rates and, hence, the underlying genetics are highly constrained: there is a biological imperative in these constraints. These inferences suggest both limitations and targets for plant engineering aiming for certain properties of starch. The model also has the potential to provide a new way of quantitatively characterizing native starch and systematically studying mutants.

Theoretical and Experimental Section

Theoretical Development and Comparison with Experiment. As exemplified in Figure 3, amylopectin CLDs are always similar. The log10 Nde(X) always shows a global maximum followed by a rapid decrease, typically over DP 10-33, corresponding to chains confined to a single lamella. DPs around the global maximum are assumed to be dominated by a first set of isoforms: “isoform-set 1”. A small shoulder after the global maximum in this range is frequently apparent (at DP ∼ 16), which is postulated to correspond to where a second set of isoforms, “isoform-set 2”, becomes significant. The small change in the shape of log10 Nde(X) after this second-isoform-set shoulder, if present, suggests that isoform-set 2 acts similarly to isoform-set 1. For DPs ≥ 33, there is a second maximum or shoulder with distinctly different shape, corresponding to trans-lamella branches, and this region is dominated by isoform-set 3 (ref 10).

Figure 2. Schematic of the enzymatic processes considered here for starch biosynthesis.

Figure 3. Points (joined by black lines for clarity): experimental CLDs (logarithmic number distributions as functions of DP X) for starch-accumulating species, wheat, maize, rice, barley, and potato, and one transient-starch synthesizing species, Arabidopsis; data replotted from refs 10 and 16. All data were obtained using FACE, except for Arabidopsis, which were obtained using HPAEC. Red lines: fits using eq 5 to single-lamella (first range, dominated by isoform-set 1; second range, dominated by isoform-set 2) and trans-lamella branches (dominated by isoform-set 3). The ranges are indicated by vertical lines. Arrows: shoulders observed in experimental CLDs at low DP, as predicted by the model. Each data set has been translated on the y-axis by a convenient factor to improve clarity.

The current model uses some concepts from our earlier approach (ref 12), in which it was assumed that the entire CLD was determined by only a single isoform set. The CLD model was then derived from the kinetic equations describing its formation and is specified entirely by two ratios: that of the rate at which SBE proceeds to that at which SS proceeds and the corresponding ratio of DBE to SS. This earlier treatment ignored three important phenomena and, thus, oversimplified the biosynthetic pathway: (i) SBE can only create a branch greater than some minimum DP (e.g., Guan et al.; ref 13), denoted here Xmin, which in vitro studies suggest is ∼7 (Figure 2); (ii) studies of starch in Arabidopsis (ref 14) suggest that the branch remaining after this action of SBE must be more than some minimum DP (∼6), which is here denoted X0, that is, SBE acts only on chains of DP ≥ Xmin + X0; and (iii) more than one isoform-set can contribute to the CLD, whereas the earlier treatment assumed that a single isoform-set controls the whole CLD. While CLDs predicted from this earlier model qualitatively resembled some aspects of experiment, quantitative agreement could not be obtained.

This paper presents a major extension of the earlier treatment, which as will be seen sheds considerable light on the kinetics of starch biosynthesis. First, the Xmin and X0 effects for SBE are introduced. Second, the restriction of a single isoform-set is lifted, replaced with the lesser restriction that an individual branch is assumed to grow under a single isoform-set, allowing multiple isoform-sets to be included in the model (naturally, the model is equally applicable for simpler starch-forming systems such as red algae, which have only one isoform for each type of enzyme; ref 15). The multiple isoform-sets are assumed to dominate different ranges in CLDs (Figure 3). In the present model, a crystalline lamella is assumed to form quickly once an appropriate CLD is obtained (and so crystallization is assumed not to be rate-determining), thereby releasing enzymes which can act on other chains. This leads to the assumption that enzymatic activity is constant with respect to its corresponding substrates, independent of total molecule size, chain length or location on a molecule of starch (except for the Xmin and X0 dependences of SBE), and availability of glucose monomer. Effects that may impose additional restrictions, such as spatial confinement in a granule (which may be significant for amylose), are ignored in this first approach. The resulting kinetic equations for the CLD in the present model are specified by the same two rate ratios as previously, plus Xmin and X0.
The test of these assumptions will be through a wide range of experimental data for amylopectin CLDs. Any fit must be not only qualitative but also quantitative: for example, it should reproduce the general features of CLDs and give fitted rate ratios whose trends correlate with expected enzymatic activities (e.g., fitting data from a mutant with branching enzyme silencing should give a lower branching activity). It is emphasized that the present paper describes only the branch (chain length) distribution following debranching of a whole starch molecule, not the infinite-dimensional distribution describing the structure of the whole (undebranched) starch molecule, although the model in the present paper could be used (for example, following some of the methods in our earlier work; ref 12) within a model for this fully branched distribution.

Derivation of the Model. Any individual branch in starch is assumed to grow under the influence of a single isoform-set. A single isoform-set gives rise to a population of branches (a “component” of the overall CLD) that is controlled by a single set of rate parameters: the values of two rate ratios (β and γ, defined in eq 5 below), plus the values of Xmin and X0, corresponding to that isoform-set. The overall CLD is the sum of the components from all isoform-sets. The growth of trans-lamella branches is assumed to be independent of what is present in the preceding (originating) crystalline lamella: the portion of a trans-lamella chain that grows beyond a crystalline lamella then behaves kinetically the same as single-lamella chains. The region dominated by trans-lamella branches (under the control of isoform-set 3) may also include a small number of moderate-length single-lamella branches (isoform-sets 1 and 2). The equation for the development of a given component of the CLD over time is given by eq 1:

$$\frac{\partial N_{de}}{\partial t} = f(N_{de}, X) \tag{1}$$

The time dependence of Nde is denoted Nde(X,t) where time needs to be explicitly considered, and Nde(X) where time-independent (steady-state) solutions to the equation are discussed. Here f has contributions to that component of the CLD from the three processes, SS, SBE, and DBE, denoted fSS, fSBE, and fDBE, respectively. These are derived using the same conventional chemical-kinetic development used to quantify the molecular weight distribution of synthetic polymers (for example, in ref 17). The propagation process, through a generic SS process described in the model, is addition of glucose (as ADP-glucose). This converts a chain of DP X - 1 to one of X and, similarly, one of DP X to X + 1. The rate of addition (propagation) of a single monomer unit to a growing nonreducing end of a chain, aSS, is taken to be the enzyme activity multiplied by the concentration of that enzyme and divided by the concentration of substrate (the nonreducing end of chains) available. As stated above, the model assumes these rates to be constant. The propagation contribution is then given by eq 2:

$$f_{SS} = a_{SS} N_{de}(X - 1) - a_{SS} N_{de}(X) \tag{2}$$

The debranching process, through a generic DBE, is assumed in this model to completely remove branches in a random and nonselective fashion without substrate specificity (although, as discussed later, the model can be fine-tuned by including specificity such as the description of the debranching process in the glucan-trimming model; ref 18). In the present mathematical modeling of starch biosynthesis, debranching is essential to give the type of CLDs observed in amylopectin: a suitable chain-length distribution for efficient energy storage in crystalline lamellae. The debranching rate aDBE is defined analogously to aSS; the loss of chains due to debranching is given by eq 3:

$$f_{DBE} = -a_{DBE} N_{de}(X) \tag{3}$$
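As a minimal sketch of eqs 2 and 3 (not taken from the paper or its Supporting Information), the two terms can be evaluated for a CLD stored as a list. The function names, the list layout (index 0 holds DP 1), the boundary choice Nde(0) = 0, and the numerical values are all assumptions made here for illustration.

```python
def propagation_term(n_de, a_ss):
    """Eq 2: f_SS(X) = a_SS*N_de(X-1) - a_SS*N_de(X).

    n_de[i] holds N_de(X) for X = i + 1. N_de(0) = 0 is an assumption of
    this sketch (no chains of DP 0 feed DP 1).
    """
    terms = []
    for i, n_x in enumerate(n_de):
        n_prev = n_de[i - 1] if i > 0 else 0.0
        terms.append(a_ss * n_prev - a_ss * n_x)
    return terms


def debranching_term(n_de, a_dbe):
    """Eq 3: f_DBE(X) = -a_DBE*N_de(X), a loss proportional to N_de(X)."""
    return [-a_dbe * n_x for n_x in n_de]


# Hypothetical four-point CLD (DP 1..4) and illustrative rate values.
cld = [4.0, 3.0, 2.0, 1.0]
f_ss = propagation_term(cld, a_ss=0.5)
f_dbe = debranching_term(cld, a_dbe=0.25)
```

Propagation only moves chains from one DP to the next, so within this truncated list the propagation terms sum to the flow of chains out of the last stored DP, whereas debranching removes chains at every DP.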

The branching process, through a generic SBE, involves snipping off a chain of any number of units of DP greater than or equal to Xmin to form a branch with this snipped chain anywhere on the same, or another, chain, and leaving a remnant. As pointed out by an anonymous reviewer, in vivo studies on Arabidopsis (ref 14) suggest that the resulting branch remnant cannot be less than some minimum DP, denoted X0, typically DP 6 in Arabidopsis (i.e., SBE cannot operate on a branch of DP < X0 + Xmin). The rate of this process per originating growing chain, aSBE, is defined in the same way as aSS and aDBE. SBE thus creates two smaller growing branches from one: the branch that has been split off and the remnant (a behavior very different from that resulting in branches in synthetic polymers). This results in three terms for the branching contribution: loss of a chain and gain of both a snipped chain and a remnant chain. The rates for these are given by the following equations (see Supporting Information):

$$\text{loss of a chain} = -a_{SBE}\, N_{de}(X)\, H(X - (X_{min} + X_0))$$

$$\text{snipped chain} = a_{SBE} \sum_{k=X}^{\infty} \frac{N_{de}(k + X_0)}{k - X_{min} + 1}\, H(X - X_{min})$$

$$\text{remnant chain} = a_{SBE} \sum_{k=X}^{\infty} \frac{N_{de}(k + X_{min})}{k - X_0 + 1}\, H(X - X_0)$$

Here the step function H quantifies the Xmin and X0 constraints: H(Y) = 0 for Y < 0; H(Y) = 1 for Y ≥ 0. The total branching contribution is the sum of the preceding three terms and is given by eq 4:

$$f_{SBE} = -a_{SBE}\, N_{de}(X)\, H(X - (X_{min} + X_0)) + a_{SBE} \sum_{k=X}^{\infty} \frac{N_{de}(k + X_{min})}{k - X_0 + 1}\, H(X - X_0) + a_{SBE} \sum_{k=X}^{\infty} \frac{N_{de}(k + X_0)}{k - X_{min} + 1}\, H(X - X_{min}) \tag{4}$$
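The three branching terms can be checked numerically. The sketch below is an illustration written for this purpose, not the authors' code: it evaluates the loss, snipped-chain, and remnant-chain terms for a CLD truncated at a finite DP, and then checks that every chain lost yields one snipped chain and one remnant with the same total number of glucose units. The function name, the truncation (no chains longer than the list), the test CLD, and the rate value are assumptions.

```python
def branching_terms(n_de, a_sbe, x_min, x0):
    """Return (loss, snipped, remnant) lists of the three branching terms.

    n_de[i] holds N_de(X) for X = i + 1; chains longer than len(n_de) are
    taken to be absent (a truncation assumed for this sketch).
    """
    x_max = len(n_de)

    def n(x):
        return n_de[x - 1] if 1 <= x <= x_max else 0.0

    loss, snipped, remnant = [], [], []
    for x in range(1, x_max + 1):
        # H(X - (Xmin + X0)): only chains long enough to be cut are lost.
        loss.append(-a_sbe * n(x) if x >= x_min + x0 else 0.0)
        # H(X - Xmin): a snipped chain of DP X comes from a chain of DP k + X0.
        snipped.append(
            a_sbe * sum(n(k + x0) / (k - x_min + 1) for k in range(x, x_max + 1))
            if x >= x_min else 0.0
        )
        # H(X - X0): a remnant of DP X comes from a chain of DP k + Xmin.
        remnant.append(
            a_sbe * sum(n(k + x_min) / (k - x0 + 1) for k in range(x, x_max + 1))
            if x >= x0 else 0.0
        )
    return loss, snipped, remnant


# Hypothetical CLD on DP 1..30, with the illustrative values Xmin = 7, X0 = 6.
cld = [1.0 / x for x in range(1, 31)]
loss, snipped, remnant = branching_terms(cld, a_sbe=0.3, x_min=7, x0=6)
chains_lost = -sum(loss)
units_lost = -sum(x * l for x, l in enumerate(loss, start=1))
units_gained = sum(
    x * (s + r) for x, (s, r) in enumerate(zip(snipped, remnant), start=1)
)
```

In this sketch the snipped and remnant terms each sum to the number of chains lost, and the glucose units gained equal those lost, which is the bookkeeping expected when SBE turns one chain into two without adding or removing monomer.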

Putting the three enzymatic contributions into eq 1 and taking the steady state (which effectively means the end result of the development of the component of the CLD) yields eq 5, which is the mathematical form for that resulting component of the entire CLD from a given isoform-set. The overall CLD is then the sum of the components from each isoform-set, each of which obeys eq 5 with a given set of values of β, γ, Xmin, and X0.

$$\begin{aligned} N_{de}(X - 1) - (1 + \gamma) N_{de}(X) - \beta N_{de}(X)\, H(X - (X_{min} + X_0)) &+ \beta \sum_{k=X}^{\infty} \frac{N_{de}(k + X_0)}{k - X_{min} + 1}\, H(X - X_{min}) \\ &+ \beta \sum_{k=X}^{\infty} \frac{N_{de}(k + X_{min})}{k - X_0 + 1}\, H(X - X_0) = 0; \qquad \beta = \frac{a_{SBE}}{a_{SS}}, \quad \gamma = \frac{a_{DBE}}{a_{SS}} \end{aligned} \tag{5}$$
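The step from eqs 1-4 to eq 5 can be written out as follows. This is a sketch of the algebra implied by the definitions above rather than a reproduction of the Supporting Information; the shorthand G(X), standing for the two gain sums of eq 4 divided by aSBE, is introduced here only for brevity.

```latex
\begin{align*}
0 = \frac{\partial N_{de}}{\partial t}
  &= f_{SS} + f_{DBE} + f_{SBE} \\
  &= a_{SS}\,[N_{de}(X-1) - N_{de}(X)] - a_{DBE}\,N_{de}(X)
     - a_{SBE}\,N_{de}(X)\,H(X - (X_{min}+X_0)) + a_{SBE}\,G(X), \\
G(X) &= \sum_{k=X}^{\infty}\frac{N_{de}(k+X_0)}{k-X_{min}+1}\,H(X-X_{min})
      + \sum_{k=X}^{\infty}\frac{N_{de}(k+X_{min})}{k-X_0+1}\,H(X-X_0).
\end{align*}
Dividing by $a_{SS}$ and writing $\beta = a_{SBE}/a_{SS}$, $\gamma = a_{DBE}/a_{SS}$:
\begin{equation*}
N_{de}(X-1) - (1+\gamma)\,N_{de}(X) - \beta\,N_{de}(X)\,H(X-(X_{min}+X_0)) + \beta\,G(X) = 0 .
\end{equation*}
```

In the limiting case β = 0 (no branching), this reduces to Nde(X) = Nde(X - 1)/(1 + γ), so that component would decay geometrically with DP, Nde(X) ∝ (1 + γ)^(-X).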

Equation 5 is an infinite set of linear simultaneous equations. While there does not appear to be any analytic solution to these, they are easily solved numerically by Gaussian elimination, a standard and readily available procedure (see Supporting Information for details). This gives Nde(X) for X = 1, 2, ..., ∞ within an arbitrary normalization constant (which can be chosen, for example, to have the total number distribution add up to unity).

Development of the Phase Diagram. For a given Xmin and X0, the component of a CLD arising from a single isoform-set is specified by two parameters, β and γ, and, hence, the CLD for this component can be expressed as a point on a plot with β and γ as axes: a type of phase diagram. This is merely a convenient way to represent data and does not imply any mechanistic inferences per se. This model, like any other mathematical modeling, is subject to appropriate physical constraints. The first is that the calculated Nde(X) should all be greater than zero: that the solutions be “physically possible”. The range of β and γ that satisfies this constraint, shown by the shading in Figure 4, is found by numerical solution; this range is not strongly dependent on Xmin or X0 over a physically reasonable range of these parameters (Figures S1 and S2, Supporting Information). The left-hand part of the lower bound of this range is given by the line β = γ, as is proved rigorously in our earlier treatment, which ignored the Xmin and X0 constraints (ref 12; Supporting Information). The second physical constraint on parameter values is the requirement that the solution of eq 5 must be “physically accessible”: that a stable CLD can be attained as the long-time solution of the time development of the system.
The region satisfying this constraint is found from the complete solution of the time-dependent equation, eq 1, which is given in terms of the eigenvalues λi of a matrix whose elements are the coefficients of eq 5 (a well-known mathematical procedure, given in detail in Supporting Information). This results in sums of terms of the form:

$$N_{de}(X, t) = \sum_{i=1}^{\infty} e^{\lambda_i t} A_i(X)$$

where the Ai(X) are time-independent. If all of the eigenvalues are less than zero, then, although the CLD is stable in time within an arbitrary normalization factor, the number of total branches (which is the normalization) diminishes in time and eventually there are no branches left; this cannot lead to stable starch. If, on the other hand, one or more eigenvalues are greater than zero, then the CLD will be constantly growing, which again cannot lead to a stable CLD for forming starch. If one or more eigenvalues are exactly zero and the rest are less than zero, then a long-term steady state is formed.

Figure 4. Ranges of the rate ratios β and γ for allowed solutions of eq 5 subject to the two mathematical constraints: physically possible (shaded) and physically accessible (thick black line). Calculated with Xmin = 7 and X0 = 6.

For any given Xmin and X0, the region where all (numerically evaluated) eigenvalues are negative is found to be the same as that where all Nde(X) > 0 (the “physically possible” region). At least one eigenvalue is found to be positive in the region where at least one Nde(X) is less than zero (the “physically impossible” region). The region where one eigenvalue is zero must therefore be on the boundary separating these two regions; this boundary line is thus the only region where a steady state can be formed. This leads to the important conclusion that a stable amylopectin CLD can only be formed if the values of the rate ratios β and γ lie on the lower bound of the “physically possible” region. Indeed, CLDs calculated with values of the rate ratios away from the boundary do not resemble those observed experimentally, as exemplified in Figure S4: off the boundary, one finds CLDs with shorter branches, resembling amylopectin, but with a vastly greater number of branches at higher DPs, resembling amylose. This suggests a major constraint on what might be observed in nature: the inference that stable starch (i.e., starch containing amylopectin that can form crystalline lamellae) can only be formed in a system where the enzyme rate ratios lie on the boundary line. This implies that crystalline-lamellae-forming amylopectin molecules can form only if their CLDs are generated by rate parameters on the boundary, provided the assumptions leading to this inference are obeyed (as discussed later, the assumption that the rate ratios are truly constant is reasonable for amylopectin but may not hold for amylose, glycogen, and phytoglycogen). It is therefore expected that observed CLDs for amylopectin would lie close to the boundary in the phase diagram in Figure 4.
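The eigenvalue argument can be explored with a small numerical sketch. It is an illustration under stated assumptions, not the procedure of the Supporting Information: eq 5 is truncated at a finite DP (x_max), NumPy's general eigen-solver is used in place of Gaussian elimination, and the function names and the value of β are hypothetical. Because γ enters eq 5 only through the diagonal term -(1 + γ)Nde(X), changing γ shifts every eigenvalue of the truncated matrix by the same amount, so the γ at which the largest eigenvalue is zero can be read off from the matrix evaluated at γ = 0.

```python
import numpy as np


def cld_matrix(beta, gamma, x_min, x0, x_max):
    """Coefficients of eq 5 truncated at DP x_max (index i <-> DP i + 1)."""
    m = np.zeros((x_max, x_max))
    for x in range(1, x_max + 1):
        row = x - 1
        m[row, row] -= 1.0 + gamma          # propagation out of X, debranching
        if x >= 2:
            m[row, row - 1] += 1.0          # propagation from X - 1 into X
        if x >= x_min + x0:
            m[row, row] -= beta             # chain of DP X consumed by SBE
        for k in range(x, x_max + 1):
            if x >= x_min and k + x0 <= x_max:
                m[row, k + x0 - 1] += beta / (k - x_min + 1)   # snipped chain
            if x >= x0 and k + x_min <= x_max:
                m[row, k + x_min - 1] += beta / (k - x0 + 1)   # remnant chain
    return m


def boundary_gamma(beta, x_min, x0, x_max):
    """gamma at which the largest eigenvalue of the truncated matrix is zero."""
    # gamma only shifts the diagonal: eigenvalues(gamma) = eigenvalues(0) - gamma.
    eigenvalues = np.linalg.eigvals(cld_matrix(beta, 0.0, x_min, x0, x_max))
    return float(eigenvalues.real.max())


def boundary_cld(beta, x_min, x0, x_max):
    """Boundary gamma and the matching steady-state CLD, normalized to sum 1."""
    gamma = boundary_gamma(beta, x_min, x0, x_max)
    eigenvalues, vectors = np.linalg.eig(cld_matrix(beta, gamma, x_min, x0, x_max))
    n_de = vectors[:, np.argmax(eigenvalues.real)].real
    return gamma, n_de / n_de.sum()


# Illustrative values only: beta is arbitrary; Xmin = 7 and X0 = 6 as in Figure 4.
gamma_b, n_de = boundary_cld(beta=0.1, x_min=7, x0=6, x_max=80)
residual = cld_matrix(0.1, gamma_b, 7, 6, 80) @ n_de
```

Here n_de is the eigenvector belonging to the zero eigenvalue, so it satisfies the truncated form of eq 5 to within rounding. The truncation lets chains leak out at x_max, so values from this sketch should not be expected to reproduce Figure 4 quantitatively.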
(It is noted that the time-dependent solution used to obtain this conclusion opens the way to quantifying the short-time period in starch growth during which the steady state is attained using experimental methods such as that of Nielssen et al. (ref 14); that is, this mathematical development provides the means whereby short-time data could be treated to provide additional information.)

Some calculated Nde(X) from this relation (i.e., confined to the boundary) are given in Figure 5. Those on the right-hand part of the boundary line have CLDs typical of amylopectin. Those on the left-hand part have CLDs typical of amylose branches with progressively longer chains (an increase in DP at the maximum) as one progresses to the left along the boundary. The CLDs are presented both as the logarithm of the number distributions Nde(X) (as in Figure 3) and as the SEC distribution w(log X) = X² Nde(X) (as obtained using differential refractive index detection in SEC; ref 19). It is seen in Figures 3 and 5 that the calculated CLDs in the amylopectin region always have a shoulder (or sometimes a small subsidiary maximum) at DPs below the main maximum. This shoulder arises mathematically because of the requirement that SBE cannot act on branches smaller than a certain length: confining SBE action to branches longer than Xmin creates a “haven” for short branches against loss from the branch-snipping action of SBE. This haven tends to increase the number of small chains, tending toward another maximum at X = 1; propagation moves this to slightly higher X, resulting in the subsidiary shoulder or maximum. The calculated subsidiary shoulder or maximum diminishes and eventually disappears as Xmin becomes smaller or the branching rate decreases. The “X0 effect” simply shifts the starting DP of this subsidiary shoulder/maximum accordingly.

Figure 5. Calculated Nde(X) for representative values of β and γ for Xmin = 10, X0 = 4 as shown in the phase diagram (A). The data are presented both as number distributions (on a log scale, B, and, for some examples, as a number distribution as often conventionally plotted, C) as obtained with FACE, and as the SEC distribution w(log X), normalized to unity, as obtained with SEC (D).

Fitting Experimental Data. Nonlinear least-squares fitting is used to find values of the parameters X0, Xmin, β, and γ for each of the three sets of isoforms encompassing the whole CLD for a range of species. It was found that X0 = 4 provides a good fit for all Nde(X) considered in the current paper. The values of β and γ must both lie on the boundary line, which defines a function γ(β) for given values of Xmin and X0; this eliminates the values of γ from the fitting. The method used for fitting experimental CLDs starts with finding the best set of Xmin and β for the first range, wherein a single isoform-set contributes the most to the Nde(X). The CLD component calculated from this fit is then extrapolated to higher DPs and subtracted from the experimental CLD, which yields an approximate experimental CLD for contributions from the second and trans-lamella isoform-sets; this is then fit to yield a first estimate for Xmin and β for the second isoform-set. The β values for both the first and second isoform-sets (and their relative amounts, making three fitting parameters in all) are then least-squares refined together to fit the overall CLD; it is found that a good fit can be obtained with single values of each of Xmin and X0, which are thus not incorporated in the least-squares fitting. This is then repeated for the β value for the trans-lamella chains. There are thus five fitting parameters for the entire CLD: the values of β for each isoform-set (β1, β2, and β3) and the relative contributions of the three isoform-sets to the overall CLD, which, because normalization is arbitrary, is equivalent to the ratios of the contributions of isoform-set 2 to that of 1, and of isoform-set 3 to that of 1. The procedure is set out in detail in the Supporting Information. Each component is normalized by summing all components to give the optimal global fit to experiment. The normalization of each component gives the relative contribution of each isoform-set to the overall CLD. Figure 6 shows an example of this “decomposition” of the entire CLD into the components from each of the three isoform-sets.

Figure 6. Deconstructing the experimental CLD (squares) into components from each isoform-set, illustrated here for wheat from Figure 3 [note: although the distributions are presented logarithmically, the overall sum is of the Nde(X), not log Nde(X)]. The contribution from the first (1; red line and points), second (2; green line and points), and trans-lamella (3; orange line and points) components. (4) Sum of all three calculated components (blue line).

Results and Discussion

Fitting the model, eq 5, to amylopectin data from diverse species shows good overall quantitative agreement with experiment (Figure 3). All features are quantitatively reproduced. The data cannot be adequately fitted by allowing off-boundary β and γ values to be included (the resulting CLDs have the wrong shape; see Figure S4 in the SI).

Low-DP Shoulder. One significant and novel feature is that, as well as the main maximum, Figures 3 and 5 show that the model predicts a significant shoulder (sometimes a small subsidiary maximum) at lower X. A close inspection of published CLDs going down to sufficiently low DPs reveals the existence of this hitherto unremarked feature, for example, for all species in Figure 3 (indicated by the arrows) and in CLDs from wild-type and from mutant wheat (ref 20) and phytoglycogen (ref 21). The Supporting Information gives more examples. (The calculated shoulder is usually more pronounced than seen experimentally. Letting the action of branching enzymes change from “all-or-nothing”, that is, a step function H at Xmin, to something more gradual diminishes this shoulder (see Figure S8), although not to the lower but still significant prominence of the shoulder seen in the experiment. This suggests that our model could be fine-tuned by adding a mechanism that selectively removes branches with low DPs, such as the glucan-trimming model; ref 18.)

Is this low-DP shoulder an experimental artifact (especially because the feature occurs at the lowest molecular weights and thus near to a detection limit)? Possible origins of an artifact are now considered. (1) CLD data obtained by FACE (used for most of the experiments considered here) can be polluted at the lowest DPs by the marker employed (refs 9 and 22), and for this reason published data often omit very low DPs. However, the shoulders seen experimentally before the main maximum cannot be ascribed to this. Marker “leaking” causes a minimum that indeed can be seen, for example, in Figure 3 for wheat at the very lowest DPs; this marker-derived artifactual minimum is at DPs below the observed shoulder. (2) The shoulder is also seen in HPAEC data (e.g., the Arabidopsis data shown in Figure 3, and for maize in Figure 8 of Rahman et al. (ref 23) given in the Supporting Information). HPAEC does not involve use of a marker. (3) Because it appears near the detection limit, this shoulder might arguably be an artifact in any particular data set; however, the fact that it is seen in all data examined in the literature exhibiting sufficiently low molecular weights (a range of which is cited above) supports the supposition that the feature is real.
The observation of this shoulder provides the first in ViVo evidence to justify the mechanism that a minimal branch-length requirement for SBE being operative in plants in the field, a requirement which hitherto has been inferred from in Vitro studies (e.g., ref 13). The fitted value of Xmin for the first isoformset ranges from 7 to 10 (Figure 7), generally consistent with the results of in Vitro studies.13 Moreover, the observation of a feature-the small subsidiary shoulder-which has never been noticed before, but is in fact seen in all data examined, provides strong support for the validity of the model and the mechanistic assumptions on which it is based. As shown in the SI, if the “X0” effect observed for Arabidopsis (i.e., the branch remaining after the action of SBE must be more than X0 units long) is ignored (i.e., putting X0 ) 1), then the calculated shoulder extends to a lower DP than seen experimentally (indeed, is always a significant subsidiary maximum). This supports the X0 mechanism inferred from Arabidopsis experiments: that is, it applies not just to transientstarch systems (e.g., in leaves, where the starch synthesized during the day is broken down at night) but also in starchaccumulating systems (e.g., grains). Fitting Representative Data and Inferences from CLD Phase Diagrams. Figure 3 shows the results of fitting experimental amylopectin CLDs from the starch-accumulating species wheat, rice, barley, maize, and potato, and a transient-starch CLD from Arabidopsis. The model quantitatively reproduces the essential features of the CLD, including the main maximum, the small shoulder after this where a second isoform-set becomes significant (DP ∼ 16), and the larger shoulder indicating the trans-lamella branches (DP 32-34). Figure 3 also show that the model predicts a shoulder (or sometimes a subsidiary maximum) a few DP less than the global maximum. Importantly, this feature is seen experimentally (although not as pronounced

Figure 7. Top: values of the fitting parameters β and γ as a phase diagram for the different isoform-sets in the CLDs for the species in Figure 3. Bar charts show the corresponding values of β and Xmin (γ being a dependent variable specified by the boundary line). The three bars given for each species are, from left to right, the first, second, and trans-lamella isoform-sets. The phase diagram shows β and γ projected onto a single boundary line (for Xmin = 10, X0 = 4). Arabidopsis is not shown in the phase diagram, as its values of β for the first and second isoform-sets are very different from those of the starch-accumulating plants, as is apparent from the bar chart.

as the model predicts), as indicated by arrows in the experimental data of Figure 3. Values of the parameters β and Xmin obtained by fitting the experimental CLDs in Figure 3 are shown as bar charts in Figure 7; the calculated CLDs giving these fits are shown in Figure 3. The values of γ are not shown in the bar charts, because γ depends entirely on β through the boundary curve γ(β). The same data are also presented in Figure 7 as a phase diagram. As will be seen, the phase diagram gives a useful perspective for comparison. There is a slightly different phase diagram for each of the slightly different values of Xmin and X0 that give the best fits to the data (as given in the bar charts), but for simplicity the phase-diagram representation is given for a single Xmin, by shifting all the points vertically onto this single boundary line. The phase diagram is useful because it shows both β and γ values together and may bring out trends between starch structures and plant species (including mutants).

The results of Figure 7 give rise to a number of possible inferences. As these data are only for a small number of species (albeit covering a diverse range), only a few points are noted here, without speculation on the underlying genetics, which must await more data to discern general trends. (i) Different isoform-sets for a given species may be well separated (wheat) or close together (rice) in the phase diagram. (ii) For the species examined here, the first isoform-set always has a greater relative branching rate (larger β) than the second range, which in turn is greater than that for trans-lamella branches. (iii) The behavior of the sole transient-starch species (Arabidopsis) is significantly different from that of the other species shown, which are all starch-accumulating: the transient-starch species has a much greater relative branching rate for the two single-lamella isoform-sets.

It was shown above that the constraints on β and γ suggest that plants should have CLDs for amylopectin chains confined to the boundary of the physically allowed region in the (β, γ) phase diagram. Indeed, applying this restriction provides a good fit to the data for the diverse species considered here. The mathematics indicates that any mutation resulting in an amylopectin CLD lying off this boundary would not produce stable starch as observed in nature, which could well be an evolutionary disadvantage with regard to energy storage in a given ecological niche, compared to similar plants with native starch structure. Glucans with intermediate CLDs could well be found with (β, γ) between the amylopectin and amylose regions, as indeed has been reported (e.g., refs 24 and 25).

Figure 8. Comparisons of mutants with their corresponding wild-type species. Wild-type wheat data from Figure 3, mutant wheat (SBEIIa) from ref 20, and wild-type and SBE mutant (Line 201) potato from ref 29 (all data except wild-type wheat were obtained using HPAEC). No data for mutant wheat above DP 35 were given in ref 20. The increase at the lowest DPs for WT and SBE mutant potato, obtained by HPAEC, may or may not be an experimental artifact.

Mutant Data. Systematic study of mutants in comparison to their corresponding wild-type species allows in-depth interpretations of CLDs. Two important facts support the precepts of the model. First, the fit (Figure 8) reproduces the features observed in the SBEIIa mutant-wheat CLD. This CLD differs significantly from that of the wild type: the mutant CLD shows a more pronounced shoulder and a flatter global maximum. The fitted relative branching rate for isoform-set 1, β1, is decreased by a factor of ∼2 (Figure 9), while Xmin and γ for isoform-set 1 are close to those for the wild type (Xmin stays the same, γ is 27% lower).
The relative rates of DBE and SS processes must change slightly (following the boundary line), as could happen through formation of a complex with another enzyme,26 but the large change is in branching. This is as expected: the relative branching activity, β1, is reduced significantly, which supports the precepts of the model, and the reduction is found to affect mainly the first isoform-set. SBEIIa in wheat can thus be interpreted to operate in the first range of wheat CLDs. An elevated branching rate in

Figure 9. Values of β and the relative contributions of isoform-sets 1 and 2 and corresponding Xmin, deduced from fitting the five CLDs in Figure 8.

the second range of SBEIIa mutant-wheat CLD (38% increase) suggests that SBEIIa sources its substrate from this DP range. (While there are some instances where loss of a particular isoform makes minimal difference to the CLD, there are many systematic studies in which knocking out different enzymes or isoforms changes the chain-length distribution very significantly.20,27,28) For the potato mutant in which SBE is inhibited, Figure 9 shows an increase in the relative contribution of isoform-set 2. Again, the quantitative fit afforded by our model supports the precept that an overall CLD is made up of multiple components synthesized by different isoform-sets. Further enzyme knockout may provide more information on these less prominent components. By knowing the means to alter the contribution of particular components, one could design a desired CLD. Figure S9 in the SI gives an example of what might be expected in species such as red algae and glaucophytes, where the number of isoform-sets is smaller.30,31

Amylose, Phytoglycogen, and Glycogen. The present model is most applicable to amylopectin chain-length distributions. Amylose, phytoglycogen, and glycogen may also be incorporated in the basic model, but with a different type of steady state under some circumstances. Equation 5 is solved by assuming that the rates of the three enzymatic processes, aSS, aSBE, and aDBE (propagation, branching, and debranching), are truly constant; each of these rates is the enzyme activity multiplied by the concentration of that enzyme and divided by the concentration of substrate. At first glance, this assumption appears questionable, because as the total amount of starch increases, the concentration of each enzyme would not increase proportionally.
However, it is assumed in the model that amylopectin branches undergo crystallization as soon as the appropriate CLD is obtained; this would then physically exclude most of the enzymes from the crystalline structure, releasing them into the region where new branches are being formed. Hence, this assumption of constant enzyme concentration is probably quite adequate for amylopectin.

However, for amylose, glycogen, and phytoglycogen, where crystalline structures do not form, enzymes would be shared between a greater number of the corresponding substrates as the molecule grows larger. This implies that the rates of the three processes (aSS, aSBE, and aDBE) per substrate would decrease during the growth of an amylose or other noncrystalline molecule. This could be incorporated into a more general version of the model, for example one including a variable number of substrates. The form of eq 5 would be the same, but this equation would now be nonlinear because of these dependences of aSS, aSBE, and aDBE. Within the precepts of the model used for amylopectin, the presence of the debranching process (i.e., a nonzero value of γ) is necessary for formation of crystalline lamellae. However, it may well be the case that the debranching process is not essential for the biosynthesis of amylose, glycogen, and phytoglycogen, and indeed, the nonlinearity of the preceding paragraph would result in a different steady-state condition; a steady state could then be achieved without the presence of the debranching process. Moreover, the model is probably inapplicable to mutant systems where all DBEs have been completely knocked out (e.g., ref 32).
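The dilution argument for noncrystalline glucans can be made concrete with a minimal sketch. It uses the definition given earlier in this section (each rate is the enzyme activity multiplied by the enzyme concentration and divided by the substrate concentration). The activities and concentrations below are hypothetical placeholders in arbitrary units, not fitted values, and the function and variable names are introduced here only for illustration.

```python
def rate_per_substrate(activity: float, enzyme_conc: float, substrate_conc: float) -> float:
    """Rate as described in the text: activity x [enzyme] / [substrate]."""
    return activity * enzyme_conc / substrate_conc


# Hypothetical (activity, enzyme concentration) pairs in arbitrary units.
# These numbers are illustrative assumptions, not values from the paper.
enzymes = {"SS": (1.0, 1.0), "SBE": (0.05, 1.0), "DBE": (0.02, 1.0)}


def rates(substrate_conc: float) -> dict:
    """Per-substrate rates a_SS, a_SBE, a_DBE for a fixed enzyme pool."""
    return {
        name: rate_per_substrate(activity, enzyme_conc, substrate_conc)
        for name, (activity, enzyme_conc) in enzymes.items()
    }


# As a noncrystalline molecule grows, the same enzymes are shared among
# more substrate, so each per-substrate rate falls.
for substrate_conc in (1.0, 2.0, 4.0):
    print(substrate_conc, rates(substrate_conc))
```

With a fixed enzyme pool, doubling the substrate concentration halves every per-substrate rate. For amylopectin, by contrast, the text argues that crystallization keeps finished branches out of reach of the enzymes, so the effective substrate concentration, and hence each rate, can be treated as constant.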

Conclusion

Adapting methods familiar in interpreting molecular weight distribution data in synthetic polymers to the branching distribution in a natural polymer has yielded new understanding of starch biosynthesis. The applicability of the assumptions made and the precepts of the model are supported by (i) quantitative fitting of all features in amylopectin data for a wide range of starch-accumulating plants, (ii) the prediction of a shoulder before the main maximum in CLDs, a feature that has never been pointed out before but can in fact be distinguished in a wide range of experiments, and (iii) the fitting of mutant data, which shows the expected trends in rate parameters (silencing a branching enzyme greatly reduces the fitted relative branching rate). The model shows that chain-length distributions for a given plant variety can be represented as points on a phase diagram, giving the values9 of the relative rates of branching to propagation and of debranching to propagation. General requirements on the solutions of the equations for the CLD show that the two rate ratios for crystalline amylopectin branches can only lie on the boundary line in this phase diagram, as seen both by fitting experimental data and by mathematical considerations of the requirement that a stable CLD in a plant must be attained over time. Presenting CLD data as points on a phase diagram can pinpoint kinetic differences between variant lines of a given plant variety, or between different species, and serves as a quantitative tool for comparing and classifying CLDs. This suggests that it might eventually be used for systematic data fitting and for studying starch biosynthesis in a quantitative manner. The resulting information may well provide quantitative prediction of the effects of mutations on the CLD. The sets of values of β and other fitting parameters (presented here as bar charts) for each isoform-set provide a numerical comparison between mutants.
Modeling the molecular weight distribution of starch branches has potential as a tool for biotechnologists and plant breeders to create, select, and quantify organisms with beneficial properties. It suggests which isoforms could be targeted for genetic modification aiming to create viable plants with more intermediate chain lengths for high resistant-starch content, as is already

appreciated by workers in the area (e.g., refs 33 and 34). Moreover, the CLDs predicted by the range of β and γ that lie along the boundary show what chain-length distributions could possibly be achieved while maintaining the crystalline lamellae necessary for plant viability. A long-term possibility is targeted genetic modification aiming at specific structural characteristics within the limitation that the amylopectin CLDs lie close to the boundary line.

Acknowledgment. We thank the Australian Research Council (DP0986043) for financial support, an anonymous reviewer for bringing ref 14 to our attention, Dr. Matthew Morell, Dr. Steven Ball, and Professor Mike Gidley for insightful discussions, and also Professor Ian Godwin, Dr. Jovin Hasjim, Dr. David Stapleton, Mitch Sullivan, and Mitch Gooding. Dr. Rosa Paula Cuevas (International Rice Research Institute, Manila) kindly made available raw FACE data for the CLD of IR5 rice starch.

Supporting Information Available. Mathematical details of the model, including details on numerical evaluation; HPAEC data showing shoulder; dependences on X0 and Xmin; model single isoform-set data. This material is available free of charge via the Internet at http://pubs.acs.org.

References and Notes

(1) Monteiro, M. J.; Hodgson, M.; De Brouwer, H. J. Polym. Sci., Part A: Polym. Chem. 2000, 38, 3864–3874.
(2) Thickett, S. C.; Gilbert, R. G. Macromolecules 2005, 38, 9894–9896.
(3) Buleon, A.; Colonna, P.; Planchot, V.; Ball, S. Int. J. Biol. Macromol. 1998, 23, 85–112.
(4) Thompson, D. B. Carbohydr. Polym. 2000, 43, 223–229.
(5) Delatte, T.; Trevisan, M.; Parker, M. L.; Zeeman, S. C. Plant J. 2005, 41, 815–830.
(6) Ball, S. G.; Morell, M. K. Ann. Rev. Plant Biol. 2003, 54, 207–233.
(7) Nakamura, Y. Plant Cell Physiol. 2002, 43, 718–725.
(8) Myers, A. M.; Morell, M. K.; James, M. G.; Ball, S. G. Plant Physiol. 2000, 122, 989–997.
(9) Morell, M. K.; Samuel, M. S.; O’Shea, M. G. Electrophoresis 1998, 19, 2603–2611.
(10) Castro, J. V.; Dumas, C.; Chiou, H.; Fitzgerald, M. A.; Gilbert, R. G. Biomacromolecules 2005, 6, 2248–2259.
(11) Zhang, G.; Sofyan, M.; Hamaker, B. R. J. Agric. Food Chem. 2008, 56, 4695–4702.
(12) Gray-Weale, A.; Gilbert, R. G. J. Polym. Sci., Part A: Polym. Chem. 2009, 47, 3914–3930.
(13) Guan, H.; Li, P.; Imparl-Radosevich, J.; Preiss, J.; Keeling, P. Arch. Biochem. Biophys. 1997, 342, 92–98.
(14) Nielsen, T. H.; Baunsgaard, L.; Blennow, A. J. Biol. Chem. 2002, 277, 20249–20255.
(15) Deschamps, P.; Colleoni, C.; Nakamura, Y.; Suzuki, E.; Putaux, J. L.; Buleon, A.; Haebel, S.; Ritte, G.; Steup, M.; Falcon, L. I.; Moreira, D.; Loffelhardt, W.; Raj, J. N.; Plancke, C.; d’Hulst, C.; Dauvillee, D.; Ball, S. Mol. Biol. Evol. 2008, 25, 536–548.
(16) Wattebled, F.; Planchot, V.; Dong, Y.; Szydlowski, N.; Pontoire, B.; Devin, A.; Ball, S.; D’Hulst, C. Plant Physiol. 2008, 148, 1309–1323.
(17) Gilbert, R. G. Emulsion Polymerization: A Mechanistic Approach; Academic: London, 1995.
(18) Ball, S.; Guan, H.-P.; James, M.; Myers, A.; Keeling, P.; Mouille, G.; Buleon, A.; Colonna, P.; Preiss, J. Cell 1996, 86, 349–352.
(19) Castro, J. V.; Ward, R. M.; Gilbert, R. G.; Fitzgerald, M. A. Biomacromolecules 2005, 6, 2260–2270.
(20) Regina, A.; Bird, A.; Topping, D.; Bowden, S.; Freeman, J.; Barsby, T.; Kosar-Hashemi, B.; Li, Z.; Rahman, S.; Morell, M. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 3546–3551.
(21) Dauvillee, D.; Colleoni, C.; Mouille, G.; Morell, M. K.; d’Hulst, C.; Wattebled, F.; Lienard, L.; Delvalle, D.; Ral, J. P.; Myers, A. M.; Ball, S. G. Plant Physiol. 2001, 125, 1723–1731.
(22) O’Shea, M. G.; Samuel, M. S.; Konik, C. M.; Morell, M. K. Carbohydr. Res. 1998, 307, 1–12.
(23) Rahman, A.; Wong, K.-S.; Jane, J.-L.; Myers, A. M.; James, M. G. Plant Physiol. 1998, 117, 425–435.
(24) Perera, C.; Lu, Z.; Sell, J.; Jane, J. Cereal Chem. 2001, 78, 249–256.
(25) Vilaplana, F.; Gilbert, R. G. Macromolecules 2010, 43, 7321–7329.
(26) Tetlow, I. J.; Morell, M. K.; Emes, M. J. J. Exp. Bot. 2004, 55, 2131–2145.
(27) Cuevas, R. P.; Daygon, D.; Morell, M.; Gilbert, R. G.; Fitzgerald, M. A. Carbohydr. Polym. 2010, 81, 120–127.
(28) Regina, A.; Kosar-Hashemi, B.; Ling, S.; Li, Z.; Rahman, S.; Morell, M. J. Exp. Bot. 2010, 61, 1469–1482.
(29) Schwall, G. P.; Safford, R.; Westcott, R. J.; Jeffcoat, R.; Tayal, A.; Shi, Y.-C.; Gidley, M. J.; Jobling, S. A. Nat. Biotechnol. 2000, 18, 551–554.
(30) Deschamps, P.; Guillebeault, D.; Devassine, J.; Dauvillee, D.; Haebel, S.; Steup, M.; Buleon, A.; Putaux, J. L.; Slomianny, M. C.; Colleoni, C.; Devin, A.; Plancke, C.; Tomavo, S.; Derelle, E.; Moreau, H.; Ball, S. Eukaryotic Cell 2008, 7, 872–880.
(31) Plancke, C.; Colleoni, C.; Deschamps, P.; Dauvillee, D.; Nakamura, Y.; Haebel, S.; Ritte, G.; Steup, M.; Buleon, A.; Putaux, J. L.; Dupeyre, D.; d’Hulst, C.; Ral, J. P.; Loffelhardt, W.; Ball, S. G. Eukaryotic Cell 2008, 7, 247–257.
(32) Wattebled, F.; Dong, Y.; Dumez, S.; Delvalle, D.; Planchot, R.; Berbezy, P.; Vyas, D.; Colonna, P.; Chatterjee, M.; Ball, S.; D’Hulst, C. Plant Physiol. 2005, 138, 184–195.
(33) Yao, Y.; Thompson, D. B.; Guiltinan, M. J. Plant Physiol. 2004, 136, 3515–3523.
(34) Klucinec, J. D.; Thompson, D. B. Cereal Chem. 2002, 79, 1923.
