Lumry−Eyring Nucleated-Polymerization Model of Protein Aggregation


J. Phys. Chem. B 2009, 113, 7020–7032

Lumry-Eyring Nucleated-Polymerization Model of Protein Aggregation Kinetics. 2. Competing Growth via Condensation and Chain Polymerization

Yi Li and Christopher J. Roberts*

Department of Chemical Engineering, Colburn Laboratory, 150 Academy Street, University of Delaware, Newark, Delaware 19716

Received: September 18, 2008; Revised Manuscript Received: January 17, 2009

The Lumry-Eyring with nucleated polymerization (LENP) model from part 1 (Andrews, J. M.; Roberts, C. J. J. Phys. Chem. B 2007, 111, 7897-7913) is expanded to explicitly account for kinetic contributions from aggregate-aggregate condensation polymerization. Experimentally accessible quantities described by the resulting model include monomer mass fraction (m), weight-average molecular weight (Mw), and ratio of Mw to number-average molecular weight (Mn) as a function of time (t). Analysis of global model behavior illustrates ways to identify which steps in the overall aggregation process are kinetically important on the basis of the qualitative behavior of m, Mw, and Mw/Mn vs t, and on whether bulk phase separation or precipitation occurs. For cases in which all aggregates remain soluble, moment equations are provided that permit straightforward numerical regression of experimental data to give separate time scales or inverse rate coefficients for nucleation and for growth by chain and condensation polymerization. Analysis of simulated data indicates that it may be possible to neglect condensation reactions if only early time data are considered and also highlights difficulties in conclusively distinguishing between alternative mechanisms of condensation, even when kinetics are monitored with both m and Mw.

1. Introduction

Non-native aggregation commonly refers to the process of forming protein aggregates in which the constituent monomers have significantly altered secondary structure compared to the native or folded state.1-3 Aggregates may be soluble or insoluble, with soluble aggregates potentially ranging in size from dimers to so-called high molecular weight species (∼10 to 10³ or more monomers per aggregate).4-6 Formation of non-native aggregates is problematic for protein-based pharmaceuticals and other biotechnology products due to increased manufacturing costs, regulatory concerns, and product marketability.3,6-8 Non-native aggregates are also implicated in a number of chronic diseases9,10 and are suspected immunogenic agents in biopharmaceuticals.11,12 Because non-native aggregation (hereafter referred to simply as aggregation) is typically net-irreversible under the conditions that aggregates form, elucidating key mechanistic details that control aggregation kinetics is of general importance for these systems. However, even apparently simple experimental kinetics can be a convolution of multiple stages.2 These may include (partial) monomer unfolding, reversible self-association or prenucleation, nucleation of the smallest irreversible aggregates, and subsequent aggregate growth via chain polymerization or aggregate self-association or phase separation. Furthermore, many of the kinetically relevant intermediates are often too poorly populated or transient to be directly characterized with available experimental methods.2,6,13,14 As a result, proper deconvolution of different stages of the aggregation process requires qualitative and quantitative comparison with mechanistic mathematical models that are couched in experimentally accessible quantities, such as mass-percent loss of monomer and time-dependent scattering data.2,4,15-19

* Corresponding author. Phone: 302-831-0838. Fax: 302-831-1048. E-mail: [email protected].

A large majority of available mathematical models for aggregation kinetics can be categorized in terms of which stage or stages in the overall aggregation process they treat explicitly or implicitly. Currently, no available model treats all of the above stages with equally detailed descriptions for natively folded proteins. Rather, most models fall into one of two categories.2 Those in the spirit of Lumry-Eyring treatments primarily consider only unfolding and folding in mechanistic detail and use phenomenological or empirical treatments for assembly steps. Alternatively, polymerization models typically ignore conformational transitions and treat only assembly steps in detail.2 A previous report20 presented a first-generation Lumry-Eyring nucleated polymerization (LENP) model that included thermodynamics of monomer conformational stability and prenucleation, along with dynamics of nucleation and of growth via chain polymerization. It is also one of only two models20-22 that consider the effects of aggregate (in)solubility on experimental kinetics of monomer loss or soluble aggregate size distributions. In this context, soluble aggregates are those that are available to consume additional protein monomers,15-24 rather than being defined in terms of a particular size range.25 The previous LENP model did not include detailed treatments of the kinetics and mechanism of aggregate-aggregate coalescence or condensation leading to soluble or insoluble aggregates.2,20,22 Incorporating details of condensation is important if one is interested in quantifying the resulting aggregate size distribution, but it can lead to significant added mathematical complexity.26 This may explain why, for non-native protein aggregation, there are relatively few experimentally tested kinetic models that describe condensation in considerable detail, and those models have typically been system-specific.
For example, Pallitto and Murphy incorporated size-dependent, diffusion-limited lateral and end-to-end association to describe soluble filament and insoluble fibril formation based

10.1021/jp8083088 CCC: $40.75 © 2009 American Chemical Society. Published on Web 04/15/2009

LENP Model of Protein Aggregation Kinetics

J. Phys. Chem. B, Vol. 113, No. 19, 2009 7021

TABLE 1: List of Key Symbols (a)

name        definition
A_j         agg composed of j monomers (b)
A_x         nucleus (b)
a_j         [A_j]/C_0
C_0         initial monomer concentration (b)
C_ref       std state monomer concn (b)
K_IU        eq. const for I ⇌ U
K_i         eq. const for iR ⇌ R_i (c)
K_NI        eq. const for N ⇌ I
K_RA        eq. const for A_j + R ⇌ A_jR (d)
k_a         monomer assoc rate coeff (e)
k_a,x       k_a for nucleation step (e)
k_B         Boltzmann's constant (f)
k_d         dissociation rate coeff (g)
k_d,x       k_d for R_x-1 + R ⇌ R_x (g)
k_g         growth rate coeff (e)
k_nuc       nucleation rate coeff (g)
k_obs       observed rate coeff for monomer loss (g)
ν           apparent reaction order for monomer loss
δ           no. monomers added per chain polymerization step
k_r         rearrangement rate coeff (g)
k_r,x       k_r for nucleation step (g)
k_i,j       condensation rate coeff (e)
κ_i,j       k_i,j/k_x,x
τ_g         characteristic time scale of chain polymerization
τ_n         characteristic time scale of nucleation
τ_c         characteristic time scale of condensation polymerization
N           native monomer (b)
I           intermediate state (monomer) (b)
U           unfolded state (monomer) (b)
R           reactive monomer (b)
f_R         fraction reactive monomer
n*          size at which precipitation occurs
R_i         reversible oligomer of i monomers (b)
R_x         reversible prenucleus (b)
x           nucleus stoichiometry
m           ([N] + [I] + [U])/C_0
M_n^agg     number-av aggregate Mw (h)
M_mon       monomer Mw (h)
M_w^agg     weight-av aggregate Mw (h)
β_gn        τ_n/τ_g
β_cg        τ_g/τ_c
σ           Σ[A_j]/C_0
λ_1         first moment of soluble agg. size distribution
λ_2         second moment of soluble agg. size distribution
θ           dimensionless time (t/τ_n)
µ           mean of a_j/σ distribution
σ_µ         variance of a_j/σ distribution
κ̄_n         number average κ_i,j
κ̄_w         weight average κ_i,j
τ_g^(0)     τ_g at C_ref and f_R = 1
τ_n^(0)     τ_n at C_ref and f_R = 1
τ_c^(0)     τ_c at C_ref

(a) Abbreviations: agg = aggregate, aggn = aggregation, concn = concentration, const = constant, eq. = equilibrium, Mw = molecular weight, unf. = unfolding. (b) mol/volume. (c) (mol/volume)^(1-i). (d) (mol/volume)^-1. (e) (mol/volume)^-1 · time^-1. (f) energy/K. (g) time^-1. (h) mass · mol^-1.

on a priori knowledge of stoichiometry and geometry in Aβ aggregation.16 In simpler treatments, Modler et al.17 and Speed et al.18 considered irreversible condensation polymerization to form soluble aggregates, with rate coefficients that were assumed to be independent of polymer size (degree of polymerization). In each case, kinetic models were regressed against time-dependent measurements of one or more aspects of the aggregate size distribution; for example, weight-average molecular weight16-18 or z-average hydrodynamic radius.16 Condensation was determined to be an important or even dominant contribution in each case. However, in each case, the models were developed for only a specific protein system without considering global model behavior. Furthermore, it is also common practice to fit monomer loss data to models in which condensation is inherently neglected,15,23,27,28 even though corroborating structural evidence to support such an assumption may be available in only a fraction of reported cases.2,6 Overall, this highlights a need for more general analysis of aggregation kinetics within a mechanistic framework that can easily distinguish which contributions are important, and that can also provide a means to quantify those contributions by regression against experimental kinetics.

The present report extends the previous LENP model to include explicit and detailed descriptions of condensation. Particular questions that are addressed include (1) Which experimental signatures easily allow one to qualitatively determine whether neglecting condensation20,23,24,27-29 is appropriate? (2) Can one quantitatively separate contributions from condensation, chain polymerization, and nucleation without detailed a priori knowledge16 of the association mechanism or aggregate morphology? (3) How sensitive are experimentally accessible kinetics to mechanistic details, such as size-dependent vs size-independent condensation steps? (4) How are the answers for questions 1-3 altered if one considers only early time data (i.e., only the first few percent loss of monomer)? These questions

are important for deconvoluting the effects of chemical additives or protein stabilization strategies on different stages of aggregation,2,30-32 inferring mechanistic details of aggregate-aggregate assembly,16 and in applications such as pharmaceutical product stability that typically focus on only small extents of reaction or percent loss of monomer.3,6 Finally, this report provides the global behavior of the improved LENP model and illustrates an application of the model to experimental data using recently reported results for aggregation of α-chymotrypsinogen A (aCgn).5

2. Model Description and Derivations

Table 1 summarizes key symbols and definitions used throughout this report. Figure 1 schematically shows the six stages of non-native aggregation that are included in the model developed and analyzed here. Stages 1-4 are the same as those employed in the previous LENP model.20 Briefly, the six stages in Figure 1 are as follows: (1) Conformational transitions of monomers between folded (F) and unfolded (U) states, with the possibility for stable folding intermediates (I). The monomer conformational state (e.g., F, I, or U) that is most prone or reactive with respect to aggregation is denoted R. (2) Association of R monomers to form reversible prenuclei or oligomers (Ri) composed of i molecules. (3) Nucleation of the smallest aggregate species that is effectively irreversible (Ax) by a conformational rearrangement step (Rx → Ax).16,20 (4) Growth of soluble aggregates via chain polymerization. (5) Soluble aggregate growth due to aggregate-aggregate association, such as condensation polymerization.5,16 (6) Removal of aggregates via phase separation to form macroscopic particles or precipitates.21,33,34 In stage 6, all aggregates composed of n* or more monomers are treated as insoluble.20-22 As in the previous report,20 stages 1 and 2 are assumed to be fast and thus pre-equilibrated compared to stages 3-6. As a result, only equilibrium constants for unfolding (KFI, etc.) and

prenucleation (Ki, i = 2, ..., x - 1) appear in stages 1 and 2, respectively. The kinetics of conformational rearrangement as part of nucleation in stage 3 are treated by assuming a concerted, unimolecular rate-limiting step with rate coefficient kr,x.20 The balance of rearrangement (Rx → Ax) and association (R + Rx-1 → Rx) steps in stage 3 is treated with a local steady-state approximation. For association, ka,x and kd,x denote forward and reverse rate coefficients. Similar considerations and nomenclature are included for growth via chain polymerization (stage 4).20 R monomers can reversibly self-associate with pre-existing soluble aggregates, followed by a conformational rearrangement step that makes monomer addition effectively irreversible. The rate coefficients ka, kd, kr and equilibrium constant KRA in stage 4 are the same as in the earlier LENP model.20 In stage 5, ki,j denotes the rate coefficient for irreversible association of aggregates composed of i and j monomers to form a soluble aggregate of i + j monomers. Stage 6 is effectively instantaneous phase separation of any aggregate that contains n* or more monomers.

Figure 1. Reaction scheme with associated model parameters for the six key stages in the LENP model. The steps shown in each panel are treated as elementary, irreversible (single arrow) steps or as preequilibrated or steady state (double arrow) when translating them to mass-action kinetic equations.

2.1. LENP Model Equations. The following derivations are based on the reaction scheme in Figure 1, and employ the same nomenclature as previous work20 to the extent possible here. Characteristic time scales are defined for nucleation ($\tau_n \equiv \tau_n^{(0)} f_R^{-x} (C_{ref}/C_0)^{x-1}$), growth via monomer addition ($\tau_g \equiv \tau_g^{(0)} f_R^{-\delta} (C_{ref}/C_0)^{\delta}$), and condensation ($\tau_c \equiv \tau_c^{(0)} C_{ref}/C_0$) (see also Appendix). In these definitions, fR = [R]/([N] + [I] + [U]) is the mole fraction of monomer that is in the aggregation-prone conformational state. Cref is a reference state concentration that defines the concentration scale of the standard state for association free energies and equilibrium constants. The respective intrinsic time scales (denoted with superscript (0)) are defined as $\tau_n^{(0)} \equiv (k_{nuc} K_{x-1} C_{ref}^{x-1})^{-1}$, $\tau_g^{(0)} \equiv (k_g K_{RA}^{\delta-1} C_{ref}^{\delta})^{-1}$, and $\tau_c^{(0)} \equiv (k_{x,x} C_{ref})^{-1}$. They are termed intrinsic because they are independent of initial monomer concentration and the free energy of monomer conformational transitions. $k_g \equiv k_a k_r/(k_d + k_r)$ is the effective rate coefficient for chain polymerization, and $k_{nuc} \equiv k_{a,x} k_{r,x}/(k_{d,x} + k_{r,x})$ is that for nucleation.20

The above definitions along with the derivations elsewhere20 and in the Appendix show that although there are numerous parameters in Figure 1 and Table 1, the assumptions of preequilibration for stages 1 and 2 and local steady state for stages 3 and 4 reduce the total to only seven distinguishable parameters or functions: τn and x account for stages 1, 2, and 3; τg and δ account for stage 4; and n* accounts for stage 6. Stage 5 is accounted for by τc and κi,j ≡ ki,jC0τc. κi,j may be a function of i and j, but its (i, j) dependence is uniquely set by the choice of mechanistic model describing size-dependent condensation (see also below and Section 2.3). Therefore, there are six adjustable model parameters once the condensation mechanism is selected.

The Appendix provides additional details regarding derivations of the kinetic working equations for monomer and all soluble aggregates. Equations A1, A4, and A5 are the dynamic material balances based on Figure 1 and mass-action kinetics. They can be rewritten in nondimensional form by defining θ = t/τn, βgn = τn/τg, and βcg = τg/τc to give

$$\frac{dm}{d\theta} = -x m^{x} - \delta \beta_{gn} m^{\delta} \sigma \tag{1}$$

$$\frac{da_x}{d\theta} = m^{x} - \beta_{gn} a_x m^{\delta} - \beta_{cg}\beta_{gn}\kappa_{x,x} a_x^{2} - \beta_{cg}\beta_{gn} a_x \sum_{j=x}^{n^*-1} \kappa_{x,j} a_j \tag{2}$$



dai dθ

|
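The concentration scaling of the characteristic time scales defined in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function and argument names are hypothetical, and any numerical inputs passed to it are placeholders, not fitted values.

```python
def characteristic_times(tau_n0, tau_g0, tau_c0, c0, c_ref, f_r, x, delta):
    """Return (tau_n, tau_g, tau_c) from the intrinsic time scales.

    Follows the Section 2.1 definitions:
      tau_n = tau_n0 * f_R**(-x)     * (C_ref/C_0)**(x - 1)
      tau_g = tau_g0 * f_R**(-delta) * (C_ref/C_0)**delta
      tau_c = tau_c0 * (C_ref/C_0)
    Argument names are hypothetical labels for the symbols in Table 1.
    """
    ratio = c_ref / c0
    tau_n = tau_n0 * f_r ** (-x) * ratio ** (x - 1)
    tau_g = tau_g0 * f_r ** (-delta) * ratio ** delta
    tau_c = tau_c0 * ratio
    return tau_n, tau_g, tau_c


def beta_groups(tau_n, tau_g, tau_c):
    """Dimensionless ratios beta_gn = tau_n/tau_g and beta_cg = tau_g/tau_c."""
    return tau_n / tau_g, tau_g / tau_c


# Usage with placeholder numbers (not values from the paper):
tau_n, tau_g, tau_c = characteristic_times(
    tau_n0=100.0, tau_g0=2.0, tau_c0=5.0,
    c0=2.0, c_ref=1.0, f_r=1.0, x=6, delta=1,
)
beta_gn, beta_cg = beta_groups(tau_n, tau_g, tau_c)
```

Written this way, it follows directly from the definitions that βgn scales as (Cref/C0)^(x-1-δ) and βcg as (Cref/C0)^(δ-1) at fixed fR, which is one way to see why data over a range of C0 carry information about x.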

xca. 10) were used. The large-x fits were clearly inferior to the small-x fits, but it was not possible to further distinguish a best-fit x value. This is not unexpected on the basis of previous analysis, which showed reliable determination of x values required kinetic data over a relatively wide range of initial protein concentrations (C0).20 For concreteness, the results in Figure 6 are for x = 6, the same value of x used to generate the simulated data from eqs 1-3. More generally, this result highlights inherent difficulties in determining nucleus size from data regression vs kinetic models when the data are available at only one or a small range of C0 values. The results in Figure 6A show that regression against eqs 7-9 provides accurate parameter values for a given set of m(t) and Mwagg(t) data. This includes conditions under which condensation is negligible (βcg ≪ 1) and under which it is the dominant mode of growth (βcg ≫ 1). In all cases, the accuracy of fitted parameters was within 5% of the true values, R² values were >0.99, and residuals were small and evenly distributed. In contrast, Figure 6B shows that fitting with a model in which condensation is neglected clearly produced poor fits and


Figure 6. Comparison of values for τg (gray), τn (white), and τc (black) obtained by regression of eqs 7-9 against simulated experimental data from eqs 1-3 (see text for additional details). (A) κ̄n = κ̄w ≡ 1 in both simulated data and fits; simulated data span four half-lives. (B) Same as A, but fits assumed τc → ∞ to imitate condensation-free models. (C) Same as B, but with simulated data sets truncated at the extent of reaction indicated by the label beside each set of bars. Error bars represent 95% confidence intervals from nonlinear least-squares fits.

inaccurate fitted parameter values under conditions when condensation is appreciable (βcg ∼ 1) or dominant (βcg ≫ 1). Figure 6C illustrates instead that if one is able to consider sufficiently early time conditions (m → 1), it is possible to obtain reasonably accurate values of τg and τn with a model that neglects condensation. No values of τc are shown because τc → ∞ for the fits in Figure 6C. The labels above each data set in Figure 6C indicate the value of m at which the data were truncated for fitting. The truncation m value for a given data set was selected as the point at which the polydispersity first rose above a threshold value of Mwagg/Mnagg = 1.1 (cf. Figure 2D and discussion below). The results in Figure 6C are perhaps not surprising because the initial conditions considered here are ones in which aggregates are not present and because condensation rates are proportional to the square of the total aggregate concentration (i.e., σ²), whereas chain polymerization rates are linear in σ. Thus, condensation rates do not become appreciable until larger amounts of monomer have been consumed to create new aggregates. One can reach the same conclusion via an analytical perturbation solution (results not shown), such as applied previously to a condensation-free model.23 The above arguments notwithstanding, even with early time data, it is not possible to deconvolute τg and τn unless both m(t) and Mwagg(t) data are employed. In practical terms, it is unlikely that one will know a priori whether experimental data are collected for sufficiently early times to ensure condensation can be neglected. The results in Figure 6C, when compared to those in Figure 2D, support the empirical practice of considering condensation to be negligible if the sample polydispersity remains relatively low (Mwagg/Mnagg ∼ 1.1-1.2).4,5 The results in Figure 2C suggest an additional criterion for neglecting condensation is that Mwagg scales linearly with (1 - m). Ideally, however, it seems most prudent to instead consider models that include growth via both monomer addition and aggregate-aggregate condensation when attempting to regress accurate and mechanistically sound parameter values from experimental kinetics. An example of this approach applied to experimental data for aCgn aggregation is provided below (see section 3.3).

For simplicity, all preceding examples in this section used only the case of size-independent rate coefficients for condensation (κi,j = 1). From a practical standpoint, it also is often convenient to assume size-independent condensation so as to reduce the computational burden and complexity of models for regression.17,18,50 Furthermore, it is not clear a priori that typical experimental kinetic measurements provide sufficient information to reliably distinguish between different condensation-mediated growth mechanisms. This motivates the question, can experimental m(t) and Mwagg(t) data robustly distinguish between different models for condensation-mediated growth?
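As a side illustration of the early-time argument above (condensation scaling as σ², chain polymerization as σ), the condensation-free limit can be integrated numerically. The sketch below is not the paper's regression model (eqs 7-9 are not reproduced in this excerpt). It uses eq 1 for m together with dσ/dθ = m^x, which is an assumption obtained here by summing the aggregate balances with βcg = 0 and n* → ∞, so that chain-growth terms cancel and only nucleation changes the number of aggregates. The function names, the fixed-step RK4 integrator, and the rough rate-ratio estimate are all hypothetical choices.

```python
def rhs(m, sigma, x, delta, beta_gn):
    """Right-hand sides for m (eq 1) and sigma in the condensation-free limit."""
    dm = -x * m ** x - delta * beta_gn * m ** delta * sigma
    dsigma = m ** x  # assumption: only nucleation creates new aggregates
    return dm, dsigma


def integrate(x=6, delta=1, beta_gn=1000.0, theta_end=0.2, n_steps=2000):
    """Fixed-step RK4 from m = 1, sigma = 0; returns a list of (theta, m, sigma)."""
    h = theta_end / n_steps
    theta, m, s = 0.0, 1.0, 0.0
    out = [(theta, m, s)]
    for _ in range(n_steps):
        k1m, k1s = rhs(m, s, x, delta, beta_gn)
        k2m, k2s = rhs(m + 0.5 * h * k1m, s + 0.5 * h * k1s, x, delta, beta_gn)
        k3m, k3s = rhs(m + 0.5 * h * k2m, s + 0.5 * h * k2s, x, delta, beta_gn)
        k4m, k4s = rhs(m + h * k3m, s + h * k3s, x, delta, beta_gn)
        m += h * (k1m + 2.0 * k2m + 2.0 * k3m + k4m) / 6.0
        s += h * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        theta += h
        out.append((theta, m, s))
    return out


def condensation_to_chain_ratio(m, sigma, delta, beta_cg):
    """Rough scale of condensation (beta_cg*beta_gn*sigma**2, kappa = 1)
    relative to chain growth (delta*beta_gn*m**delta*sigma)."""
    return beta_cg * sigma / (delta * m ** delta)


# Usage: inspect an early-time point of the trajectory.
traj = integrate()
theta, m, sigma = traj[100]
ratio = condensation_to_chain_ratio(m, sigma, delta=1, beta_cg=1.0)
```

Because σ starts at zero and grows only through nucleation, the ratio returned by the last function is small at early times for moderate βcg, which is the qualitative point made in the discussion of Figure 6C. The number-average aggregate size in monomer units can be estimated from the same trajectory as (1 - m)/σ.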
To address this question, eqs 7-10 were solved with a simple diffusion-limited Smoluchowski model for κi,j (cf. Section 2) to provide simulated kinetic data that were then regressed against eqs 7-9 with the size-independent condensation model used above. Illustrative results are shown here for simulated data (size-dependent κi,j) with βgn = 1000 and βcg = 1, 10, and 20. Figure 7A shows results for βcg = 20. The size-independent model provided excellent fits to size-dependent simulated data in all cases, with R² > 0.99 and small, evenly distributed residuals (not shown). Despite the seemingly high quality fit for m and Mwagg in Figure 7A, the true value of κ̄n increases dramatically as aggregation proceeds, although κ̄w remains reasonably close to 1 throughout (data not shown). Thus, although the size-independent model fits the simulated {m, Mw} data well to within the precision of typical experimental data, the fitted value for τc is only a rough approximation to its true value. Figure 7B further shows that for βcg ∼ 10 or higher, deviations are found not only in τc, but also in all three fitted parameters (τg, τn, τc). Thus, although the fits appeared to be good in all test cases, the fitted values of (τg, τn, τc) were inaccurate except when condensation was not dominant over chain polymerization (βcg ∼ 1 or smaller). The last two columns in Figure 7B are for fits using a size-independent model of condensation, but with data truncated at low extents of reaction. In this case, accurate (τg, τn, τc) were obtained even when condensation is dominant (high βcg). Intuitively, this is reasonable because at low extents of reaction, the aggregate size distribution will lie relatively close to the nucleus size (x), and the assumption that all ki,j values are the same as kx,x is reasonable. The above results clearly illustrate that aggregation kinetics monitored experimentally in terms of m and Mw can qualitatively identify whether condensation steps are appreciable but that


Figure 7. (A) Representative simulated aggregation kinetics (symbols) with size-dependent condensation (eqs 7-14, βgn = 1000, βcg = 20, x = 6, δ = 1); curves are fits to the size-independent model (eqs 7-9, with κ̄n = κ̄w ≡ 1). (B) Comparison of fitted τg (gray), τn (white), and τc (black) values from the size-independent model versus the true values, based on results analogous to panel A but for a range of βcg values. Asterisks indicate simulated data sets that were truncated at m = 0.95 before regression (see also details in text).
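To make the contrast between size-dependent and size-independent condensation concrete, the sketch below evaluates one common form of a diffusion-limited (Brownian) Smoluchowski kernel, normalized so that equal-sized partners give κ = 1. It is an assumption for illustration only: the specific expression the paper uses for κi,j (its Section 2.3) is not reproduced in this excerpt, and the function name and the `fractal_dim` parameter (aggregate radius taken to scale as i^(1/fractal_dim)) are hypothetical.

```python
def kappa_smoluchowski(i, j, fractal_dim=3.0):
    """Relative diffusion-limited rate coefficient for aggregates of i and j monomers.

    Assumed form: rate ~ (r_i + r_j) * (1/r_i + 1/r_j), with radius
    r_n ~ n**(1/fractal_dim) and diffusivity ~ 1/r_n. The factor 1/4
    normalizes equal-sized pairs to 1, so kappa(x, x) = 1 for any x.
    """
    a = 1.0 / fractal_dim
    return 0.25 * (i ** a + j ** a) * (i ** (-a) + j ** (-a))


# Usage: kernel values for a nucleus (x = 6) meeting progressively larger aggregates.
x = 6
values = [(j, kappa_smoluchowski(x, j)) for j in (6, 12, 60, 600)]
```

With this assumed form, κ equals 1 for equal-sized partners and exceeds 1 for unequal sizes, so a size-independent model (κi,j = 1) is closest to it when the size distribution is narrow, consistent with the early-time behavior described in the text.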

obtaining good fits to a kinetic model will not necessarily provide fitted parameter values that accurately reflect the true values for the system. Of course, true values of model parameters cannot be known a priori for an experimental system, and so it would not be possible to statistically distinguish these mechanisms in such a situation. As a result, it cannot be generally concluded that m and Mw kinetic data on their own will be sufficient to conclusively distinguish between alternative models for aggregate condensation. Preliminary results (not shown) indicate that this limitation might be overcome if one can experimentally measure higher moments of the distribution, as well as if one can accurately quantify sample polydispersity. In practice, this may remain an outstanding challenge because these quantities are difficult if not impossible to accurately quantify with currently available commercial equipment for the typical size ranges of soluble protein aggregates (∼1 to 10² nm). Qualitatively, however, it may be possible to distinguish between different condensation mechanisms with information regarding aggregate morphology. For example, different types of condensation mechanisms may result in aggregates with different characteristic fractal structures.51 In such cases, this argues for the importance of using additional data, such as aggregate structure or morphology, when elucidating mechanistic details of aggregation.16,51

3.3. LENP Model Applied to Aggregation of aCgn. Figure 8 illustrates fits of the LENP model (eqs 7-10) to experimental aggregation kinetics for α-chymotrypsinogen A (aCgn) monitored by size-exclusion chromatography with inline static laser light scattering.5 The data are from two different solution conditions (summarized in the figure caption; additional details


Figure 8. (Adapted and reproduced with permission from ref 5.) Illustrative fits of the LENP model to two cases of experimental aggregation kinetics for aCgn. For both cases, the protein concentration (c0) is 1 mg mL⁻¹ aCgn, and buffer conditions are pH 3.5, 10 mM sodium citrate buffer. The conditions differ in terms of incubation temperature and NaCl concentration: 60 °C with no NaCl (squares); 50 °C with 0.1 M NaCl (triangles). The curves are best fits from least-squares regression vs eqs 7-10b with a size-independent condensation mechanism. The best-fit parameter values5 for the first data set (squares) are x = 3, δ = 1, τg = 0.1 ± 0.01 min, τn = 103 ± 102 min, and τc > 1012 min; the corresponding parameter values for the second data set (triangles) are x = 3, τg = 0.8 ± 0.1 min, τn = 500 ± 200 min, and τc = 0.1 ± 0.01 min. Panels A and B show the same data in two different formats for easier comparison of the qualitative features from simulated data in Figures 2 and 3. The open symbols in panel A are Mwagg values from light scattering; the filled symbols are the corresponding m values from chromatography. Details of the experimental protocols are given elsewhere.5

in ref 5), and are plotted in the same format as Figures 2 and 3. In both cases, the aggregates are soluble throughout the experimental time scale, and therefore, n* → ∞ for fitting with the LENP model. As was done in Section 3.2, τn, τg, and τc were regressed for a range of integer values of δ and x to obtain the best least-squares fits to m(t) and Mwagg(t) data simultaneously. The best-fit values for each case, along with 95% confidence intervals, are given in the caption to Figure 8. Qualitative comparison with Figures 2 and 3 shows that the selected conditions correspond to types II (squares) and Ia (triangles) behavior. The qualitative features for the type Ia conditions cannot be produced without including condensation steps in the model (stage V, Figure 1): for example, the pronounced upturn of Mwagg vs 1 - m in Figure 8B and a concomitant, large increase in polydispersity5 (results not shown here). In quantitative terms, the best fit parameter values give βgn ∼ 10³ in both cases. They give βcg ∼ 10 and βcg ≪ 1, respectively, for the type Ia and II cases. These results are qualitatively and quantitatively consistent with the analysis and discussion in Section 3.1. Finally, the different experimental conditions for aCgn in Figure 8 correspond to aggregates with qualitatively different morphology; the aggregates for the type II conditions in Figure 8 are linear polymers,4,5 and those for the type Ia conditions are more globular and compact.5 These


morphological differences are consistent with qualitative differences in growth mechanisms for limiting cases Ia and II in the LENP model; however, they do not provide sufficient information to discern additional details of the condensation mechanism (e.g., size-dependent vs size-independent ki,j). A more global search of solution conditions that give rise to behaviors other than types II and Ia for aCgn is currently underway and will be included as part of a future report.
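As a quick arithmetic check of the dimensionless groups quoted in this section, the central best-fit values reported in the Figure 8 caption for the second data set (triangles, type Ia; τg = 0.8 min, τn = 500 min, τc = 0.1 min) can be inserted into the Table 1 definitions βgn = τn/τg and βcg = τg/τc. Confidence intervals are not propagated here, so the result is an order-of-magnitude comparison only:

$$\beta_{gn} = \frac{\tau_n}{\tau_g} = \frac{500\ \text{min}}{0.8\ \text{min}} = 625 \sim 10^{3}, \qquad \beta_{cg} = \frac{\tau_g}{\tau_c} = \frac{0.8\ \text{min}}{0.1\ \text{min}} = 8 \sim 10$$

Both values agree in order of magnitude with the statements that βgn ∼ 10³ and that βcg ∼ 10 for the type Ia conditions.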

$$\frac{da_x}{dt} = \frac{m^{x}}{\tau_n} - \frac{a_x m^{\delta}}{\tau_g} - k_{x,x} C_0 a_x^{2} - \left( a_x \sum_{j=x}^{n^*-1} k_{x,j} C_0 a_j \right) \tag{A2}$$



$$\left.\frac{da_i}{dt}\right| = \frac{(a_{i-\delta} - a_i)\, m^{\delta}}{\tau_g} - k_{i,i} C_0 a_i^{2} -$$

x