Model-Based Thermodynamic Analysis of Reversible Unfolding

Jun 11, 2010 - piece-by-piece verification of the proposed thermodynamic model using individual fits is frequently inappropriate and can result in an ...
0 downloads 0 Views 2MB Size
J. Phys. Chem. B 2010, 114, 8713–8722

8713

Model-Based Thermodynamic Analysis of Reversible Unfolding Processes Igor Drobnak,* Gorazd Vesnaver, and Jurij Lah* Faculty of Chemistry and Chemical Technology, UniVersity of Ljubljana, AsˇkercˇeVa 5, 1000 Ljubljana, SloVenia ReceiVed: January 19, 2010; ReVised Manuscript ReceiVed: April 29, 2010

Folding and unfolding of many biological macromolecules can be characterized thermodynamically, yielding a wealth of information about the stability of various conformations and the interactions that hold them together. The relevant thermodynamic parameters are usually obtained by employing spectroscopic and/or calorimetric techniques and fitting an appropriate thermodynamic model to the experimental data. In this work, we compare the traditional approach of fitting the thermodynamic model to experimental data obtained from each experiment individually and the global approach of simultaneously fitting the model to all available data from different experiments. On the basis of several specific examples of DNA and protein unfolding, we demonstrate that piece-by-piece verification of the proposed thermodynamic model using individual fits is frequently inappropriate and can result in an incorrect mechanism and thermodynamics of the studied unfolding process. We find that while the two approaches are complementary in some aspects of analysis global fitting is essential for the appropriate selection and critical evaluation of the model mechanism. Only a good global fit thus gives us confidence that the obtained thermodynamic parameters of unfolding have real physical meaning. Introduction The determination of thermodynamic parameters characterizing structural transitions in biomolecular systems allows one to predict the behavior of these systems at different conditions. It also leads to some fairly universal quantities that can be compared for different systems and used to assess their properties. For instance, the standard free energy of denaturation at a given temperature is a common measure of the stability of proteins1 and nucleic acids.2 In addition, thermodynamic profiles may be correlated to changes in structural properties3-5 and potentially used to estimate the contributions of various intraand intermolecular interactions that drive conformational changes and macromolecular association.6,7 Aside from the fundamental interest in the driving forces of processes like the folding of proteins and nucleic acids or the association of biomolecules, thermodynamic data can also be very valuable for some practical applications. Of particular interest to the pharmaceutical industry, for example, is the quantification of biomolecular stability as well as finding ways to improve it (e.g., by select mutations or modification of the storage medium).8 Thermodynamic profiles of unfolding provide both a reliable measure of stability and suggestions of ways to improve it. They can also be used to assess the binding of ligands to biomolecules, which is directly applicable to rational drug design.9 Determination of thermodynamic parameters that characterize structural transitions of biomolecules is usually based on two techniques: spectroscopy and calorimetry.2,3,10-12 Spectroscopic methods follow the dependence of a spectral property (absorbance, fluorescence, ellipticity, NMR peak strength, etc.) on an independent experimental variable (usually the temperature, T, denaturant concentration, d, sample molar concentration, c, or, in the case of kinetic experiments, time, t). The calorimetric method (differential scanning calorimetry, DSC), on the other hand, registers the dependence of the partial molar enthalpy of * To whom correspondence should be addressed. E-mail: igor.drobnak@ fkkt.uni-lj.si; [email protected]. Phone: +386-1-2419-422; +386-12419-414. Fax: +386-1-2419-425; +386-1-2419-425.

j 2) on temperature. In both cases, the experimental the sample (H signal (F) is a linear combination (sum of the contributions) of the characteristic signals of each species i (Fi) weighted by its fraction in the solution (Ri)10

F)

∑ FiRi i

Ri )

ci c

(1)

In many cases, the Fi values for the various species in the studied biomolecular system differ significantlysfor example, the circular dichroism (CD) and fluorescence (FL) spectra of proteins normally exhibit large differences between the native and unfolded species.13 In such systems, the transformation of one species into another may be monitored by measuring F at different conditions (for example, with increasing T).10 Model-based thermodynamic analysis of structural transitions of biomolecules is a well-proven and frequently used method of extracting thermodynamic parameters from both spectroscopic2,10 and calorimetric11,14,15 data. It is based on a model reaction scheme (mechanism) that specifies the species involved in the studied system and the transitions (reactions) enabling conversions of one species into another. The suggested mechanism is usually discussed as an equilibrium process (the van’t Hoff approach),10,11,14 but kinetic models have also been used.14,16,17 For the chosen model, general laws of thermodynamics are applied to express the fractions (Ri) in terms of several thermodynamic parameters (such as the changes in standard free energy, ∆G°, enthalpy, ∆H°, and heat capacity, ∆c°, p for each reaction step) and several independent experimental variables (τ ) T, d, c, etc.). Since most spectroscopic and calorimetric experimental observables (F) are rather straightforwardly related to the Ri values (eq 1), they too can be expressed in terms of the same thermodynamic parameters and independent variables.10,11,14 This calculation of the F vs τ model function for any set of thermodynamic parameters parallels the experimental data of F vs τ. By fitting the modelcalculated curve to experimental data points (changing the thermodynamic parameters in the calculation so that the

10.1021/jp100525m  2010 American Chemical Society Published on Web 06/11/2010

8714

J. Phys. Chem. B, Vol. 114, No. 26, 2010

Drobnak et al.

discrepancies with experimental data are minimized), we obtain the set of thermodynamic parameters that give the best agreement with the experimental data set.10,11 It should be pointed out that using DSC one can determine the thermodynamic parameters of conformational transitions in a model-independent manner as well.11 The agreement between these values and the model-based values is usually considered to be a good indication of the appropriateness of the model.12 The model-based analysis is usually applied separately to each of the experimental curves (data sets) obtained from different experiments. If the model is appropriate, the results of all such individual fits should agree within the limits of the estimated errors. Additional (secondary) parameters may also be determined by fitting a relatively simple function to the multiple sets of parameters obtained from individual fits (secondary fit). On the other hand, since all the model curves describing the studied system should be governed by the same thermodynamic parameters, they may be fitted simultaneously, using only a single set of parameters that best describes the entire ensemble of experimental data.10,18 Such global fitting procedures have been described previously, particularly for the analysis of spectroscopic data.18-20 They have been suggested to improve the precision of the parameters and allow for greater confidence in a model that is able to describe all experimental data simultaneously.10,18,21 However, a detailed comparison of the global and individual approaches to the thermodynamic analysis of biomolecular unfolding is lacking. In this paper, we compare both approaches, applied to several equilibrium processes, with the goal of determining their strengths and weaknesses.

experimental variables (τ). By describing these dependencies, we will now link the Ri values to the most common experimental variables. Temperature. The dependence of ∆G°j on temperature is expressed by the Gibbs-Helmholtz relation

Methods

∆G°(T j o) and ∆H°(T j o) are the standard Gibbs free energy and enthalpy of the reaction at an arbitrarily chosen reference temperature To. Denaturant Concentration. The dependence of ∆G°j on the concentration of a denaturant (index d) can be described in a very general way as1

Thermodynamics of Conformational Transitions. The objective here is to obtain the relations between the independent experimental variables (τ) and the fractions (Ri) of all the species i involved in the transition process using only a single set of thermodynamic parameters. In our analysis, all processes are treated as chemical reactions. We start with a reaction scheme of the studied process, assigning each step j an apparent equilibrium constant Kj [apparent because we are using concentrations instead of activities (an approximation acceptable at low biomolecular concentrations normally used in experiments) and because we omit here the contributions of other species that may be involved in the transition (cosolvents, ions-discussed later in the text)]. Kj is directly related to the concentrations or fractions of the species involved

((

∂ ∆Gj◦ /T) ∂T

)

) p

∆Hj◦

(4)

T2

∆H°j is the corresponding standard molar enthalpy of the reaction and is itself temperature-dependent (Kirchhoff’s law)

( ) ∂∆Hj◦ ∂T

p

◦ ) ∆cp,j

(5)

∆c°p,j is the standard molar heat capacity of the reaction and is usually considered to be constant in the temperature ranges within which biological systems are typically studied. The final relation describing the dependence of ∆G°j on T is obtained by the integration of eqs 4 and 5

∆Gj◦(T) ) ∆Gj◦(To)

(

( ) ∂∆Gj◦ ∂µd

(

)

T T ◦ + ∆Hj◦(To) · 1 + ∆cp,j · To To T T - To - T ln (6) To

P R ) -(Γmd - Γmd ) ) -∆Γmd,j

)

(7)

p,T,ni*d

[Rm] and [Pn] are the equilibrium concentrations of the reactants and products, respectively; νm and νn are the corresponding stoichiometric coefficients; and ∆νj is defined as ∆νj ) ∑nνn ∑mνm (the index m runs over all reactants and n over all products). Kj is related to the standard Gibbs free energy of the reaction, ∆G°, j by the well-known relation

In this expression, ni denotes the amount of species i present in the solution, and µi denotes its chemical potential. The superscripts P and R indicate the products and reactants of reaction j. Γmd is the macromolecule-denaturant preferential binding coefficient, defined as Γmd ) (∂nd/∂nm)µd. Γmd relates to the concentration of denaturant molecules in the vicinity of the macromolecule m, compared to its bulk concentration (a positive Γmd indicates the concentration in the vicinity of the macromolecule is higher than in the bulk, while a negative Γmd implies the opposite). nd and nm are the amounts of denaturant and macromolecule, respectively. Equation 7 is not limited to denaturantssit applies to all cosolvents.1 In the specific cases of urea and guanidinium chloride, two denaturants most frequently used in the studies of protein stability, the relation between their concentrations (d) and ∆G°j of denaturation of globular proteins has been empirically found to be linear at a given T.5 It can be presented as

∆Gj◦ ) -RT ln Kj

∆Gj◦(T, d) ) ∆Gj◦(T, d ) 0) - mjd

Kj )



[Pn]νn

products



νm

∏ Rνn

n

) c∆νj

[Rm]

reactants

n

∏ Rmν

(2)

m

m

(3)

in which R is the gas constant and T is the absolute temperature. Kj, and consequently all the Ri depend on various ∆G°, j

(8)

where ∆G°(T, d ) 0) refers to the denaturant-induced transition, j extrapolated to zero denaturant concentration and mj is an

Analysis of Reversible Unfolding Processes

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8715

empirical parameter that correlates strongly with the amount of protein surface exposed to the solvent upon unfolding and with ∆c°p,j.5 Equation 8 also follows from eq 7 for ∆Γmd,j(T, d) ) (mj/RT)d. pH. The dependence of protein stability on the solution pH can also be treated as a dependence on the concentration of cosolvent (eq 7)sthe cosolvent in this case being H+. The exact shape of the ∆ΓmH+,j(pH) function depends on interactions between the protein, the H+ ions, and the water molecules. In principle, it may be estimated from the differences between the pKa values of the protolytic groups of reactants and products;22 however, as a first approximation a linear dependence on pH may be used

∆ΓmH+,j ) aj + bj · pH

(9)

in which the pH dependence coefficients for the reaction, aj and bj, are treated as fitting parameters. The linear function, when inserted into eq 7, yields

(

)

(10)

Equation 10 is the simplest (parabolic) ∆G°(T, pH) function j consistent with the empirical observation that proteins are characterized by an optimal pH, at which ∆G° of denaturation is the highest.23 To describe the dependence of ∆H°j on pH, a temperature derivative of eq 10 needs to be taken, and two new model parameters are introduced: ∂aj/∂T and ∂bj/∂T, the temperature derivatives of aj and bj, respectively. ∂pH/∂T, related to the ionization enthalpy of the buffer, also needs to be considered in this analysis. Salt Concentration. The dependence of ∆G°j on the concentration of salt in the sample solution may again be described in terms of eq 7 with the salt ions considered as cosolvents. However, this would require the determination of ∆ΓmM+,j as a function of [M+]. Since such determinations are always problematic, a much simpler approach is commonly used. It is based on the approximation that every transition Kj

Rj h Pj occurs actually as Kj*

Rj h Pj + sjM+ where Rj and Pj are the reactants and products, respectively, and sj is the number of M+ ions released. sj is usually assumed to be independent of T and can take negative values, signifying an uptake of ions in the reaction. Kj* is the actual equilibrium constant, assumed to be independent of [M+], and can be expressed as

) K*(T) j

[Pj][M+]sj ) Kj(T, [M+])[M+]sj [Rj]

(11)

By applying eq 3, this relation translates into

∆Gj◦(T, [M+]) ) ∆Gj◦(T) + sjRT ln[M+]

∆Gj◦(T, d, pH, [M+]) ) ∆Gj◦(To, 0, pHo)

(

(12)

)

(

T + To

)

T T ◦ + ∆cp,j T - To - T ln To To bj mjd + RT ln aj(pH - pHo) + (pH2 - pH2o) + 2 sjRT ln[M+] (13)

∆Hj◦(To, 0, pHo) 1 -

∆Gj◦(T, pH) ) ∆Gj◦(T, pHo) + RT ln aj(pH - pHo) + bj (pH2 - pH2o) 2

where the actual reaction free energy (corresponding to Kj*(T)) and is independent of [M+]. Equation 12 is denoted as ∆G°(T) j holds only as long as sj remains independent of [M+], so it may not be applicable across large ranges of [M+] values. Moreover, [M+] should be considered as the activity rather than concentration of M+ ions since ion-ion interactions are strong and longranged and can contribute significantly to activity coefficients, especially at higher salt concentrations. Sample Concentration. As a standard molar quantity, ∆G°j is independent of the sample concentration c. However, as can be seen from eq 2, the Ri values are concentration dependent if ∆νj * 0. When observed, a dependence of Ri values on c is thus an indication that an association or dissociation process accompanies the transition j. Global ∆G°j Function. By putting together all the dependencies described above, we can express ∆G°j as a single function of several experimental variables (τ ) T, d, pH, [M+])

(

)

This is the function used to calculate the Kj (through eq 3) and subsequently all the Ri (by considering eq 2 and ΣiRi ) 1) in d, the global fits described below. The corresponding ∆H°(T, j pH, [M+]) is obtained by differentiating this equation according to eq 4. Spectroscopic Data. Spectroscopic data can usually be related to Ri using the linear eq 1, as long as one is able to determine the various Fi. This can be done straightforwardly, provided an interval of τ values exists where only one species i is present (ci ) c): F(τ) ) Fi(τ). Then, this Fi(τ) can simply be extrapolated over the entire range of measured τ values. For instance, a protein in the native conformation (i ) N) is present only at low temperatures, so an extrapolation (usually linear) is used to estimate FN(τ) at higher temperatures. No additional experiment is needed since we can simply use the pre- and posttransitional baselines of the F(τ) vs τ denaturation curve.10,24 For those species that are not present alone over a sufficiently long interval of τ values to allow such an extrapolation, an assumption must be made about the shape of the Fi(τ) vs τ function, and the parameters of this function have to be fitted to the experimental data along with the thermodynamic parameters. Spectroscopic data can be presented in normalized form, which enables an easy comparison between the data from different experiments and helps to simplify the fitting algorithms. By subtracting the extrapolated initial baseline (FN(τ)) from the experimental F(τ), we obtain the excess signal ∆F(τ)

∆F(τ) ) F(τ) - FN(τ) )

∑ ∆Fi(τ)Ri(τ)

(14)

i

where for each species i, ∆Fi(τ) ) Fi(τ) - FN(τ). We complete the normalization by dividing the experimental ∆F(τ) by the extrapolated final excess signal (∆FD(τ)). The normalized signal f(τ) is thus

8716

J. Phys. Chem. B, Vol. 114, No. 26, 2010

f(τ) )

∆F(τ) ) ∆FD(τ)

∆F (τ)

∑ ∆FDi (τ) Ri(τ) ) ∑ fiRi(τ) i

Drobnak et al.

(15)

i

While fN ) 0 and fD ) 1, the fi coefficients of intermediates may differ between experimental methods. Therefore, the fi values for the same intermediate species i should be determined independently for each spectroscopic method. A useful advantage of this normalization, based on our experience with the model analysis of numerous experiments, is that the fi coefficients may be considered as τ-independent model parameters as long as the τ intervals are not too large. DSC (Calorimetric) Data. Calorimetric data can be treated much like spectroscopic data, and eqs 1 and 14 still apply, with j 2) taking the place the partial molar enthalpy of the sample (H j2 of F. However, the experimentally measured quantity is not H but rather its temperature derivative, the partial molar heat j 2/∂T)p. This is the quantity capacity of the sample, jcp,2 ) (∂H obtained when the measured total heat capacity of the solution is corrected for the buffer contribution and normalized per amount of sample in the cell. For further analysis the experimental quantity is more conveniently expressed as the excess heat capacity, ∆cp,2 ) jcp,2 - jcp,N, where jcp,N is the partial molar heat capacity of the native species (the initial baseline). ∆cp,2 can also be calculated for the suggested model as the temperature j 2) at constant pressure p. The derivative of eq 14 (with F ) H result is a model function

∆cp,2(τ) )

( )

∑ ∆cp,i◦ Ri(τ) + ∑ ∆Hi◦(τ) i

i

∂Ri (τ) ∂T p

(16)

in which ∆H°i and ∆c°p,i are the standard molar enthalpy and heat capacity of formation of species i from its native state (∆H°i ) ∆H°Nfi).15 The use of standard quantities is justified at ji sufficiently low biomolecular concentrations where ∆Hi ) H j - HN is concentration independent. Equation 16 reveals an important advantage of DSC over spectroscopic data; the ∆Fi are not spectroscopic properties (such as differences in extinction coefficients) but thermodynamic quantities. This means that they can be expressed in terms of the thermodynamic parameters used to calculate the Ri (∆H°i and ∆c°p,i can be expressed in terms of ∆H°j and ∆c°p,j by employing a thermodynamic cycle). Another important consequence of working with thermodynamic rather than spectroscopic properties is that from the measured DSC data a number of model-independent thermodynamic quantities can be obtained.11 Parameter Fitting and Error Estimation. For the actual fitting procedure, we wrote our own fitting program in C++. The program first determines the baselines for each experimental data set using a linear least-squares fit25 in the τ intervals specified by the user. These baselines are then used to normalize the data before fitting. For any set of fitting parameters the model-based f(τ) or ∆cp,2(T) values are calculated for all experimental data points using the relations described above. χ2 is calculated as

χ2 )

∑ k

(fk,mod - fk,exp)2 σ2k

global fitting, the sum runs over the entire ensemble of data (global data set), which means that all the experimental data sets are included in the single global fit. The set of best-fit parameters is obtained by minimizing χ2 using the downhill simplex method introduced by Nelder and Mead, as described in ref 25. The quality of the fit is assessed using the goodness-of-fit (gof) criterion.25 The gof is the probability that the observed discrepancies between the model and experimental data points are the result of random errors in the experimental points, assuming that all errors are independent and normally distributed and that the model is linear. Because of the assumptions it should not be interpreted literally but more as a general measure of the quality of the fit. A gof value g10-3 is usually taken as an indication of a good fit and a confirmation that the model used is appropriate.25 Error estimation is carried out using the Monte Carlo approach.25 This approach consists of generating a large number of pseudoexperimental data sets by adding random errors to the original best-fit model points. The random errors are generated with a normal distribution around an expected (mean) value of fk,mod with a variance (scattering) of σ2k . The randomgenerated pseudoexperimental data sets mimic the real experimental errors. The model is then fitted to each pseudoexperimental global data set independently, yielding slightly different pseudofit parameter values. The scattering of each parameter provides an estimate of the error in the original best-fit parameter arising from random errors in the experimental data. There is at least one additional source of error in the fitting proceduresthe human factor in deciding which regions represent the initial and final baselines. To estimate its influence on the fitted parameters, the borders defining baseline regions were also randomly varied around their initial (user-determined) values. Finally, we calculated the correlation coefficients (cor) for each pair of adjustable parameters (X, Y)

cor(X, Y) )

j ·Y j XY - X sXsY

(18)

The bars denote an average over all Monte Carlo iterations, while sX and sY denote the standard deviations (over all Monte Carlo steps) of the parameters. The correlation between two parameters provides a measure of their interdependenceslarge correlation coefficients (near 1) indicate that each change in one of them can consistently be compensated by a proportional change in the other to maintain a good fit, which can hinder the accurate determination of parameter values.26 The correlation coefficients are generally not a consequence of any physically meaningful relation between parameterssthey arise in the fitting process due to a limited amount of data and experimental uncertainties.26 The fitting, error estimation, goodness-of-fit, and correlation calculations were also carried out in the same way for each experimental data set separately (an individual fit). We compared the results and characteristics of the global and individual fitting procedures performed on several sets of data measured previously in our laboratory.27-29

(17)

where fk,mod and fk,exp are the model-based and experimental values for each data point k and σk is its estimated standard deviation (describing the scattering of experimental data). In

Results DNA. The thermodynamic stability of a short self-complementary DNA oligonucleotide (5′-CGAATTCG-3′) was studied by performing DSC measurements at different NaCl concentrations.27 A simple two-state helix to coil transition model (Scheme

Analysis of Reversible Unfolding Processes

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8717

1, see Supporting Information (SI) for a detailed derivation of the model function) was found to fit well to each experimental data set separately. SCHEME 1: Two-State Model of Unfolding Coupled with Dimer Dissociation

The comparison of the individual best-fit parameters obtained at different salt concentrations allowed us to estimate the number of Na+ ions released upon unfolding of a single duplex. This was achieved using a secondary fit of eq 12 to the four ∆G°(To, [M+]) points (Figure 1b). The number of released Na+ ions was also estimated by incorporating salt dependence into a global fitting procedure which fits all four DSC curves simultaneously. The results of both the global and individual approaches, presented in Figure 1 and in the SI Table 1, show that the overall goodness of the global fit is much lower than that of each individual fit. This reveals a weakness of the model which is not so readily apparent when we combine the results of very good individual primary fits and the linear secondary fit (Figure 1, a and b). Regardless of what the cause of this weakness is (most likely the slope s in eqs 12 and 13 is not constant over the entire range of Na+ concentrations used here), the applied model is evidently not able to describe the unfolding event in detail. Such a conclusion could easily be missed if an analysis of only the individual fits based on the same model was performed (unless one noticed the curvature of the four points in Figure 1b). It should be mentioned that using Na+ activities instead of concentrations did not improve the fit. Moreover, the determination of activity coefficients of Na+ from the modified Debye-Hückel theory30 is itself questionable at such high concentrations. rEPO. Recombinant human erythropoietin (rEPO) is a monomeric protein, for which the data on urea-induced and thermal denaturation as well as on the pH dependence of its thermal denaturation are well documented.28 Both urea- and temperature-induced denaturation processes are consistent with a two-state mechanism (Scheme 2), but their final (denatured) states are significantly different, thus leading to different thermodynamic parameters of denaturation.28 SCHEME 2: Two-State Model of Unfolding without Dissociation

Fitting each experimental denaturation curve individually gave d ) 0) vs T good quality fits, and the secondary fit of ∆G°(T, u points to eq 6 allowed us to describe the temperature dependence of urea-induced denaturation by obtaining estimates of ∆G°(T u o, d ) 0), ∆H°u (To, d ) 0), and ∆c°p,u (Figure 2d). The experimental data may also be divided into two semiglobal data sets, one consisting of all thermal denaturation experiments (performed at different pH) and the other of all urea unfolding experiments (performed at different temperatures

Figure 1. Individual and global fits for DNA oligonucleotide unfolding at different salt concentrations. The lines represent the best-fit model curves to the experimental data points of the same color. (a) Global (solid lines) and individual (dashed lines) fits of each DSC data set. (b) Best-fit parameter ∆G°(To, [M+]) values at To ) 45 °C, obtained from each of the primary individual fits. The lines represent the linear secondary fit (black) and the result of the global fit (red). (c) Molar fraction of oligomer strands in the unfolded state (RD) as a function of T and [Na+], calculated using the global best-fit parameters from Supporting Information Table 1. The lines on the (T, [Na+]) plain are the contour projections at RD values 0.1, 0.3, 0.5, 0.7, and 0.9.

and constant pH). Each semiglobal data set can then be fitted independently with the two-state model (Scheme 2). Alternatively, thermal and urea denaturation experiments can be used simultaneously to fit a global three-state equilibrium model (Scheme 3) with two distinct denatured states (Dt for temperature- and Du for urea-induced denatured states). The detailed derivation of model functions is given in the SI.

8718

J. Phys. Chem. B, Vol. 114, No. 26, 2010

Drobnak et al.

Figure 2. Individual, semiglobal, and global fits of the thermal and urea-induced denaturation of rEPO. The lines represent the semiglobal (solid), individual (dashed), and global (dotted) best-fit model curves to the experimental data points of the same color. The global and semiglobal best-fit curves often overlap very closely. (a) DSC temperature scans at different pH in the absence of urea. (b) Urea denaturation curves, measured by CD at pH 7.2 and different temperatures. (c) Spectroscopic temperature scans at pH 7.2 in the absence of urea. (d) ∆G°(T, d ) 0) vs T at pH 7.2. The u points represent best-fit values from the four individual fits; the black line is the secondary fit of eq 6 to the four points; and the red and blue lines represent the semiglobal and global model curves, respectively. (e) ∆G°(60 °C, pH) vs pH at d ) 0. The points represent best-fit values from the t four individual fits; the black line is the secondary fit of eq 10 to the four points; and the red line represents the semiglobal model curve. (f) Molar fractions of the thermal (RDt, red grid) and urea-induced (RDu, blue grid) denatured states as a function of T and d at pH 7.2, calculated from the global best-fit parameters. The lines on the (T, d) plain are the contour projections at R values 0.1, 0.3, 0.5, 0.7, and 0.9.

SCHEME 3: Three-State Model of Unfolding with Two Distinct Denatured States

We were not able to include pH dependence into the global three-state model since no experimental data are available on the pH dependence of urea-induced rEPO denaturation. Without this data, the fit was unable to converge to a reasonably narrow global χ2 minimum. On the other hand, the global fit at a constant pH and the semiglobal fit including pH dependence for thermal denaturation only were successful, and their results are discussed below. The results of the individual, semiglobal (two-state), and global (three-state) approaches for obtaining the thermodynamic

Analysis of Reversible Unfolding Processes

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8719

Figure 3. Global fit of the thermal denaturation of MazG. The lines represent the best-fit model curves to the experimental data points of the same color. (a) DSC thermal denaturation curves in the absence of urea. (b) FL thermal denaturation curves in the absence of urea. (c) CD thermal denaturation curves at different concentrations of MazG monomers, c, and urea, d. (d) ∆G°SD(To,d) vs d at To ) 60 °C. The points are the best-fit values from primary individual fits; the black line represents the secondary fit of these six points; and the red line is the model curve based on the global best-fit parameters from SI Table 3.

parameters are presented in Figure 2 and SI Table 2. All three approaches yield good fits, although the goodness-of-fit is somewhat better for the individual fits. The only inadequate fit is the semiglobal fit of thermal denaturation at different pH, most likely due to the inability of eq 10 to properly describe pH) (Figure 2a). By contrast, the pH dependence of ∆G°(T, t the secondary fit of eq 10 to the individual best-fit ∆G°(T t o, pH) values is good (Figure 2e), suggesting that eq 10 is a reasonable approximation. This, however, may be misleading since the secondary fit is actually based on fewer assumptions than the global fit. While each individual fit may be characterized by in the global fit independent ∆H°(To, pH) and ∆c°(pH), p assumptions have to be made about the pH dependence of these quantities as well (∂a/∂T, ∂b/∂T, and ∆c°p are all assumed to be independent of pH and T). The quality of the global fit confirms the three-state mechanism to be consistent with the data but does not conclusively support the mechanism, particularly since none of the experiments used here register the transition between Du and Dt directly. In terms of errors, all three fitting approaches do rather poorlyserrors are large particularly for the N h Du transition, reflecting the shortage of data on urea-induced denaturation and the high dependence of the final fitted parameters on the initial choice of baselines. Realistically, the actual errors may be slightly lower than the estimates. This is because the error estimation algorithm, unlike the human eye, chooses the baseline boundaries independently of the random-generated data set (it uses the user-supplied boundaries plus a random-generated variation). Nevertheless, we may expect the values from the

global fit to be more accurate since they take into account more (and more diverse) data. For example, thermal denaturation experiments show that the Du state is not significantly populated even at high temperatures in the absence of urea.28 This limits the range of possible values for the urea-denaturation parameters in the global fit but not in the individual fits where each fit is completely independent of all others (an individual f(d) fit does not “know” about any thermal denaturation experiments). The complete mutual independence of individual fits thus allows somewhat unrealistic parameter valuessfor instance, ∆c°p,u is too large compared to ∆c°p,t, suggesting that the Du state is significantly populated at higher temperatures (above 70 °C where ∆G°(T, d ) 0) < 0 in Figure 2d) in the absence of urea. u For comparison, see the more realistic molar fractions of different species, calculated from global best-fit parameters and presented in Figure 2f. More experimental data at a combination of high T and high d would be necessary to enable further confirmation of the threestate equilibrium model (Scheme 3) and to determine the parameters more precisely. MazG. As presented in our recent study of the thermodynamic stability of MazG,29 the concentration dependence of its CD melting curves indicates that the unfolding of the native dimeric protein is accompanied by its dissociation into monomers (Figure 3c). However, this thermal unfolding cannot be described in terms a simple two-state mechanism (Scheme 1) due to its incompatibility with the DSC (Figure 3a) and fluorescence data, the latter showing a single concentration-independent transition (Figure 3b). A threestate model 1/2N2 h 1/2I2 h D was found to fit well to all the

8720

J. Phys. Chem. B, Vol. 114, No. 26, 2010

spectroscopic data simultaneously and separately to the DSC data, but the parameters obtained from the two fits were still incompatible. The global fit of all spectroscopic and DSC data revealed discrepancies in the concentration dependence which were resolved using a four-state mechanism (see SI for details). SCHEME 4: Four-State Model of MazG Thermal Denaturation

Fitting this model to individual experimental data is difficult since the FL experiments are sensitive only to the first transition and CD experiments primarily to the third. Because of this, we could only fit a two-state model with ∆ν ) 0 (Scheme 2) to FL temperature scans and a two-state model with ∆ν ) 1/2 (Scheme 1) to CD temperature scans. The four-state model (Scheme 4) can be used to describe DSC data, but these data on their own can also be described very well by the three-state model, meaning that the four-state model is overparametrizedsit uses more parameters than are necessary to describe the shape of the experimental curve. Several parameters can therefore take a broad range of different values without significantly affecting the quality of the fit. This can be seen quantitatively not only in the relatively large Monte Carlo error estimates but also in an unrealistically high goodness-of-fit (very near to 1, too good to be true). The results of the fits are presented in SI Table 3. By contrast, a global fit of the model function based on the four-state mechanism (Scheme 4) to all available spectroscopic and DSC data is not over-parametrized. Thus, the Monte Carlo error estimates were reduced (SI Table 3). Unfortunately, the goodness of fit suffered considerably. This is probably a reflection of some approximations made in the thermodynamic models (such as the constant mj or ∆c°p,j) and in the normalization procedure (linearity of the baselines), which are likely to cause minor systematic discrepancies in some parts of a wide (global) data set. The poor goodness-of-fit value is also the result of unrealistically low standard deviations assigned to fluorimetric datasthese were necessary to ensure sufficient weight was given to the relatively few fluorimetric data points. The fluorimetric data are crucial to the mechanism because of their sensitivity to the first transition. Nevertheless, the model curves describe experimental data relatively well (Figure 3), and the diversity of the data described by a single set of parameter values gives strong credibility to both the model and the corresponding bestfit thermodynamic parameters. It should be pointed out that without a global fit or a much larger amount of experimental data it would be nearly impossible to validate and assign reliable parameter values to a model as complex as the one presented by Scheme 4. Discussion When studying biomolecular unfolding, relying on any single experimental technique is problematic since each technique is sensitive only to some specific property of the system. Spectroscopic techniques only register changes in a limited part of the molecule (fluorophore, chromophore, NMR-active atom, etc). Similarly, calorimetric techniques are limited in that they only register transitions accompanied by an enthalpy change of a large enough magnitudestwo macroscopic states with sufficiently similar enthalpies are indistinguishable by DSC. The

Drobnak et al. only apparent solution to this problem is the use of as many different (complementary) experimental techniques as possible. This produces a multitude of experimental data sets which may be analyzed using the individual or global approach. The strengths and weaknesses of both are discussed below. Individual fits have an obvious advantage in the goodnessof-fit for each individual data set and the relative simplicity of the algorithms and the models themselves. The model reaction scheme explaining a given data set may be simplified to cover only those transitions that occur in the range of τ values probed by the given experiment. An example is the denaturation of rEPO above, where each denaturation curve could be described quite safely as a two-state process, even if the global model consists of three states (Figure 2). Additional parameters may be obtained by a relatively simple secondary fit. There are, however, drawbacks to this simplicitysoverparametrization can be a problem in individual fits when the model used is too complex for the limited informational content of a single experimental curve, for instance when fitting the fourstate model to a single DSC thermogram of MazG. The problem is even more obvious in secondary fits, where the number of data points per fitted parameter is much lower. An example is d ) 0) vs T secondary fit rEPO denaturation with the ∆G°(T, u (Figure 2d) in which a model curve containing three parameters is fitted to only four points. The fit is very good, but even small errors in one of the points can result in large inaccuracies of the fitted parameters (in this example, ∆c°p,u is unrealistically large, as discussed above). To avoid overparametrization it is often necessary to simplify the model, but this also creates the risk of oversimplification. A simple model may produce a good fit despite the presence of other (neglected) transitions at the given experimental conditions. This was observed in the case of MazG, where an oversimplified model for the description of DSC data (the three-state model) can be compensated by incorrect parameter values, resulting in an apparently good fit. Similarly, the low number of points in the secondary fit can cause the fit to appear very good even when there are systematic discrepancies resulting from approximations in the model. This is evident in the case of oligonucleotide unfoldingsthe curvature in Figure 1b may be overlooked because the secondary data consist of only four points. The global fitting procedure, on the other hand, achieves a higher ratio of data points per fitted parameter. In doing so it sacrifices some goodness-of-fit, which is most likely due to the larger number of assumptions that need to be put into the model to link all the data sets to common parameters. Examples of such assumptions are the linear dependencies of ∆G° on ln[Na+] (eq 12) and on d (eq 8). These assumptions are embedded into the global model and cannot be tested independently unless a global model with a different assumption is constructed and the results of the two models are compared. Another difficulty with the global fitting procedure is its complexity, both in terms of constructing a global model that adequately describes all the experimental data and in terms of writing an appropriate computer program. Nevertheless, the benefits of global fitting are substantial, starting with smaller errors in the fitted parameters. These improvements are not always as large as one might hope because even the compilation of all the experimental data sets does not always contain enough information to determine all the fitted parameters precisely. For example, ∆c°p values can be determined more reliably, and their Monte Carlo error estimates are lower than those obtained from the individual fits; however, they are still heavily dependent on the choice of baselines, particularly

Analysis of Reversible Unfolding Processes

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8721 of should one proceed with determining the final parameter values and estimating their errors. In this final step, again, the global fitting approach is preferred. Conclusions

Figure 4. Dependence of MazG thermal denaturation on total protein concentration. The grids represent the molar fractions (R) of MazG monomeric units taken up by species I2 (blue), S2 (black), and D (red) as a function of T and c. The molar fractions were calculated from global best-fit parameters (SI Table 3). The lines on the (T, c) plain are the contour projections at R values 0.1, 0.3, 0.5, 0.7, and 0.9.

in the case of DSC data, so that their relative errors remain high (see SI Table 3). A more important advantage of global fitting is often its ability to rigorously test the model against all experimental data simultaneously. A successful global fit directly implies that the model is consistent with all the experimental data, while any deviations of model functions from experimental points can be quickly identified. A good example is MazG thermal denaturation, where it is difficult to reject the threestate model or to reliably confirm a four-state model and determine its parameters on the basis of individual fits alone. A secondary fit of the best-fit parameters obtained from individual fits can be used to test some of the assumptions that link all the data sets to common parameters in the global model. However, we have seen this test fail to reject a poor approximation on two occasions in the examples described above. The reason for such a failure may be an insufficient number of secondary data points per fitted parameter (as with oligonucleotide unfolding) or the fact that the secondary fit allows a lot more flexibility in parameter values than the rigorous global fit (as with the pH dependence of ∆G°t of rEPO). A global fit imposes strict adherence to the model functions for all parameters, while a secondary fit fits a model function using only some of the parameters (allowing all other parameters to be completely independent between individual fits). This is the reason why a good secondary fit is a necessary but not sufficient condition for a good global fit. The comparison of correlation coefficients for the same pairs of parameters shows that the values obtained with the global fitting procedure are generally smaller than those obtained from individual fits. This may be explained by the larger number of data points per fitted parameter and, more significantly, by the greater variety of data which provide more experimental constraints to the fitted parameters. However, there are also a number of exceptions where the global correlation coefficients are larger than the corresponding values from individual fits. From the practical point of view, individual fits may be considered an appropriate starting point in constructing a global model as the simplest model (the lowest common denominator) able to describe each data set. They are also useful for determining the initial estimates of the parameter values for the global fit. A global fit should then be used to test the initial model, and if the simpler model fails to give a good global fit, new complexities should be introduced. One should be careful not to reject a model prematurely since it has been our experience that sometimes the fit can be significantly improved simply by a different choice of baselines or different initial parameter estimates. Only when these precautions are taken care

To summarize, the major difference between individual and global fitting approaches is the amount and variety of experimental data that are fitted simultaneously. More data, particularly when coming from different experimental techniques, provide more information about the studied process, which translates into a more accurate and more precise determination of each parameter contained in the model function. When the data are lacking, as is frequently the case with individual fits of complex models and with secondary fits, the parameter errors are high even though the fits appear to be very good. In more extreme cases, the fits may even fail to converge to a single χ2 minimum. Simplification of the model can sometimes solve this problem, but there is a considerable risk that an oversimplified model will lead to a reasonably good fit using incorrect parameter values. The second consequence of having more data in a single fit is that a good fit has more significance (carries more weight) because the model (and its set of best-fit parameters) conforms to a larger set of experimental constraints. Piece-by-piece verification of a model using individual fits is much less reliable since each of the individual primary and secondary fits may still be very good even if the model is inconsistent with the gathered experimental data. By contrast, a good global fit guarantees that the model is consistent with all available experimental data. This makes global fitting a must-have tool for anyone dealing with the model-based thermodynamic analysis of biomolecular unfolding processes. Acknowledgment. This work was supported by the Ministry of Higher Education, Science and Technology, and by the Agency for Research of Republic of Slovenia through the Grants No. [P1-0201] and [J1-6653]. Supporting Information Available: Additional information about the model analysis and the program used may be obtained from the authors upon request. Detailed derivations of all model functions used as well as the numerical results of global and individual fits. This material is available free of charge via the Internet at http://pubs.acs.org. References and Notes (1) Scharnagl, C.; Reif, M.; Friedrich, J. Biochim. Biophys. Acta 2005, 1749, 187–213. (2) Marky, L. A.; Breslauer, K. J. Biopolymers 1987, 20, 1–24. (3) Robertson, A. D.; Murphy, K. P. Chem. ReV. 1997, 97, 1251–1267. (4) Jacobs, D. J.; Dallakyan, S. Biophys. J. 2005, 88, 903–915. (5) Myers, J. K.; Pace, C. N.; Scholtz, J. M. Protein Sci. 1995, 4, 2138– 2148. (6) Makhatadze, G. I.; Privalov, P. L. AdV. Protein Chem. 1995, 47, 307–425. (7) Murphy, K. P.; Freire, E. AdV. Protein Chem. 1992, 43, 313–361. (8) MicroCal, L. L. C. Applications of BioCalorimetry 6; Heidelberg: Germany, 2008. (9) Chaires, J. B. Annu. ReV. Biophys. 2008, 37, 135–151. (10) Eftink, M. R. Methods Enzymol. 1995, 259, 487–512. (11) O’Brien, R.; Haq, I. Applications of biocalorimetry: Binding, stability and enzyme kinetics. In Biocalorimetry; Ladbury, J. E., Doyle, M., Eds.; John Wiley & Sons, Ltd.: Chichester, 2004; Vol. 2. (12) Cooper, A. Thermodynamics of protein folding and stability. In Protein: A comprehensiVe treatise; Allen, G., Ed.; JAI Press Inc.: Stamford, 1999; Vol. 2. (13) Schmid, F. X. Optical spectroscopy to characterize protein conformation and conformational change. In Protein Structure: A Practical Approach; Creighton, T. E., Ed.; The practical approach series; Oxford University Press: Oxford, 1997.

8722

J. Phys. Chem. B, Vol. 114, No. 26, 2010

(14) Ladbury, J. E.; Chowdhry, B. Z., Eds. Biocalorimetry: Applications od calorimetry in the biological sciences; John Wiley & Sons, Ltd.: Chichester, 1998. (15) Jelesarov, I.; Bosshard, H. R. J. Mol. Recognit. 1999, 12, 3–18. (16) Sanchez-Ruiz, J. M. Biophys. J. 1992, 61, 921–935. (17) Prislan, I.; Lah, J.; Vesnaver, G. J. Am. Chem. Soc. 2008, 130, 14161–14169. (18) Beechem, J. M. Methods Enzymol. 1992, 210, 37–54. (19) Knutson, J. R.; Beechem, J. M.; Brand, L. Chem. Phys. Lett. 1983, 102, 501–507. (20) Stilbs, P.; Paulsen, K.; Griffiths, P. C. J. Phys. Chem. 1996, 100, 8180–8189. (21) Ibarra-Molero, B.; Sanchez-Ruiz, J. M. Biochemistry 1997, 36, 9616–9624. (22) Yang, A.-S.; Honig, B. J. Mol. Biol. 1993, 231, 459–474. (23) Creighton, T. E. Proteins: Structures and molecular principles; W. H. Freeman and Company: New York, 1984.

Drobnak et al. (24) Pace, C. N.; Scholtz, J. M. Measuring the conformational stability of a protein. In Protein Structure: A Practical Approach; Creighton, T. E., Ed.; The practical approach series; Oxford University Press: Oxford, 1997. (25) Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Numerical Recipes in C++: The Art of Scientific Computing, 2nd ed.; Cambridge University Press: New York, 2002. (26) Johnson, M. L. Methods Enzymol. 2000, 321, 424–446. (27) Drobnak, I.; Serucˇnik, M.; Lah, J.; Vesnaver, G. Acta Chim. SloV. 2007, 54, 445–451. (28) Lah, J.; Prislan, I.; Krzˇan, B.; Salobir, M.; Francky, A.; Vesnaver, G. Biochemistry 2005, 44, 13883–13892. (29) Drobnak, I.; Korencˇicˇ, A.; Loris, R.; Marianovsky, I.; Glaser, G.; Jamnik, A.; Vesnaver, G.; Lah, J. J. Mol. Biol. 2009, 392, 63–74. (30) Dean, J. A., Ed.; Lange’s handbook of chemistry, 15th ed.; McGrawHill: New York, 1999.

JP100525M