pubs.acs.org/Langmuir © 2010 American Chemical Society
Single-Molecule Force-Clamp Spectroscopy: Dwell Time Analysis and Practical Considerations†

Yi Cao‡ and Hongbin Li*
Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada. ‡ Present address: Department of Physics, Nanjing University, Nanjing, Jiangsu Province, PR China

Received October 13, 2010. Revised Manuscript Received November 8, 2010

Single-molecule force-clamp spectroscopy has become a powerful tool for studying protein folding/unfolding, bond rupture, and enzymatic reactions. Different methods have been developed to analyze force-clamp spectroscopy data on polyproteins to obtain kinetic parameters characterizing the mechanical unfolding of proteins, which is often modeled as a two-state process (a Poisson process). However, because of the finite number of domains in polyproteins, the statistical analysis of the force-clamp spectroscopy data is different from that of a classical Poisson process, and the equivalency of different analysis methods remains to be proven. In this article, we show that these methods are equivalent and lead to accurate measurements of the unfolding rate constant. We also demonstrate that, in contrast to constant-pulling-velocity experiments, in which the unfolding rate extracted from the data depends on the number of protein domains in the polyprotein (the N effect), force-clamp experiments do not show any N effect. Using a simulated data set, we also highlight important practical considerations that one needs to take into account when using the single-molecule force-clamp spectroscopy technique to characterize the unfolding energy landscape of proteins.
Introduction

Atomic force microscopy (AFM)-based single-molecule force spectroscopy has evolved into a powerful tool for studying the dynamics and the involved weak noncovalent interactions of many processes, including conformational changes of polymers (including synthetic polymers and biomacromolecules1-5 and DNA6-11), ligand binding,12-18 chemical/enzymatic reactions,19-24 and protein folding and unfolding.25-40 Over the past decade, the use of AFM to investigate the folding and unfolding dynamics of proteins has provided a tremendous amount of new information, opening up new avenues for studying protein folding and unfolding. To identify single-molecule stretching events unambiguously, polyprotein approaches are widely used in typical protein folding/unfolding experiments41-43 in which the protein of interest is constructed into a polymer consisting of identical tandem repeats of the protein. Stretching such polyproteins at a constant velocity gives rise to unique sawtooth-like force-extension curves with each force peak corresponding to the unfolding of a protein domain in the polyprotein chain. Such polyprotein approaches allow for single-molecule force

† Part of the Supramolecular Chemistry at Interfaces special issue.
*To whom correspondence should be addressed. E-mail: hongbin@chem.ubc.ca.

(1) Rief, M.; Oesterhelt, F.; Heymann, B.; Gaub, H. E. Science 1997, 275, 1295.
(2) Zhang, X.; Liu, C. J.; Wang, Z. Q. Polymer 2008, 49, 3353.
(3) Liu, C. J.; Shi, W. Q.; Cui, S. X.; Wang, Z. Q.; Zhang, X. Curr. Opin. Solid State Mater. Sci. 2006, 9, 140.
(4) Giannotti, M. I.; Vancso, G. J. ChemPhysChem 2007, 8, 2290.
(5) Walther, K. A.; Brujic, J.; Li, H.; Fernandez, J. M. Biophys. J. 2006, 90, 3806.
(6) Mehta, A. D.; Rief, M.; Spudich, J. A.; Smith, D. A.; Simmons, R. M. Science 1999, 283, 1689.
(7) Zhang, W.; Machon, C.; Orta, A.; Phillips, N.; Roberts, C. J.; Allen, S.; Soultanas, P. J. Mol. Biol. 2008, 377, 706.
(8) Lee, G.; Rabbi, M.; Clark, R. L.; Marszalek, P. E. Small 2007, 3, 809.
(9) Ke, C.; Humeniuk, M.; H, S. G.; Marszalek, P. E. Phys. Rev. Lett. 2007, 99, 018302.
(10) Cui, S.; Yu, J.; Kuhner, F.; Schulten, K.; Gaub, H. E. J. Am. Chem. Soc. 2007, 129, 14710.
(11) Cui, S.; Albrecht, C.; Kuhner, F.; Gaub, H. E. J. Am. Chem. Soc. 2006, 128, 6636.
(12) Ainavarapu, S. R.; Li, L.; Badilla, C. L.; Fernandez, J. M. Biophys. J. 2005, 89, 3337.
(13) Bertz, M.; Rief, M. J. Mol. Biol. 2009, 393, 1097.
(14) Cao, Y.; Balamurali, M. M.; Sharma, D.; Li, H. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 15677.
(15) Cao, Y.; Er, K. S.; Parhar, R.; Li, H. ChemPhysChem 2009, 10, 1450.
(16) Baumgartner, W.; Hinterdorfer, P.; Ness, W.; Raab, A.; Vestweber, D.; Schindler, H.; Drenckhahn, D. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 4005.
(17) Hinterdorfer, P.; Baumgartner, W.; Gruber, H. J.; Schilcher, K.; Schindler, H. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 3477.
(18) Florin, E. L.; Moy, V. T.; Gaub, H. E. Science 1994, 264, 415.
(19) Wiita, A. P.; Ainavarapu, S. R.; Huang, H. H.; Fernandez, J. M. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 7222.
(20) Liang, J.; Fernandez, J. M. ACS Nano 2009, 3, 1628.
(21) Wiita, A. P.; Perez-Jimenez, R.; Walther, K. A.; Grater, F.; Berne, B. J.; Holmgren, A.; Sanchez-Ruiz, J. M.; Fernandez, J. M. Nature 2007, 450, 124.
(22) Perez-Jimenez, R.; Li, J.; Kosuri, P.; Sanchez-Romero, I.; Wiita, A. P.; Rodriguez-Larrea, D.; Chueca, A.; Holmgren, A.; Miranda-Vizuete, A.; Becker, K.; Cho, S. H.; Beckwith, J.; Gelhaye, E.; Jacquot, J. P.; Gaucher, E. A.; Sanchez-Ruiz, J. M.; Berne, B. J.; Fernandez, J. M. Nat. Struct. Mol. Biol. 2009, 16, 890.
(23) Puchner, E. M.; Alexandrovich, A.; Kho, A. L.; Hensen, U.; Schafer, L. V.; Brandmeier, B.; Grater, F.; Grubmuller, H.; Gaub, H. E.; Gautel, M. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 13385.
(24) Gumpp, H.; Puchner, E. M.; Zimmermann, J. L.; Gerland, U.; Gaub, H. E.; Blank, K. Nano Lett. 2009, 9, 3290.
(25) Borgia, A.; Williams, P. M.; Clarke, J. Annu. Rev. Biochem. 2008, 77, 101.
(26) Junker, J. P.; Ziegler, F.; Rief, M. Science 2009, 323, 633.
(27) Schlierf, M.; Berkemeier, F.; Rief, M. Biophys. J. 2007, 93, 3989.
(28) Schlierf, M.; Rief, M. Angew. Chem., Int. Ed. 2009, 48, 820.
(29) Zhuang, X. W.; Rief, M. Curr. Opin. Struct. Biol. 2003, 13, 88.
(30) Brujic, J.; Hermans, R. I.; Walther, K. A.; Fernandez, J. M. Nat. Phys. 2006, 2, 282.
(31) Fernandez, J. M.; Li, H. Science 2004, 303, 1674.
(32) Garcia-Manyes, S.; Dougan, L.; Badilla, C. L.; Brujic, J.; Fernandez, J. M. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 10534.
(33) Li, H.; Cao, Y. Acc. Chem. Res. 2010, 43, 1331.
(34) Li, H. B. Adv. Funct. Mater. 2008, 18, 2643.
(35) Li, H. Org. Biomol. Chem. 2007, 5, 3399.
(36) Brockwell, D. J. Curr. Nanosci. 2007, 3, 3.
(37) Carrion-Vazquez, M.; Oberhauser, A.; Diez, H.; Hervas, R.; Oroz, J.; Fernandez, J.; Martinez-Martin, A. Advanced Techniques in Biophysics; Arrondo, J., Alonso, A., Eds; Springer-Verlag: Berlin, 2006; p 163.
(38) Puchner, E. M.; Gaub, H. E. Curr. Opin. Struct. Biol. 2009, 19, 605.
(39) Oberhauser, A. F.; Carrion-Vazquez, M. J. Biol. Chem. 2008, 283, 6617.
(40) Samori, B.; Zuccheri, G.; Baschieri, R. ChemPhysChem 2005, 6, 29.
(41) Carrion-Vazquez, M.; Oberhauser, A. F.; Fowler, S. B.; Marszalek, P. E.; Broedel, S. E.; Clarke, J.; Fernandez, J. M. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 3694.
(42) Steward, A.; Toca-Herrera, J. L.; Clarke, J. Protein Sci. 2002, 11, 2179.
(43) Dietz, H.; Bertz, M.; Schlierf, M.; Berkemeier, F.; Bornschlogl, T.; Junker, J. P.; Rief, M. Nat. Protoc. 2006, 1, 80.

DOI: 10.1021/la104130n
Published on Web 11/30/2010
Langmuir 2011, 27(4), 1440–1447
Figure 1. Schematics of force-clamp experiments and the force-clamp spectrum (red trace). (i) A polyprotein is picked up from the solid substrate by an AFM cantilever and held at a constant stretching force. (ii) The unfolding of a protein domain gives rise to the stepwise elongation of the polyprotein molecule. (iii) The length of the polyprotein molecule will remain constant after all of the domains are unfolded. The length-time profiles of polyproteins show a characteristic staircase-like appearance, where each step corresponds to the mechanical unfolding of an individual domain in the polyprotein chain. As shown in the schematic length-time curve, we define the time interval between consecutive unfolding events as the dwell time (td) and the time between zero and the unfolding event as the survival time (ts).
spectroscopy to be performed in a well-controlled fashion and have made it possible to interpret data without any ambiguity. However, in such experiments, the force that is applied to the protein changes as a function of the extension of polyproteins in a complex manner. Such a complex evolution of force makes it challenging to derive an analytical solution for extracting unfolding kinetic parameters of proteins. Instead, kinetic parameters of mechanical unfolding reactions are often obtained by using Monte Carlo simulation44-46 or numerical fitting.47-49 Moreover, the compliance of the polyprotein chain varies with the number of folded protein domains in the polyprotein and the linker length, resulting in the dependence of the unfolding force on the number of protein domains in the polyprotein chain, the so-called N effect, in constant-velocity single-molecule force spectroscopy experiments.50

(44) Rief, M.; Gautel, M.; Oesterhelt, F.; Fernandez, J. M.; Gaub, H. E. Science 1997, 276, 1109.
(45) Rief, M.; Gautel, M.; Schemmel, A.; Gaub, H. E. Biophys. J. 1998, 75, 3008.
(46) Oberhauser, A. F.; Marszalek, P. E.; Erickson, H. P.; Fernandez, J. M. Nature 1998, 393, 181.
(47) Williams, P. M.; Fowler, S. B.; Best, R. B.; Toca-Herrera, J. L.; Scott, K. A.; Steward, A.; Clarke, J. Nature 2003, 422, 446.
(48) Brockwell, D. J.; Beddard, G. S.; Paci, E.; West, D. K.; Olmsted, P. D.; Smith, D. A.; Radford, S. E. Biophys. J. 2005, 89, 506.
(49) Schlierf, M.; Rief, M. J. Mol. Biol. 2005, 354, 497.
(50) Zinober, R. C.; Brockwell, D. J.; Beddard, G. S.; Blake, A. W.; Olmsted, P. D.; Radford, S. E.; Smith, D. A. Protein Sci. 2002, 11, 2759.
(51) Oberhauser, A. F.; Hansma, P. K.; Carrion-Vazquez, M.; Fernandez, J. M. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 468.

The development of force-clamp and force-ramp spectroscopy has made it possible to stretch proteins under a predefined constant force or a force increasing at a constant rate.31,51 These new methods use a force feedback loop to precisely control the force being applied to a protein molecule and monitor its length changes over time. Stretching a polyprotein in force-clamp mode results in a staircase-like length-time profile where each step
corresponds to a mechanical unfolding event of a protein domain (Figure 1). Because all of the domains are independent and experience the same stretching force, the unfolding traces of a polyprotein can be regarded as the superposition of individual unfolding events, as suggested by Fernandez and co-workers.51,52 Therefore, the unfolding probability or unfolding probability density can be directly measured, and the unfolding rate constant can be readily determined at a given stretching force. This led to two widely used data-analysis methods: length average52,53 and survival time.51,54 In these two methods, the number of domains in the polyproteins is not considered. Recently, Fernandez and co-workers introduced a dwell-time analysis method to test the two-state Markovian process, which considers the total number of domains in the polyproteins and the order of unfolding events.53 In this method, the dwell time is defined as the survival time. In a different effort, we proposed a pseudo-dwell-time method, where the dwell time is defined as the time interval between consecutive unfolding events in force-clamp experiments.55 These two methods explicitly consider the number of domains in polyproteins, and the results showed that the number of domains in the polyprotein affects the distribution of each dwell time that is categorized with respect to the order of unfolding events and the total number of domains. This raises the question of whether there is a similar N effect in force-clamp experiments on polyproteins and whether these two methods are equivalent to the length-average and survival-time methods. Similar questions also exist for force-ramp experiments. In this article, we reconcile these issues by showing that these different data-analysis methods are equivalent and measure the same unfolding kinetics.
We show that the unfolding kinetics of proteins in force-clamp experiments is not affected by the number of domains in the polyprotein and that there is no N effect in force-clamp or force-ramp experiments. Using simulated data sets, we also compare the accuracy and versatility of these different methods in some representative experimental situations (e.g., for different numbers of protein domains, small data pools, the absence of some initial unfolding events because of the presence of nonspecific interactions at the beginning of the traces, and the absence of late unfolding events because of the detachment of molecules from the cantilever or substrate). These practical considerations will help to optimize the experimental conditions in single-molecule force spectroscopy experiments to characterize the unfolding energy landscape of proteins.
Materials and Methods

Monte Carlo Simulation. The simulated data set for the mechanical unfolding of polyproteins using force-clamp spectroscopy was generated using a Monte Carlo procedure as described.41 The protein unfolding is modeled as a two-state process, and the unfolding rate constant depends on the stretching force following the Bell-Evans model,56,57 R(F) = R0 exp(FΔxu/kBT), where R(F) is the unfolding rate constant for a force F, R0 is the spontaneous unfolding rate constant in the absence of a stretching force, Δxu is the unfolding distance between the native state and the mechanical unfolding transition state, kB is the Boltzmann constant, and T is the temperature. The parameters used in the Monte Carlo simulations are R0 = 0.0004 s⁻¹, Δxu = 0.25 nm, T = 298 K, Δt = 0.0001 s, and F = 140 pN. The total number of domains is 10, and the total number of traces generated is 2000. This data pool was then used for all of the data analysis in this article. To evaluate the effect of the total number of domains on the unfolding rate, we used different numbers for the total number of domains and kept the remaining parameters unchanged.

(52) Schlierf, M.; Li, H.; Fernandez, J. M. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 7299.
(53) Brujic, J.; Hermans, R. I.; Garcia-Manyes, S.; Walther, K. A.; Fernandez, J. M. Biophys. J. 2007, 92, 2896.
(54) Kuo, T. L.; Garcia-Manyes, S.; Li, J.; Barel, I.; Lu, H.; Berne, B. J.; Urbakh, M.; Klafter, J.; Fernandez, J. M. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 11336.
(55) Cao, Y.; Kuske, R.; Li, H. Biophys. J. 2008, 95, 782.
(56) Bell, G. I. Science 1978, 200, 618.
(57) Evans, E.; Ritchie, K. Biophys. J. 1997, 72, 1541.

Data Processing and Fitting. All of the data analysis was done using custom-written procedures in Igor 5.1 (Wavemetrics). For the length-average method, the unfolding traces were first normalized against the total length change (length_final - length_initial). Then the normalized curves were averaged to obtain the final length-average traces. Exponential fitting to the length-average traces gave the unfolding rate of the protein. The survival time and pseudo dwell time were calculated using the method specified in the Results section. Then, the histograms of both the survival time and pseudo dwell time were calculated with a bin size of 0.08 s. These histograms were fit to a single-exponential function to obtain the unfolding rate constant. For the force-ramp experiments and constant-velocity experiments, the unfolding probability densities were calculated using a bin size of 5 pN.

Estimation of Fitting Errors. To estimate the fitting errors for different sample sizes, we first randomly selected a desired number of traces from the data pool of 2000 traces. The selected data subset was then analyzed using the length-average, survival-time, and pseudo-dwell-time methods to extract the unfolding rate. This procedure was repeated 50 times to obtain a set of unfolding rates for the same data size. The average and standard deviation of the 50 unfolding rates were used to estimate the quality of the fitting.
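The Monte Carlo procedure described in Materials and Methods can be sketched as follows. This is only an illustrative sketch and not the authors' Igor code: the function names `bell_evans_rate` and `simulate_trace` are hypothetical, the per-step unfolding probability R·Δt is one common way to implement a two-state simulation, and the thermal energy kBT ≈ 4.1 pN·nm is an assumption (the article states only T = 298 K). With that assumed value the Bell-Evans expression gives about 2.04 s⁻¹ at 140 pN; using kB × 298 K ≈ 4.11 pN·nm instead gives a value closer to 1.98 s⁻¹.

```python
import math
import random

KBT_PN_NM = 4.1  # assumed thermal energy in pN*nm (not stated explicitly in the article)


def bell_evans_rate(force_pn, r0=0.0004, dxu_nm=0.25, kbt=KBT_PN_NM):
    """Unfolding rate constant R(F) = R0 * exp(F * dxu / kBT), in s^-1."""
    return r0 * math.exp(force_pn * dxu_nm / kbt)


def simulate_trace(n_domains, rate, dt=1e-4, rng=random):
    """Return the unfolding times (s) of one polyprotein held at constant force.

    At each time step every still-folded domain unfolds independently with
    probability rate * dt (a reasonable approximation while rate * dt << 1).
    The returned list is in increasing order and has n_domains entries.
    """
    times = []
    folded = n_domains
    step = 0
    p_unfold = rate * dt
    while folded > 0:
        step += 1
        # Count how many of the remaining folded domains unfold in this step.
        unfolded_now = sum(1 for _ in range(folded) if rng.random() < p_unfold)
        times.extend([step * dt] * unfolded_now)
        folded -= unfolded_now
    return times


if __name__ == "__main__":
    rng = random.Random(0)
    rate = bell_evans_rate(140.0)
    # A small pool (50 traces) keeps the pure-Python demo fast; the article uses 2000.
    traces = [simulate_trace(10, rate, rng=rng) for _ in range(50)]
    all_times = [t for trace in traces for t in trace]
    print(f"R(140 pN) = {rate:.3f} s^-1")
    print(f"1 / mean unfolding time = {len(all_times) / sum(all_times):.3f} s^-1")
```

Each trace is stored simply as the list of its unfolding times, which is enough to reconstruct the staircase and to compute survival times and dwell times.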
Results and Discussion

Brief Introduction to Different Analysis Methods. AFM-based single-molecule force-clamp spectroscopy uses a force feedback system to maintain the constant force at the preset value by varying the length of the proteins being stretched.51 As shown in Figure 1, the unfolding of a polyprotein in force-clamp mode gives rise to a staircase-like unfolding pattern with each step corresponding to the unfolding of a protein domain in the polyprotein. Here we define the time from the beginning (t0) to the kth unfolding event as the survival time of the kth domain (ts,N,k), where N is the total number of domains being stretched. We define the dwell time (td,N,k) of the kth domain as the time interval between the (k - 1)th and the kth unfolding events. Different methods have been developed to analyze force-clamp data on polyproteins, and we focus on four of them: length-average, survival-time, dwell-time, and pseudo-dwell-time methods. The length-average method is based on the reasoning that because individual domains in the polyprotein are identical and independent, the averaged normalized length Ln(t) is equivalent to the unfolding probability Ps(t).51 For two-state processes,

$$L_n(t) = P_s(t) = 1 - e^{-Rt} \tag{1}$$

Similarly, the distribution of the survival time equals the probability density of unfolding at time t.

$$f_s(t) = R e^{-Rt} \tag{2}$$
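Equations 1 and 2 can be illustrated with a short sketch. It assumes that every domain contributes the same step size, so that the averaged normalized length at time t is simply the fraction of domains unfolded by t, and it draws the unfolding times directly from an exponential distribution rather than running a time-stepped simulation. The helper names `length_average` and `survival_density` are hypothetical.

```python
import math
import random


def length_average(traces, t):
    """Averaged normalized length at time t: fraction of all domains unfolded by t."""
    total = sum(len(trace) for trace in traces)
    unfolded = sum(1 for trace in traces for ts in trace if ts <= t)
    return unfolded / total


def survival_density(traces, bin_width=0.08, n_bins=5):
    """Histogram of all survival times, scaled to a probability density (s^-1)."""
    times = [ts for trace in traces for ts in trace]
    counts = [0] * n_bins
    for ts in times:
        index = int(ts / bin_width)
        if index < n_bins:
            counts[index] += 1
    return [c / (len(times) * bin_width) for c in counts]


rng = random.Random(1)
R = 2.04               # s^-1, the input rate quoted in the article
N, n_traces = 10, 500  # illustrative pool size, smaller than the article's 2000 traces
# Shortcut (an assumption of this sketch): sample each domain's unfolding time directly.
traces = [sorted(rng.expovariate(R) for _ in range(N)) for _ in range(n_traces)]

for t in (0.1, 0.5, 1.0):
    print(f"t = {t:.1f} s: L_n = {length_average(traces, t):.3f}, "
          f"1 - exp(-Rt) = {1 - math.exp(-R * t):.3f}")

density = survival_density(traces)
for i, d in enumerate(density):
    center = (i + 0.5) * 0.08
    print(f"t = {center:.2f} s: histogram = {d:.2f}, R exp(-Rt) = {R * math.exp(-R * center):.2f}")
```

The first loop compares the empirical length average with eq 1, and the second compares the survival-time histogram (bin size 0.08 s, as in Materials and Methods) with eq 2.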
The survival-time method fits the distribution of survival time to eq 2 to extract the unfolding rate at a given stretching force. For the dwell-time analysis proposed by Fernandez et al.,53 the dwell time of the kth domain is defined as the survival time of the kth domain. It is of note that this definition is different from the arrival time used in a classical Poisson process.58 The probability that the kth domain has unfolded at time t (that is, that exactly k of the N domains have unfolded) is

$$P_s(t, N, k) = \binom{N}{k} (P_s(t))^{k} (1 - P_s(t))^{N-k} = \frac{N!}{k!\,(N-k)!} (P_s(t))^{k} (1 - P_s(t))^{N-k} \tag{3}$$

The probability density of the unfolding of the kth domain is the probability of having exactly k - 1 domains unfolded at time t multiplied by the unfolding probability of the kth domain in the [t, t + dt] time interval, which equals (N - k + 1)R dt.

$$\begin{aligned}
f_s(t, N, k) &= P_s(t, N, k-1)\,(N-k+1)R \\
&= \binom{N}{k-1} (P_s(t))^{k-1} (1 - P_s(t))^{N-k+1} (N-k+1)R \\
&= \frac{N!}{(k-1)!\,(N-(k-1))!} (1 - e^{-Rt})^{k-1} e^{-(N-k+1)Rt} (N-k+1)R
\end{aligned} \tag{4}$$

This equation is normalized from zero to infinity over t. In the unfolding probability density of the kth domain out of N domains, N is explicitly considered in eq 3. This is different from the gamma distribution used to describe the classical Poisson processes.58 For pseudo-dwell-time analysis, the dwell time of the kth domain is defined as the time interval between the (k - 1)th and kth unfolding events,55 which is the same definition as the arrival time in a classical Poisson process.58 When considering the number of domains in the polyprotein, the unfolding probability of the kth domain of a polyprotein with a total of N domains is

$$P_d(t, N, k) = 1 - e^{-(N-k+1)Rt} \tag{5}$$

and the probability density function is

$$f_d(t, N, k) = (N-k+1) R\, e^{-(N-k+1)Rt} \tag{6}$$

The pseudo dwell time t′ is then defined as t′ = (N - k + 1)t, and eq 6 can be rewritten as

$$f_d(t', N) = R\, e^{-Rt'} \tag{7}$$
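The rescaling in eq 7 can be illustrated with a short sketch. The function name `pseudo_dwell_times` is hypothetical, the traces are drawn directly from an exponential distribution as a shortcut, and the rate is estimated here as the reciprocal of the mean pseudo dwell time, which is a stand-in for the single-exponential histogram fit used in the article.

```python
import random


def pseudo_dwell_times(trace):
    """Rescale each dwell time by the number of domains still folded during it.

    trace: unfolding times of one N-domain polyprotein, in increasing order.
    Returns t' = (N - k + 1) * t_d for k = 1..N (the rescaling leading to eq 7).
    """
    n = len(trace)
    rescaled = []
    previous = 0.0
    for k, ts in enumerate(trace, start=1):
        dwell = ts - previous  # interval between the (k-1)th and kth events
        rescaled.append((n - k + 1) * dwell)
        previous = ts
    return rescaled


rng = random.Random(2)
R, N = 2.04, 10
traces = [sorted(rng.expovariate(R) for _ in range(N)) for _ in range(500)]

first_dwell = [trace[0] for trace in traces]
pseudo = [tp for trace in traces for tp in pseudo_dwell_times(trace)]
mean_first = sum(first_dwell) / len(first_dwell)
mean_pseudo = sum(pseudo) / len(pseudo)

print(f"mean first dwell time  = {mean_first:.4f} s (expected 1/(N R) = {1 / (N * R):.4f} s)")
print(f"mean pseudo dwell time = {mean_pseudo:.4f} s (expected 1/R = {1 / R:.4f} s)")
print(f"rate from pseudo dwell times = {1 / mean_pseudo:.3f} s^-1")
```

The raw first dwell time is shorter than 1/R by a factor of N because N domains are competing to unfold, whereas the rescaled pseudo dwell times from every step of the staircase share a single exponential distribution with rate R.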
The unfolding rate constant can then be obtained by fitting eq 7 to the distribution of the pseudo dwell time. Of the four methods, only the length-average method fits the probability function to the data. The other three methods extract the unfolding kinetics by fitting the probability density function to the distribution of time. As a first attempt to determine whether these methods are equivalent and lead to the correct estimate of the unfolding rate constant, we synthesized a data pool of 2000 unfolding traces (10 unfolding events in each trace) using a standard Monte Carlo simulation as described in the Materials and Methods section. As shown in Figure S1, all four methods adequately recover the input unfolding rate constant of 2.04 s⁻¹ at a stretching force of 140 pN. Moreover, the equations also reproduce the distribution of the data, which is a validation of these data-analysis methods.

(58) Ross, S. M. Stochastic Processes, 2nd ed.; Wiley: New York, 1996.

No N Effect in Force-Clamp Experiments. Of the four data-analysis methods, the dwell-time and pseudo-dwell-time methods
explicitly consider the number of domains in the polyprotein. The probability density functions (eqs 4 and 6) contain N and are different from the corresponding functions in a classical Poisson process. In contrast, the length-average and survival-time methods are based on the assumption that the unfolding traces of a polyprotein can be regarded as the superposition of individual unfolding events, and N does not appear in the probability and probability density functions. It is of note that the survival time of the kth domain out of N domains follows eq 4.53 If the dwell-time and survival-time methods are equivalent, then summing eq 4 over all survival times (k from 1 to N) should lead to the same unfolding probability density function, fs(t) = R exp(-Rt). Here we provide a mathematical proof for this equivalency.

$$\begin{aligned}
f_s(t, N) &= \sum_{k=1}^{N} f_s(t, N, k) \\
&= \sum_{k=1}^{N} \binom{N}{k-1} (P_s(t))^{k-1} (1 - P_s(t))^{N-k+1} (N-k+1)R \\
&= \sum_{k-1=0}^{N-1} \binom{N}{k-1} (P_s(t))^{k-1} (1 - P_s(t))^{N-k+1} (N-k+1)R \\
&= \sum_{j=0}^{N} \binom{N}{j} (P_s(t))^{j} (1 - P_s(t))^{N-j} (N-j)R \\
&= \sum_{j=0}^{N} \binom{N}{j} (P_s(t))^{j} (1 - P_s(t))^{N-j} NR - \sum_{j=0}^{N} \binom{N}{j} (P_s(t))^{j} (1 - P_s(t))^{N-j} jR \\
&= NR - NR\,P_s(t) \left( \sum_{j-1=0}^{N-1} \binom{N-1}{j-1} (P_s(t))^{j-1} (1 - P_s(t))^{(N-1)-(j-1)} + 0 \right) \\
&= NR\,(1 - P_s(t))
\end{aligned}$$

In the fourth line, j = k - 1, and the upper limit can be extended to N because the j = N term vanishes through the factor (N - j). The sixth line uses j·C(N, j) = N·C(N - 1, j - 1), and the j = 0 term contributes the 0. Because Ps(t) = 1 - e^(-Rt), we can write fs(t, N) = NRe^(-Rt). The above derivation is based on the following equation:

$$(a + b)^{N} = \sum_{k=0}^{N} \binom{N}{k} b^{k} a^{N-k}$$

Let a = 1 - Ps(t) and b = Ps(t), hence

$$\sum_{j=0}^{N} \binom{N}{j} (P_s(t))^{j} (1 - P_s(t))^{N-j} = 1$$

$$\sum_{j-1=0}^{N-1} \binom{N-1}{j-1} (P_s(t))^{j-1} (1 - P_s(t))^{(N-1)-(j-1)} = 1$$
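The summation result fs(t, N) = NRe^(-Rt) can also be checked numerically. The sketch below evaluates eq 4 for every k and compares the sum with the closed form; the function names `f_s_k` and `summed_density` are hypothetical, and the rate of 2.04 s⁻¹ is simply the value quoted for the simulated data.

```python
import math


def f_s_k(t, n, k, rate):
    """Eq 4: probability density that the kth unfolding event of n domains occurs at time t."""
    p = 1.0 - math.exp(-rate * t)
    return math.comb(n, k - 1) * p ** (k - 1) * (1.0 - p) ** (n - k + 1) * (n - k + 1) * rate


def summed_density(t, n, rate):
    """Sum of eq 4 over k = 1..n, before normalization."""
    return sum(f_s_k(t, n, k, rate) for k in range(1, n + 1))


rate = 2.04  # s^-1
for n in (1, 5, 10):
    for t in (0.05, 0.5, 2.0):
        lhs = summed_density(t, n, rate)
        rhs = n * rate * math.exp(-rate * t)
        print(f"N = {n:2d}, t = {t:4.2f} s: sum = {lhs:.6f}, N R exp(-Rt) = {rhs:.6f}")
```

For every N and t the two columns agree to within floating-point rounding, so dividing by N leaves a density that does not depend on N.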
Because a probability density must satisfy $\int_0^{\infty} f_s(t, N)\,dt = 1$, after normalization we get fs(t, N) = Re^(-Rt). This result clearly indicates that the dwell-time method is equivalent to the survival-time and length-average methods, and the number of domains N in the polyprotein does not affect the unfolding kinetics. Hence, there is no N effect in force-clamp experiments on polyproteins. For pseudo-dwell-time analysis, the definition of dwell time is the same as that in classical Poisson processes.58 However,
Figure 2. No N effect in force-clamp experiments. The simulated data (2000 traces) with different values of N are superimposable on each other in the three different analysis methods. Therefore, it is possible to combine complete length-time traces with different N to increase the data pool without affecting the analysis.
because of finite N, different dwell times no longer follow the same exponential distribution as for classical Poisson processes. Instead, by correcting the effect of a finite number of folded domains in a polyprotein, the pseudo dwell time follows an identical distribution and there is no N effect. The results clearly show that these methods are equivalent. When the appropriate experimental parameter is chosen for analysis, there is no N effect. However, it is of note that a finite number of domains in the polyprotein does make the unfolding of the polyprotein slightly different from that in classical Poisson processes in that both the survival time and dwell time are now dependent upon N. On the basis of the proof of equivalency of the methods, we can immediately conclude that all of the complete unfolding trajectories with different numbers of domains in the data pool can be included in the analysis using the length-average, survival-time, and pseudo-dwell-time methods. An analysis of the simulated data set generated by Monte Carlo simulations confirmed this point (Figure 2). However, for a dwell-time analysis that relies on N, only trajectories of the same N can be grouped together for analysis. This can significantly limit the size of the data pool and makes this method impractical in most experiments. In the rest of the discussion, we focus on the length-average, survival-time, and pseudo-dwell-time methods.

No N Effect in Force-Ramp Experiments. Different from force-clamp experiments, which measure the dwell time under constant force, in force-ramp experiments the force increases
Figure 3. (A) The unfolding-force histogram measured by the force-ramp technique does not show an N effect. Unfolding-force histograms of individual unfolding events categorized by their order in the force-ramp data are gray and their sum is black. It is evident that the sum of all of the unfolding-force histograms is identical to the unfolding-force histogram of a single domain (in red). (B) The unfolding-force histogram measured in constant-velocity experiments shows a clear N effect. The average unfolding force depends on the number of domains in the polyprotein. Combining data with different numbers of domains may potentially lead to relatively large errors in determining the mechanical stability and mechanical unfolding kinetic parameters of proteins. The ramp rate in A is 100 pN/s, and the pulling speed in B is 400 nm/s. The rest of the parameters are the same as those reported in the Materials and Methods section.
linearly and the unfolding force at which each individual domain unfolds is measured to extract the unfolding kinetics.51,52,59 If the unfolding trace of the force-ramp experiments can be considered to be the superposition of N independent single-molecule unfolding events, then the most probable unfolding force should be independent of the number of domains N. Following the same approach for force-clamp experiments, we define the unfolding probability density of the kth domain out of N domains to unfold at force F as f(k, N, F). The sum of f(k, N, F) should be equal to the unfolding probability density of a single domain, which is of the form57,60

$$f(F) = \frac{R_0}{a}\, e^{F \Delta x_u / k_B T}\, e^{-\frac{R_0 k_B T}{a \Delta x_u} \left( e^{F \Delta x_u / k_B T} - 1 \right)} \tag{8}$$

where a is the force-ramp rate and the other symbols are as defined before. We also attempted to prove this point mathematically. However, such a derivation cannot be done analytically because the unfolding probability density function of the kth event is conditional on that of the (k - 1)th event and f(1, N, F) has a complex expression similar to eq 8.

$$f(1, N, F) = \frac{N R_0}{a}\, e^{F \Delta x_u / k_B T}\, e^{-\frac{N R_0 k_B T}{a \Delta x_u} \left( e^{F \Delta x_u / k_B T} - 1 \right)} \tag{9}$$
Instead, we used Monte Carlo simulation to show that the results are indeed consistent with this statement. We generated 2000 force-ramp unfolding traces using Monte Carlo simulation and constructed the unfolding-force distribution for the ith (from 1 to N) domain in the data set. It is clear that the sum of the unfolding-force distributions from the first domain to the Nth domain is the same as the unfolding-force distribution of a single domain measured in force-ramp experiments (Figure 3A), suggesting that unfolding trajectories measured from force-ramp experiments can be considered to be the superposition of N unfolding trajectories of a single domain under the same force-ramp condition.

(59) Wang, M. J.; Cao, Y.; Li, H. B. Polymer 2006, 47, 2548.
(60) Izrailev, S.; Stepaniants, S.; Balsera, M.; Oono, Y.; Schulten, K. Biophys. J. 1997, 72, 1568.

Therefore, we can readily combine force-ramp traces of a different number of domains to construct the unfolding-force histogram (the unfolding probability density function). The most probable unfolding force at the given ramp rate is independent of the number of domains in the polyprotein chain, and there is no N
effect in the force-ramp experiment. This is in sharp contrast to the constant-velocity experiments (Figure 3B). For example, the average unfolding forces decrease from ∼300 pN at N = 1 to ∼200 pN at N = 10 in constant-velocity experiments. Mixing the data with different N significantly broadens the unfolding-force distribution and may lead to an incorrect estimation of the unfolding distance because the width of the unfolding-force distribution is directly related to the unfolding distance between the native state and the mechanical-unfolding transition state.

What is the origin of the N effect? One intuitive thought might be that the total number of domains in the polyproteins speeds up the unfolding rate of the kth unfolding event by a factor of (N - k + 1) because there are (N - k + 1) domains folded in the polyprotein. However, the mathematical proof and Monte Carlo simulation showed that this is not the case. As correctly pointed out by Smith and co-workers,50 the unfolding force is directly related to the effective loading rate at the point of rupture. It is the difference in the loading rate that causes the N effect. In constant-velocity experiments, the force increases with the length of the molecule and the length of the molecule depends on the total number of domains (N) and the number of unfolded domains in the polyprotein (k). Therefore, the loading rate is a function of N. A larger value of N results in a smaller average loading rate. However, in force-ramp and force-clamp experiments, the loading rates of all of the domains are identical and independent of N. To extend this point, as long as the force feedback system is used, any waveform of force, linear or nonlinear, can be used in force-clamp experiments and no N effect will be present.

This point is of critical importance in defining the mechanical stability of a protein. Mechanical stability is often defined as the average unfolding force at a given pulling velocity in constant-velocity experiments. However, because the effective loading rate in constant-velocity experiments is affected by many factors such as the polyprotein construct (with different values of N and different linker lengths) and the cantilever spring constant, the mechanical stability of the same protein measured in different laboratories is often quite different, making the comparison of protein mechanical stability difficult. Force-ramp spectroscopy experiments can effectively circumvent this problem, making them the ideal choice in single-molecule force spectroscopy experiments. Hence, we propose that the mechanical stability of proteins should be reported as the average unfolding force at a given ramp rate in force-ramp spectroscopy experiments to facilitate the comparison of results from different laboratories.
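The absence of an N effect under a force ramp can be illustrated with a small sketch built on eqs 8 and 9. It assumes an idealized ramp in which the force F = a·t acts equally on every folded domain and keeps increasing regardless of unfolding events, uses the Materials and Methods parameters with the 100 pN/s ramp rate quoted for Figure 3A, and takes kBT ≈ 4.1 pN·nm as an assumed value. The function names are hypothetical, and the unfolding forces are drawn by inverting the survival probability implied by eq 8 rather than by a time-stepped simulation.

```python
import math
import random

KBT = 4.1     # pN*nm, assumed thermal energy
R0 = 0.0004   # s^-1
DXU = 0.25    # nm
RAMP = 100.0  # pN/s


def ramp_density(force, n=1):
    """Eq 8 (n = 1) or eq 9 (first unfolding event out of n domains), in pN^-1."""
    x = math.exp(force * DXU / KBT)
    return (n * R0 / RAMP) * x * math.exp(-(n * R0 * KBT / (RAMP * DXU)) * (x - 1.0))


def sample_unfolding_force(rng):
    """Inverse-transform sample of one domain's unfolding force under the linear ramp."""
    u = 1.0 - rng.random()  # uniform in (0, 1]
    return (KBT / DXU) * math.log(1.0 - (RAMP * DXU / (R0 * KBT)) * math.log(u))


def mean_from_density(n, f_max=400.0, steps=40000):
    """Mean unfolding force from midpoint-rule integration of eq 8 or eq 9."""
    df = f_max / steps
    return sum((i + 0.5) * df * ramp_density((i + 0.5) * df, n) * df for i in range(steps))


rng = random.Random(3)
N = 10
traces = [sorted(sample_unfolding_force(rng) for _ in range(N)) for _ in range(500)]
pooled = [f for trace in traces for f in trace]
first = [trace[0] for trace in traces]
mean_pooled = sum(pooled) / len(pooled)
mean_first = sum(first) / len(first)

print(f"pooled events: mean = {mean_pooled:.1f} pN, eq 8 predicts {mean_from_density(1):.1f} pN")
print(f"first events:  mean = {mean_first:.1f} pN, eq 9 predicts {mean_from_density(N):.1f} pN")
```

The first event in each trace occurs at a lower force, as eq 9 predicts, but the forces pooled over all events follow the single-domain density of eq 8, which is why traces with different N can be combined in this idealized picture.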
Figure 4. Comparison of the accuracy and precision of different data-analysis methods. The accuracy of three different data-analysis methods was estimated by comparing the unfolding rate constants obtained from different analysis methods and the input number (A). There is no significant difference among the three methods. However, these three different analysis methods have distinct precision, as shown in the comparison of the standard deviation of the unfolding rate constant (B). The length average is the most precise method with smaller fitting errors. Therefore, it requires a smaller data set than other methods to estimate the unfolding rate constant accurately. A number (100-200) of unfolding events are sufficient to obtain the unfolding rate constant with less than 5% error.
Length-Average Method May Be More Suitable for Extracting the Unfolding Rate from Small Data Pools. Because of the difficulties in single-molecule experiments, it is challenging to get an experimental data pool as large as the simulated data pools. How much data is sufficient to give an accurate estimation of the unfolding rate constant? Are the fitting errors for all three data-analysis methods similar? To address these issues, we estimated the fitting errors from simulated data pools of different sizes. We randomly selected the desired number of traces from a total of 2000 traces and obtained the unfolding rate constant using the three different methods. This process was repeated 50 times to compare the mean and standard deviation of the unfolding rate constant from different analysis methods. As shown in Figure 4A, the average unfolding rate constants from the 50 trials are close to the input value of 2.04 s⁻¹ for all three methods, which again indicates the correctness of these analysis methods and the accuracy of the fittings. We used the standard deviation of the unfolding rate constant from 50 trials as the indicator for the precision of the fit. The length-average method gave much smaller deviations than the other two methods (Figure 4B). Less than 5% error in the unfolding rate constant was obtained when using around 100 unfolding events, whereas for the survival-time and pseudo-dwell-time methods obtaining less than 10% error in the unfolding rate constant required the use of around 400 unfolding events (dotted line). This finding is somewhat unexpected but reasonable. Among these three methods, only the length-average method calculates the unfolding probability at different times instead of the probability density. In the length-average method, the data at each t is essentially integrated from time 0 to t, which increases the effective data pool and reduces the error in fitting.
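The resampling procedure used to estimate precision (select a subset of traces, estimate the rate, repeat 50 times, report the mean and standard deviation) can be sketched as follows. Note the assumptions: the rate is estimated here as the reciprocal of the mean survival time, which is a stand-in estimator and not the exponential fitting used in the article, so the sketch illustrates only how the spread shrinks with pool size and does not reproduce the comparison among the three methods in Figure 4. The function name `subsample_rate_estimates` is hypothetical.

```python
import random
import statistics


def subsample_rate_estimates(traces, n_select, n_repeats=50, rng=random):
    """Repeatedly pick n_select traces at random and estimate the unfolding rate.

    Returns the mean and standard deviation of the n_repeats estimates.
    """
    estimates = []
    for _ in range(n_repeats):
        subset = rng.sample(traces, n_select)
        times = [t for trace in subset for t in trace]
        estimates.append(len(times) / sum(times))  # stand-in estimator: 1 / mean survival time
    return statistics.mean(estimates), statistics.stdev(estimates)


R, N = 2.04, 10
pool_rng = random.Random(4)
# Data pool of 2000 traces with 10 unfolding events each, sampled directly (a shortcut).
pool = [sorted(pool_rng.expovariate(R) for _ in range(N)) for _ in range(2000)]

for n_select in (5, 10, 40):
    mean_rate, sd_rate = subsample_rate_estimates(pool, n_select, rng=random.Random(n_select))
    print(f"{n_select * N:4d} events: rate = {mean_rate:.2f} +/- {sd_rate:.2f} s^-1")
```

Replacing the stand-in estimator with a fit of eq 1, eq 2, or eq 7 would turn this into the method-by-method comparison described in the text.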
Choice of the Starting Time Does Not Affect the Estimated Unfolding Rate Constant. In single-molecule AFM experiments, the beginning parts of the force spectroscopy traces are often “contaminated” by features associated with nonspecific interactions, making it difficult to determine the starting time, t0, for each trace. To examine whether the choice of starting time causes the estimated unfolding rate constant to deviate from the actual value, we changed the t0 for all 2000 simulated traces to t0′ = 0.2 s. Such a change leaves some unfolding traces with fewer than 10 unfolding events because the unfolding events between t0 and t0′ are missed. These new data were then analyzed using all three methods. Such changes did not have any effect on
Figure 5. The choice of starting time does not affect the analysis of unfolding kinetics. When the starting time t0 was arbitrarily offset to 0.2 s in the original staircase data, the analysis of the unfolding kinetics using the three different methods led to the correct unfolding rate constant.
the estimated unfolding rate constant for all three methods, as shown in Figure 5. This reflects important features of a Poisson
Figure 7. Missing slow unfolding events due to a short observation time window or detachment of the polyprotein molecule from either the cantilever or the substrate may bias the obtained unfolding rate.
Figure 6. Missing fast unfolding events may bias the data analysis. (A) Initial fast unfolding events may be missed if an improper t0 is chosen in the analysis. (B) Length-average, (C) survival-time, and (D) pseudo-dwell-time methods; black lines represent the data with missing fast unfolding events (t = 0.02 s is the cutoff), and red lines and gray lines correspond to single-exponential fitting. (C, D) Gray lines are fits to all data, and red lines omit the first few data points in the fitting.
process: it has no memory and is history-independent. This is important for experiments because we can effectively include trajectories contaminated by nonspecific interactions at the beginning of the unfolding trajectories. However, such convenience may not be applicable to unfolding processes that are non-Poisson in nature.
Missing Initial Fast Unfolding Events Affect the Estimated Unfolding Rate Constant. We may also face a difficult situation in choosing the starting point of the traces for real
experimental data. Because of the limited response of the force-feedback loop, the first few fast unfolding events may be missed. This leads to the wrong estimation of the starting point for those traces, as shown in Figure 6A. In this situation, the wrong t0 does not affect all traces but only those with fast initial unfolding events.54,55 We used the simulated data to test whether this biases the estimated unfolding rate constant. We artificially changed the starting points of all traces to the position of the first unfolding event with a dwell time longer than 0.03 s. Although the force feedback for state-of-the-art AFM instruments can be as fast as 1-3 ms,54 we chose 0.03 s as the threshold just to make the effect obvious. As shown in Figure 6B-D, because of the missing fast unfolding events, the data for all three analysis methods deviated from a single-exponential distribution. Simply applying a single-exponential fit to the data causes significant bias in the estimated unfolding rate constant (red line in Figure 6B and gray lines in Figure 6C,D). However, for both the survival-time and pseudo-dwell-time methods, if we omit the first few bins of the distribution, we can still recover the correct unfolding rate constant from the data (red lines in Figure 6C,D). For force-clamp experiments carried out at a high clamp force, the unfolding of protein domains can be very fast at the beginning of the traces. Therefore, choosing a suitable data-analysis method is very important. It is worth noting that missing short unfolding events have been reported in many force-clamp experiments.54,55 Consequently, the response
speed of the force-feedback loop may determine the limit of the clamped force in force-clamp experiments used to obtain unbiased data.
Including Unfinished Traces Significantly Affects the Estimated Unfolding Rate. In single-molecule AFM experiments, another potential error comes from including unfinished traces in the data analysis. This may occur quite often in low-force force-clamp experiments, which require a long time for all of the domains in the polyprotein to unfold. If the polyprotein detaches from either the cantilever or the substrate before all of its domains unfold, then the total number of domains, N, in the polyprotein is unknown. Because we have proven that there is no N effect in force-clamp experiments, does missing some unfolding events affect the estimated unfolding rate? Because the missed events mainly occur at long survival times, we may intuitively expect that including unfinished traces biases the unfolding rate toward a higher value. We used the simulated data to demonstrate how significant this effect could be for all three data-analysis methods. We randomly detached half of the unfolding traces at time t = 1.5 s to mimic the real experimental condition. As shown in Figure 7, including unfinished traces made the estimated unfolding rates for all three methods significantly faster than the real value. However, among the three methods, the survival-time method can still provide a reasonably good estimate of the unfolding rate, with an error of less than 10%. It is worth noting that if the number of domains in the polyproteins varies, then including unfinished traces may bias the data away from a single-exponential model and produce an artificial multiple-exponential distribution. This may result in a wrong interpretation of the unfolding mechanism of proteins.
Therefore, excluding unfinished traces from the data analysis is critical for single-molecule force-clamp experiments. Moreover, for the pseudo-dwell-time method, if the total number of domains in the polyprotein being stretched is known (e.g., in a refolding experiment), then the unfinished traces can still be used for the analysis, as we demonstrated previously.55
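Two of the perturbations discussed in this section, the shifted starting time (t0′ = 0.2 s) and the detachment of half of the traces at t = 1.5 s, can be reproduced with a short simulation. The following is a minimal sketch under stated assumptions, not the analysis code used in this work: it assumes 10 domains per trace and uses the maximum-likelihood estimate k = 1/⟨t⟩ on the observed survival times instead of a histogram fit, and all function names are hypothetical.

```python
import random

K_INPUT = 2.04   # input unfolding rate constant (s^-1), from the text
N_DOMAINS = 10   # ASSUMPTION: number of domains per polyprotein
T_OFFSET = 0.2   # shifted starting time t0' (s), from the text
T_DETACH = 1.5   # detachment time (s), from the text


def simulate_traces(rng, n_traces=2000, n_domains=N_DOMAINS, k=K_INPUT):
    """One independent exponential unfolding time per domain in each trace."""
    return [[rng.expovariate(k) for _ in range(n_domains)] for _ in range(n_traces)]


def mle_rate(times):
    """ASSUMPTION: maximum-likelihood rate k = 1 / mean(t), not a histogram fit."""
    return len(times) / sum(times)


def shifted_start(traces, t_offset=T_OFFSET):
    """Discard events before t_offset and measure the rest from the new origin."""
    return [t - t_offset for trace in traces for t in trace if t > t_offset]


def with_detachment(traces, rng, t_detach=T_DETACH):
    """Detach a random half of the traces at t_detach; their later events are lost."""
    detached = set(rng.sample(range(len(traces)), len(traces) // 2))
    observed = []
    for i, trace in enumerate(traces):
        if i in detached:
            observed.extend(t for t in trace if t <= t_detach)
        else:
            observed.extend(trace)
    return observed


rng = random.Random(0)
traces = simulate_traces(rng)
all_times = [t for trace in traces for t in trace]
print(f"all events:     k = {mle_rate(all_times):.2f} s^-1")
print(f"start shifted:  k = {mle_rate(shifted_start(traces)):.2f} s^-1")
print(f"half detached:  k = {mle_rate(with_detachment(traces, rng)):.2f} s^-1")
```

Under these assumptions, the shifted-start estimate should stay close to the input value because exponential waiting times are memoryless, whereas naively pooling the detached traces removes long survival times and pushes the estimate above the input value.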
Langmuir 2011, 27(4), 1440–1447
Conclusions
Force-clamp (including force-ramp) spectroscopy has played an increasingly important role in the study of protein unfolding dynamics. The development of different data-analysis methods greatly facilitates the extraction of the unfolding kinetics of proteins from force-clamp experiments. Here, by combining mathematical derivation and Monte Carlo simulation, we demonstrated that these commonly used methods of analyzing force-clamp data are equivalent and provide an accurate estimation of the unfolding rate of proteins. Moreover, we showed that the force-clamp and force-ramp experiments do not have any N effect, making the extraction of unfolding kinetics from them easier than that from constant-pulling-velocity experiments. Then, we showed that the accuracy and precision of these methods under some practical conditions can vary because of the different mathematical treatment of the experimental data. This comparison will help to optimize the experimental conditions in single-molecule force spectroscopy experiments conducted to characterize the unfolding energy landscape of proteins. It is of note that the data-analysis methods available for force-clamp experiments are not limited to the ones discussed in this article. Other methods have also been developed.61,62 Further developments of new analysis methods will be of great importance in extending the use of force-clamp techniques to analyze more complex unfolding kinetics of proteins that deviate from simple two-state unfolding kinetics.
Acknowledgment. This work is supported by the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation of Innovation, and the Canada Research Chairs Program.
Supporting Information Available: Plots of data from four different analysis methods. This material is available free of charge via the Internet at http://pubs.acs.org.
(61) Bura, E.; Klimov, D. K.; Barsegov, V. Biophys. J. 2007, 93, 1100.
(62) Dudko, O. K.; Hummer, G.; Szabo, A. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 15755.
DOI: 10.1021/la104130n