ARTICLE pubs.acs.org/JPCB
Boosting Protein Dynamics Studies Using Quantitative Nonuniform Sampling NMR Spectroscopy
Yoh Matsuki,† Tsuyoshi Konuma,‡ Toshimichi Fujiwara,† and Kenji Sugase*,‡
†Institute for Protein Research, Osaka University, Osaka, Japan
‡Bioorganic Research Institute, Suntory Foundation for Life Sciences, Osaka, Japan
Supporting Information
ABSTRACT: NMR spectroscopy is uniquely suited to study protein dynamics over a wide range of time scales at atomic resolution. However, existing NMR relaxation methods require highly serial, lengthy data collection, which ultimately limits their application to short-lived samples, such as proteins in living cells. In recent years, the utility of nonuniform sampling (NUS) NMR methodologies has been increasingly recognized, but their application has been rare in relaxation measurements, where highly accurate spectral quantification is demanded. Recently, Matsuki et al. developed a new NUS-processing method, SIFT (Spectroscopy by Integration of Frequency and Time domain information), which is highly robust and faithful in reproducing signals. In this work, we demonstrate the gains that are possible with more aggressive use of frequency-domain information than was employed previously. This improvement is crucial for SIFT to be used in accelerating relaxation measurements while preserving full analytical accuracy. Taking the KIX domain of mouse CREB-binding protein (CBP) as an example, we demonstrate that this quantitative NUS processing method enables a total 10-fold acceleration of R2 relaxation dispersion measurements. The advanced SIFT processing should be equally useful for other NMR relaxation measurements.
INTRODUCTION
Protein dynamics plays an essential role in many biological processes, including ligand binding, enzyme catalysis, molecular recognition, and protein folding.1 Nuclear magnetic resonance (NMR) spectroscopy is uniquely suited to investigate protein dynamics over a wide range of time scales at atomic resolution. Accordingly, a variety of methods are available for measuring nuclear relaxation rates (R1, R2, heteronuclear NOE)2 to elucidate local molecular flexibility, relaxation dispersion profiles for (conformational) exchange parameters,3 and 1H exchange rates (using CLEANEX-PM4 and/or 1H/2H exchange5) for solvent accessibility. A common drawback of these methods, however, is their lengthy data collection, which ultimately limits their application to short-lived or low-sensitivity dilute samples, typical of proteins that aggregate quickly or are difficult to express in large quantities. One of the most striking examples is an NMR study using living cells, or in-cell NMR. In-cell dynamics studies are currently prohibited due to the short lifetime of proteins (3 s) to enhance the accuracy of measured peak intensities. NMR-based dynamics studies use peak intensities as quantitative sources of information on protein motions; they are therefore more stringent about sample stability than other classes of measurements, such as those for signal assignments, for which bare detection of peaks suffices and partial degradation or precipitation during the measurement does not affect the assignments for a protein that constitutes the major part of the sample. Due to these challenges, most of the successful protein dynamics studies so far have been limited to constructs optimized to provide enough sensitivity and stability.
Received: August 23, 2011 Revised: October 11, 2011 Published: October 12, 2011
dx.doi.org/10.1021/jp2081116 | J. Phys. Chem. B 2011, 115, 13740–13745
The approach that extensively reduces the bandwidth beyond the Nyquist bandwidth to speed the data collection is not generally applicable here, although it has been successfully demonstrated for small molecules.9 In this approach, the avoidance of the peak
overlap due to aliasing is not always guaranteed because of the random nature of the peak positions, and will be particularly problematic for large protein systems. Moreover, the smaller bandwidth linearly degrades the sensitivity of the measurement, and is thus difficult to apply to dilute protein samples, which are the target of this work. In recent years, the utility of nonuniform sampling (NUS) for faster acquisition of NMR data has been increasingly recognized with the advent of a number of efficient NUS-processing methods, such as nonlinear Fourier transform (NLFT),10,11 maximum entropy (MaxEnt),12,13 and multi-dimensional decomposition (MDD).14,15 Although the use of NUS NMR is promising, its application to protein dynamics studies has been limited. This is partly because the success of such studies hinges on accurate quantification of peak intensities, an issue that has long been under discussion for the existing NUS-processing tools. Although NUS with MaxEnt or MDD has been used in a semiquantitative manner for protein structure determination,6,16 quantification of NOE peaks is one of the examples where only rough quantification suffices without serious degradation of the final structures. This is in contrast to protein dynamics measurements, which entail direct and precise fitting of peak intensity variations along the serial measurements with theoretical curves to determine dynamics parameters and their confidence limits. Recently, Matsuki et al. developed a new NUS-processing method, SIFT (Spectroscopy by Integration of Frequency and Time domain information),17,18 which is highly robust and faithful in reproducing signals.
NUS together with SIFT-processing has been shown to achieve for a decaying signal either higher sensitivity in a given measurement time or faster data acquisition for the same sensitivity.18 While the sensitivity gain has been demonstrated in previous applications of SIFT,17,18 we here show that significant acceleration of relaxation measurements is possible with NUS and SIFT. For this new application of SIFT, we demonstrate a more aggressive use of frequency domain information for SIFT so that it can process NUS data even when the data are recorded without oversampling in the indirect dimension. This improvement is crucial in accelerating the measurement while preserving the quantitative accuracy and reliability of the processing result. The reliability of the measurement is of central importance in the relaxation measurements. With the advanced SIFT processing, we demonstrate a total 10-fold expedition of R2 relaxation dispersion measurements without appreciably compromising the analytical accuracy. The acceleration is afforded by minimizing the number of data samples using NUS (2.5-fold) and the number of scans per FID (4-fold) utilizing the robustness of SIFT against noise. The new SIFT procedure is equally valid for other NMR relaxation measurements.
MATERIALS AND METHODS
New SIFT Procedure. SIFT is based on the classical error-reduction (Gerchberg–Papoulis) algorithm19,20 that has been extensively studied and used in picture processing and medical imaging.21–23 The algorithm is an iterative technique based on Fourier/inverse-Fourier transformations that imposes a priori constraints at each iteration. In this way, information across the domains is integrated without any biasing model or parameters. In the application to NUS NMR spectroscopy, the prior information consists of the “dark” spectral points, or regions known to be devoid of any signal, and the experimentally collected NUS time-domain data.
Figure 1. A cartoon illustrating time- and frequency-domain data, f(t) and F(ω), respectively, paired with the Fourier transform matrix in the cases of Nyquist sampling (a) and twice-oversampling NUS (b). Gray rectangles represent known or measured data points either in the frequency or time domain whose information content can be integrated via SIFT. In panel b, four dark spectral points are produced at the spectral edges due to the twice-wider bandwidth (on the left-hand side of the equation), which are used to compensate for the information deficiency due to NUS in the time domain (on the right-hand side of the equation).
The SIFT cycle can be concisely formulated for a one-dimensional (1D) case as
$$f^{k+1}(t) = f^{0}(t) + P(t)\,F^{-1} D F f^{k}(t) \qquad (1)$$
where $f^{p}(t)$ with p = 0 or p > 0 denotes the NUS data collected in an experiment or the data in the pth SIFT cycle, respectively. $F$ and $F^{-1}$ are the Fourier and inverse-Fourier transform operators, D is the spectral masking operator that zeros all the dark spectral points, and P(t) is the NUS operator, whose element is 0 or 1 for the sampled or unsampled time grid, respectively. The step-by-step description of the data treatment is given in a previous publication.18 SIFT reinstates unsampled time-domain points to the extent allowed by the information in the “dark” spectral points, by integrating this frequency-domain information back into the time domain. Thus, the processing is most accurate and reliable when the number of dark spectral points exceeds the number of unmeasured elements in the NUS time domain. In this most conservative sampling regime, called the above-critical sampling condition, improvement of the overall sensitivity of NMR measurements has been shown previously.17,18 In the previous applications of SIFT to NMR data, dark spectral regions were produced by intentional oversampling in the indirect acquisition dimension,18 or existed due to oversampling required for other experimental reasons.17 However, as long as SIFT relies only on the dark information produced by oversampling, the experimental time can never be reduced relative to experiments conducted without oversampling, as illustrated in Figure 1. An oversampled, wider bandwidth produces known dark spectral points, but also produces the same number of unknowns in the time domain because of the finer Nyquist grid. That is, the overall number of time-domain measurements, shown by gray circles in Figure 1, is invariant. To expedite the measurement, which is the primary goal of this work, one is required either to further thin out the sampling schedule, venturing into the subcritical sampling condition as opposed to above-critical sampling, or to recruit dark information in a way other than oversampling.
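The cycle in eq 1 can be illustrated with a minimal NumPy sketch. This is not the authors' program: the function name `sift_1d`, the toy signal (two undamped, on-grid frequencies), the random 25-of-64 schedule, and the three-bin "bright" boxes are all assumptions chosen only so that the dark-region prior holds exactly.

```python
import numpy as np

def sift_1d(f0, sampled, dark, n_cycles=8):
    """Apply f^{k+1} = f^0 + P * IFFT(D * FFT(f^k)) for n_cycles iterations."""
    P = (~sampled).astype(float)  # NUS operator: 1 on unsampled points, 0 on sampled ones
    D = (~dark).astype(float)     # masking operator: zeros every dark spectral point
    f = f0.copy()
    for _ in range(n_cycles):
        f = f0 + P * np.fft.ifft(D * np.fft.fft(f))
    return f

# Toy data (illustrative only): two on-grid, undamped signals on a 64-point grid.
n = 64
t = np.arange(n)
full = np.exp(2j * np.pi * 10 * t / n) + 0.5 * np.exp(2j * np.pi * 40 * t / n)

rng = np.random.default_rng(0)
sampled = np.zeros(n, dtype=bool)
sampled[rng.choice(n, size=25, replace=False)] = True  # keep 25 of 64 points

dark = np.ones(n, dtype=bool)
dark[9:12] = False    # bright box around the peak at bin 10
dark[39:42] = False   # bright box around the peak at bin 40

f0 = np.where(sampled, full, 0.0)      # unsampled points start at zero
restored = sift_1d(f0, sampled, dark)  # eight cycles, as used in this study

err_before = np.linalg.norm(f0 - full)
err_after = np.linalg.norm(restored - full)
print(f"error before SIFT: {err_before:.3f}, after: {err_after:.3f}")
```

Because P is zero on the sampled grid points, the measured data are never altered; only the unsampled points are updated from the masked spectrum.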
Figure 2. The reference spectrum of KIX. Fourier transform of fully acquired data (a), and Fourier transform of NUS data before (b) and after (c) SIFT processing. The rectangles in (a) define the peak-containing “bright” spectral region. The lowest contour level is set to 6% of the highest peak in each panel. The negative contours are shown in red. A representative slice taken at F2 = 8.005 ppm (indicated by a dashed blue line in a, b, and c) is shown in panel d for the full (top), post-SIFT (middle), and pre-SIFT (bottom) spectra in the same vertical scale.
Here, we extended the original SIFT program18 so that it can find dark points within a fully native spectrum taken at the Nyquist bandwidth, i.e., without oversampling in the indirect
dimension, by adaptively defining as many peak-containing regions as there are peaks in the spectrum. A key idea is that in R2 dispersion measurements and many other NMR experiments for dynamics studies, one is interested in accurately measuring the change in peak intensities relative to those in the reference spectrum. Because signal positions are known from the reference spectrum taken at the beginning (Figure 2a), and are invariant over the serial measurements, one can use all the dark spectral points existing between NMR peaks for SIFTing to accelerate all the subsequent measurements. This is a demonstration of the scout-and-SIFT approach delineated in the original paper.18 We note that the same acceleration is achievable regardless of the number of 2D data sets involved in the series. We also emphasize that our method differs in an important way from the spectral thresholding method:24 SIFT does not risk throwing away peak intensity smaller than an arbitrarily set threshold.
Sample Preparation. Uniformly 2H- and 15N-labeled KIX domain (residues 586–672) of mouse CREB-binding protein (CBP) was expressed in BL21-DE3 cells grown in M9 minimal medium and purified by reverse-phase high-performance liquid chromatography (HPLC).25 The protein was dissolved in NMR buffer [95% H2O/5% D2O, 20 mM Tris-d11-acetate-d4 (pH 5.5 at 25 °C), 50 mM NaCl, 2 mM NaN3] and concentrated to 500 μM. Protein concentration was determined from the absorbance at 280 nm, using an extinction coefficient of 12.95 mM⁻¹ cm⁻¹.
NMR Measurements.
15N R2 relaxation rates were measured on Bruker BioSpin 600 and 750 MHz spectrometers at 25 °C using relaxation-compensated constant-time Carr–Purcell–Meiboom–Gill (rcCPMG) pulse sequences with a constant relaxation time of TCPMG = 40 ms.26,27 The spectral width of the indirect dimension was set to a conventional Nyquist width of approximately 25 ppm so that the backbone amide peaks are not folded, and no oversampling was applied to the indirect acquisition dimension. R2 dispersion profiles were generated by measuring effective transverse relaxation rates R2eff as a function of τCP, the delay time between two successive 180° pulses in the CPMG pulse
train. 2D data sets with 1024 × 64 (t2 × t1) complex points were acquired at B0 = 14.1 and 17.6 T, and at τCP = 10, 5, 3.33, 2.5, 2.0, 1.66, 1.43, 1.25, 1.0, 0.83, 0.71, 0.63, 0.55, 0.50, 0.4, and 0.33 ms. Reference spectra were acquired by omitting the CPMG period. Each 2D spectrum was collected with eight scans per FID unless otherwise stated.
SIFT Processing and Analysis of R2 Dispersion Profiles. Resonance peaks were automatically picked in the reference spectrum using the program Sparky, followed by manual removal of obviously spurious peaks. The peak list was imported into MATLAB and used during SIFT processing to define the peak positions with a fixed box size (±0.94 by ±0.07 ppm along the 15N and 1H axes, respectively) around the peak top. The input for SIFT processing was the NUS time-domain NMR data, the NUS schedule, the peak list, and the number of SIFT cycles (8 in this study). All steps, including the specification of bright regions, the SIFT cycles, final processing with apodization and FT, and compilation of the peak intensities in 42 spectra, were fully automated and involved no user intervention. This entire process took 0.6 min on two 2.26 GHz quad-core Intel Xeon processors. The integral peak intensities in all spectra were obtained using MATLAB as a sum of the intensities at 3 × 3 grid points centered on the picked peak top. The compiled peak intensities were converted into effective R2 relaxation rates (R2eff) using the equation R2eff(τCP) = −(1/TCPMG) ln(I(τCP)/I0), where I(τCP) represents the peak intensity at a particular τCP delay, and I0 represents the intensity in the reference spectrum. An error in peak intensity, εI, was evaluated from the standard deviation of the noise intensity in signal-containing F1 slices. Thus, in pre-SIFT spectra, the errors were dominated by strong sampling noise due to NUS. Errors in the peak intensities were then propagated28 to those of R2eff as εR = εI/(I(τCP) TCPMG).
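The intensity-to-rate conversion described above can be sketched in a few lines of Python. The function names and the example intensities are hypothetical, and the error function assumes first-order propagation of the intensity error at a given τCP only, neglecting any uncertainty in the reference intensity I0.

```python
import numpy as np

T_CPMG = 0.040  # constant relaxation time in seconds (40 ms, as stated in the text)

def r2_eff(intensity, i_ref, t_cpmg=T_CPMG):
    """Effective R2 (s^-1) from a peak intensity and the reference intensity."""
    return -np.log(np.asarray(intensity, dtype=float) / i_ref) / t_cpmg

def r2_eff_error(intensity, eps_i, t_cpmg=T_CPMG):
    """First-order error in R2eff from the intensity error eps_i (assumption:
    the uncertainty in the reference intensity is neglected)."""
    return eps_i / (np.asarray(intensity, dtype=float) * t_cpmg)

# Hypothetical numbers, for illustration only.
rate = r2_eff(60.0, 100.0)        # peak at 60% of the reference intensity
rate_err = r2_eff_error(60.0, 1.5)
print(f"R2eff = {rate:.2f} +/- {rate_err:.3f} s^-1")
```

Both functions accept arrays, so a whole dispersion profile can be converted in one call by passing the intensities at all τCP values.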
The R2 dispersion profiles of 55 nonoverlapping peaks were fit to the Carver and Richards equation29 derived for a two-state conformational exchange model (A ⇌ B) using the program GLOVE.8 The fitting parameters were the population-average intrinsic relaxation rate, R20, the exchange rate constant, kex (kex = kAB + kBA), the chemical shift difference between states, |Δω|, and the product of the populations of states A and B, pApB. The population of state B, designated as the lower-populated state, was calculated according to the formula pB = (1 − (1 − 4pApB)^(1/2))/2. The R2 dispersion profiles recorded at the two external fields were fit simultaneously for each residue. Fits were initially performed for individual residues, and the goodness-of-fit was assessed by the reduced χ2 value (χ2 divided by the degrees of freedom). The resulting kex and pApB values were consistent for all residues with the two-state conformational exchange model, indicating that all sites experience an identical kinetic process.30 The R2 dispersion curves were therefore fit with the global parameters kex and pApB for all residues. Uncertainties in the parameters were estimated by Monte Carlo simulations, using 100 synthetic data sets generated on the basis of the experimental uncertainties.28 The root-mean-square (rms) deviation of the R2 dispersion profile obtained from NUS data from the “true” profile obtained from the full data was calculated as
$$\Delta R_2^{\mathrm{eff}} = \left\{ \frac{\sum_{\tau_{CP}} \left[ R_{2,\mathrm{Full}}^{\mathrm{eff}}(\tau_{CP}, B_0) - R_{2,\mathrm{NUS}}^{\mathrm{eff}}(\tau_{CP}, B_0) \right]^2}{n} \right\}^{1/2} \qquad (2)$$
where n represents the number of R2eff rates in a profile.
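Two of the formulas in this subsection, the minor-state population derived from pApB and the rms deviation between profiles, are simple enough to check numerically. The sketch below uses hypothetical function names and made-up rates; it is not taken from GLOVE or from the authors' scripts.

```python
import math
import numpy as np

def minor_population(pa_pb):
    """Population of the lower-populated state B from the product pA*pB."""
    return (1.0 - math.sqrt(1.0 - 4.0 * pa_pb)) / 2.0

def rms_deviation(r2_full, r2_nus):
    """Root-mean-square deviation between two R2eff profiles of equal length."""
    diff = np.asarray(r2_full, dtype=float) - np.asarray(r2_nus, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Round trip: a 3.56% minor state gives pA*pB = 0.0356 * 0.9644.
p_b = minor_population(0.0356 * (1.0 - 0.0356))

# Made-up three-point profiles (s^-1), for illustration only.
delta = rms_deviation([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
print(f"pB = {p_b:.4f}, rms deviation = {delta:.4f} s^-1")
```

Taking the smaller root of the quadratic pB(1 − pB) = pApB is what enforces the convention that B is the lower-populated state.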
Nonuniform Sampling. We collected 64 uniform t1 samples for all the spectra, including the reference spectrum, and then omitted samples retrospectively to emulate NUS. This allowed us to assess the discrepancy of the exchange parameters obtained from NUS data in comparison to the “true values” derived from the full data set. In practical applications, all spectra would be acquired with NUS to expedite data collection; however, for the most accurate specification of peak-containing “bright spectral regions”, it is recommended to use uniform sampling for the reference spectrum. We generated a NUS schedule as described previously,18 using a standard NUS generator.10 The NUS generator uses a stratified jittered algorithm to constrain the average distances between random sample points, and a Gaussian weighting function for the probability density of the samples. The constraint aims to reduce the NUS noise before any processing is applied, and thus is not biased toward SIFT or any other NUS processing method. We used σ = 0.6 for the width of the Gaussian probability distribution; i.e., the inflection point for the decaying probability occurs at ∼60% of the maximum t1 evolution length. We used a NUS schedule distributing 25 t1 samples over 64 Cartesian grid points in the time domain with Gaussian probability density; i.e., ∼60% of the observations were omitted. The sparsity level of the NUS schedule was chosen on the basis of the following consideration. The proportion of the spectral dark regions in each slice along the F1 axis, or the spectral “darkness”, varied from 60% to 95% for our KIX data (Figure S1a, Supporting Information). Since SIFT can restore a NUS data set most accurately when the spectral darkness is greater than the sparseness of the NUS,18 the most conservative approach would be to adopt a NUS schedule whose sparseness is matched to the available minimum spectral darkness (∼60% in this case).
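A Gaussian-weighted schedule of this kind can be sketched as follows. This is a deliberately simplified stand-in, assuming a plain weighted random draw without replacement; it does not implement the stratified jittered algorithm of the generator used in this work, and the function name and seed are hypothetical.

```python
import numpy as np

def gaussian_nus_schedule(n_grid=64, n_samples=25, sigma=0.6, seed=0):
    """Draw n_samples distinct grid indices with Gaussian-decaying probability.

    Simplification (assumption): weighted draw without replacement, with no
    stratification or jitter constraint on the spacing between samples.
    """
    x = np.arange(n_grid) / n_grid             # t1 as a fraction of the maximum
    weights = np.exp(-0.5 * (x / sigma) ** 2)  # inflection point at x = sigma
    rng = np.random.default_rng(seed)
    picked = rng.choice(n_grid, size=n_samples, replace=False,
                        p=weights / weights.sum())
    return np.sort(picked)

schedule = gaussian_nus_schedule()
sparseness = 1.0 - len(schedule) / 64          # fraction of omitted t1 points
print(schedule)
print(f"sparseness = {sparseness:.3f}")
```

With 25 of 64 points retained, the sparseness is 39/64, or about 61%, which is the quantity to compare against the minimum spectral darkness discussed above.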
RESULTS AND DISCUSSION
To validate the new SIFT protocol, we applied it to R2 relaxation dispersion measurements for KIX, which is a modular transcriptional coactivator known to undergo a functionally important two-state conformational exchange between the native and non-native forms.30 R2 dispersion spectroscopy, one of the most powerful tools in the field of protein dynamics, can elucidate the kinetics and thermodynamics of protein motions,3 as well as conformational information on weakly (>0.5%) and transiently (∼ms) populated states.31 The reference 15N HSQC spectrum of KIX is shown in Figure 2a. All regions outside the peak positions (rectangles in Figure 2a) were treated as dark regions for SIFT processing. With this exhaustive use of information in a native spectrum, oversampling for the indirect acquisition dimension was not necessary in this work, although it is another valid way of sourcing dark information when peak positions are not known.18 One may also combine the new procedure with oversampling.
Figure 3. R2 dispersion profiles obtained for Glu641 at B0 = 14.1 T (black) and 17.6 T (red) from the full (a), post- (b), and pre-SIFT spectra (c). The best-fit profiles are superposed in the same color. The rms deviations from the true profile, ΔR2eff, were 7.9 s⁻¹ and 1.1 s⁻¹ for the pre- and post-SIFT profiles, respectively. A part of the plot between the highest and lowest R2eff values of the full data is shaded in each panel to facilitate comparison. More examples are shown in Figure S3.
Figure 4. 15N chemical shift differences (|Δω̄|) for two interconverting states extracted from NUS data (vertical axis) with (a) and without (b) SIFT processing are plotted against “true” values from the full data (horizontal axis). The coefficient of determination was 0.873 for pre- and 0.993 for post-SIFT data. The shaded regions in panels a and b are expanded in the panels below. For the data points labeled with a residue number, the corresponding R2 dispersion profiles are shown in Figures 3 or S3.
The data shown in Figure 2 demonstrate the power of SIFT processing. For the serial 2D measurements, we used a NUS schedule that randomly distributes 25 t1 samples over 64 Cartesian grid points in the time domain; i.e., ∼60% of the observations were omitted. The spectrum before SIFT, which is equivalent to that obtained with NLFT,10 was highly corrupted with NUS noise (Figure 2b). The peak intensities were not only severely suppressed due to the sparse time-domain data, but their relative intensities were also distorted by the overlapping NUS noise (Figure 2d, bottom). Therefore, this spectrum is clearly not suited for the present application. After SIFT, the NUS noise was removed (Figure 2c), and the peak intensities were accurately
restored (Figure 2d, middle). Indeed, peak intensities measured in the post-SIFT spectra exhibited an excellent correlation with the “true” values observed in the full spectrum (Figure S2). SIFT took only 0.6 min to process 42 input 2D spectra together with a full compilation of 2310 peak intensities. In addition to this high computational efficiency, the SIFT process requires neither user intervention nor parameter tweaking. The total data acquisition time was compressed to 7.2 h with a recycle delay of 3 s, down from the 18 h conventionally needed on each magnetic field, with little loss in spectral resolution and accuracy of observed peak intensities. The accurate peak intensities measured in the post-SIFT spectra were translated into accurate R2eff values, and hence into an accurate τCP-dependence of R2eff, i.e., the R2 dispersion profiles. As expected, the rms deviations of the R2 dispersion profiles from the “true” profiles given by the full spectra, ΔR2eff (eq 2), were generally smaller for the post-SIFT profiles than for the pre-SIFT profiles (Table S1). Figure 3 shows representative R2 dispersion profiles obtained for Glu641. Fitting of the accurate post-SIFT profiles resulted in accurate exchange parameters. Figure 4 shows plots of chemical shift differences |Δω̄| (= |Δω/2πB0|) for the two interconverting states obtained by fitting the pre- or post-SIFT profiles against the “true” values. Similar correlations were obtained for the intrinsic relaxation rates R20 (Figure S4). For both |Δω̄| and R20, uncertainties and deviations from the “true” values were larger with the pre-SIFT data. Two global parameters extracted from the post-SIFT profiles (the exchange rate, kex = 599.72 ± 4.18 s⁻¹, and the population of state B, pB = 3.62 ± 0.02%) were in good agreement with the “true” values (kex = 600.17 ± 4.18 s⁻¹, pB = 3.56 ± 0.02%), while those from the pre-SIFT profiles (kex = 654.74 ± 22.9 s⁻¹, pB = 3.29 ± 0.09%) were inaccurate and imprecise (Figure 5).
Figure 5. Comparison of the exchange rate and population obtained from the NUS data set processed in various ways, relative to the “true” values obtained from the full data set shown in red. For the data labeled as “post-SIFT 1” and “post-SIFT 2”, the number of scans was eight and two, respectively. For the data labeled as “post-MDD 1” and “post-MDD 2”, the parameter sets #1 (number of scans = 8, number of components = 20, subregion size = 0.3 ppm, λ = 0.05) and #2 (number of scans = 8, number of components = 12, subregion size = 0.3 ppm, λ = 0.005) were used, respectively.
The excellent robustness of SIFT in processing noisy data, as shown before,17 should allow for accurate relaxation measurements in a noisier data set. With this in mind, we analyzed data recorded with only two scans per FID instead of eight, the number typically employed for moderately concentrated samples or for a standard phase cycle. Despite a factor of 2 deterioration in the signal-to-noise (S/N) ratio compared with the eight-scan data shown above, the S/N ratio for the weakest signals found in the spectra taken at the longest τCP was more than ∼10 (median S/N was 35) with the 500 μM sample. This was high enough for accurate peak quantification, and resulted in accurate R2 dispersion profiles after SIFT processing (Figure S5), with fairly accurate dynamics parameters: kex = 592.50 ± 21.5 s⁻¹; pB = 3.59 ± 0.09% (Figure 5).
Indeed, SIFT has been shown to remain quantitative for data with even lower sensitivity,17 thus it is compatible with relatively dilute samples. The approach requires the use of field gradient pulses and precisely calibrated RF pulses to suppress experimental artifacts, but yields another factor of 4 acceleration of the measurement, leading to the total measurement time of