Global Least-Squares Analysis of Large, Correlated Spectral Data


J. Phys. Chem. 1996, 100, 8180-8189

Global Least-Squares Analysis of Large, Correlated Spectral Data Sets: Application to Component-Resolved FT-PGSE NMR Spectroscopy

P. Stilbs,* K. Paulsen, and P. C. Griffiths

Physical Chemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden

Received: November 30, 1995; In Final Form: February 7, 1996

A new data processing mode for Fourier Transform Pulsed-Gradient Spin-Echo (FT-PGSE) data sets is described. Unlike conventional analysis methods, it uses all of the significant spectral information of a data set, typically 16 or 32 different magnetic field gradient settings for 10-1000 significant frequency channels out of a 1-16K FT-PGSE data set. The procedure is based on a global least-squares minimization approach at two levels: an upper level that optimizes the actual global self-diffusion coefficient data and a lower one that optimizes the amplitude(s) of the component(s) for a particular frequency channel. This approach relies on the intrinsic property of FT-PGSE data sets that the whole bandshape of a particular component attenuates by exactly the same relative amount upon incrementing the field gradient pulse parameters (Stilbs, P. Anal. Chem. 1981, 53, 2135), a property that was also shown to provide a pathway for separating the spin-echo bandshapes of the constituents of multicomponent systems. As a consequence of the coupled, global minimization approach of the method, the signal-to-noise ratio (S/N) of the FT-PGSE experiment is enhanced by typically a factor of 10 or more, since all of the available spectral information is utilized (effectively, a few hundred frequency channels per peak are combined). The present (global) optimization procedure (named CORE-NMR, COmponent-REsolved NMR spectroscopy) differs fundamentally from the diffusion-ordered spectroscopy procedure(s) introduced by Johnson et al., but the two approaches can be regarded as complementary. CORE-NMR is expected to find particular use in current studies on aggregation and binding in polymer and surfactant solutions, solving evaluation problems that originate from poor S/N, overlapping bandshapes, and a high dynamic range with regard to relative constituent spectral intensities. Typically these difficulties are all present at the same time in such studies.
CORE-NMR is equally well applicable to electrophoretic FT-NMR, where the signals of a particular component also vary coherently with an experimental parameter (the electrophoretic current) with regard to intensity and phase. As outlined, the generic CORE approach is of course also applicable to any other type of spectroscopic data, where individual intensities of separated or overlapping component spectral bandshapes decay/evolve in a similarly correlated manner as in, e.g., FT-PGSE NMR.

* To whom correspondence should be addressed.
Abstract published in Advance ACS Abstracts, April 1, 1996.
S0022-3654(95)03560-X CCC: $12.00 © 1996 American Chemical Society

Introduction

In a previous paper1 the unique intrinsic property of Fourier Transform Pulsed-Gradient Spin-Echo (FT-PGSE) data sets was pointed out and experimentally tested: for a common increment of gradient pulse parameters in the Fourier transform version2-9 of the Stejskal-Tanner pulsed-gradient10 spin-echo11 experiment, the whole absorption bandshape of a particular component will attenuate by exactly the same relative amount. It was therefore concluded1 that the spin-echo bandshapes of different components in a multicomponent sample can be separated, providing a pathway to “size-resolved NMR spectroscopy”.1 At that time, computing power and the data processing and data transfer possibilities for off-line processing did not allow more than crude tests of the approach. However, nonoverlapping carbon-13 FT-PGSE spectra of complex samples clearly proved its validity:1 it was straightforward to assign peaks to different components in multicomponent samples, based on their common spin-echo attenuation behavior. The potential of the method, and implications thereof, were also discussed in subsection 7.1.1 of a review paper.9

Johnson et al. have, in a number of elegant papers, developed another approach to the problem of separating FT-PGSE bandshapes and to better evaluation of the multiexponential echo decays that are characteristic of polydisperse samples, even for the single-component situation. Their approach (named Diffusion-Ordered SpectroscopY; DOSY) is normally based on routines developed by Provencher et al., i.e., CONTIN12 (for continuous distributions of diffusion coefficients) and SPLMOD13,14 and DISCRETE15,16 (for discrete components). The notation “DOSY” appears to be meant as a generic one, including all methods of separating FT-PGSE spectra on the basis of differing component diffusion rates. However, all the basic concepts of “DOSY” had been anticipated and described 10 years earlier, in the form of SIR-NMR (SIze-Resolved NMR).1

Provencher’s routines were originally designed with “one-dimensional” multiexponential data sets in mind, like those found, for example, in scattering experiments or time-resolved fluorescence depolarization. Provided these routines are applied in a sensible way, they appear to provide a very good approach to the difficult (and frequently rather hopeless) problem of separating multiexponentials or distributions of such. One should recall that even a straightforward nonlinear least-squares analysis of the sum of only two exponentials is well-known to fail unless the ratio between the time constants of the exponentials exceeds 2 or 3 and the exponentials have similar total amplitudes. This is true even at signal-to-noise ratios (S/N) that should be regarded as “high” by any normal standards.

In a typical DOSY approach17-19 to FT-PGSE NMR data, CONTIN (for example) is applied to selected (significant, nonzero) individual frequency channels of the Fourier-transformed echo decay (without taking advantage of the above-mentioned1 global behavior of the component echo attenuation with incremented magnetic field gradient parameters). After some interpolation and smoothing, the combined results are displayed in a two-dimensional (2D) manner, where the second axis is the “diffusion dimension”. The notation “a DOSY spectrum” has become quite established for this type of display. For polydisperse systems, in particular, such a presentation mode provides a better overview of the results of the data analysis than a table and the evaluated (projected) one-dimensional (1D) traces of the spin-echo components alone. In the case of discrete components, however, such a 2D display mode seldom adds any real information. Nevertheless, it is appealing to the eye.

By applying the routine SPLMOD to FT-PGSE data sets one can, in principle, achieve a similar (global) type of DOSY data processing as with the present (CORE-NMR) approach, since SPLMOD is designed to handle several data sets in parallel. However, the actual number is very much lower than in CORE-NMR, and SPLMOD does not seem to be able to handle a complete FT-PGSE data set (which may be 32 × 16K or more) in a fully global and parallel manner. It is also evident from the original description of DOSY, as based on SPLMOD,18 that without additional postprocessing by sorting and selection criteria, the SPLMOD approach to FT-PGSE data processing is not intrinsically stable. Even when directly applied to high S/N FT-PGSE data having just a few, nonoverlapping components, a lot of severe artifacts occur (see ref 18). On more “difficult” data sets the problems will rapidly become worse. The internals of SPLMOD have not been described in the available user documentation,13,14 but it appears likely that the problems originate from its internal, successive function approximation by spline methods and the use of derivatives thereof.
A “high-resolution” DOSY approach was recently suggested by Barjat et al.20 that, in essence, scans every significant frequency channel of a very high-field FT-PGSE data set (like 600 MHz) of a low molecular weight mixture and simply fits a single Stejskal-Tanner exponential to each frequency channel. A very good spectral separation of components in a mixture of unknowns often results in such an “HR-DOSY” 2D diffusion/chemical shift display, as the components do not normally overlap to begin with in the 1D dimension. The present paper considers a quite opposite situation, however: a data set with extensive spectral overlap and a large dynamic range with regard to the constituent contributions to the composite bandshape. An alternative approach to DOSY that does use the information contained in the intrinsic common attenuation behavior of component FT-PGSE bandshapes, and which was suggested and tested recently,21 is based on new multivariate statistical methods (NIPALS, combined with Procrustean rotation) that have been developed by Kubista and co-workers.22-24 Their approach has proven to be very powerful and useful in optical spectroscopy. However, as input it requires two data sets, where the spectral amplitudes of the components have been attenuated differentially by some preparatory experiment. In FT-PGSE there is no obvious way to achieve this in an objection-free fashion: two different preparatory PGSE sequences before the common “analysis one” were applied21 to achieve the required differential attenuation in the two data sets. In such complex rf multipulse sequences, proper phase cycling must be considered as well, but this was ignored at the time. Nevertheless, as applied in the way described, the basic Kubista NIPALS/Procrustean approach was shown to work with FT-PGSE data, too.
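The per-channel, single-exponential strategy described above for HR-DOSY can be illustrated with a minimal Python sketch. It is not the procedure of Barjat et al.: the log-linear least-squares fit, the function names, and the intensity threshold are all assumptions made here for illustration, and the sketch only makes sense for channels that contain a single component and positive intensities.

```python
import math

def fit_single_exponential(k_values, intensities):
    """Log-linear least-squares fit of I = A0 * exp(-D * k); returns (A0, D)."""
    # Only strictly positive intensities can enter, since the logarithm is taken.
    pts = [(k, math.log(y)) for k, y in zip(k_values, intensities) if y > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two positive intensities")
    mean_k = sum(k for k, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in pts)
    sxy = sum((k - mean_k) * (l - mean_l) for k, l in pts)
    slope = sxy / sxx
    intercept = mean_l - slope * mean_k
    return math.exp(intercept), -slope

def fit_all_channels(k_values, spectra, threshold):
    """spectra[j][c] is the intensity at gradient setting j and channel c.
    Every channel whose first-trace intensity exceeds `threshold` is fitted
    independently; no information is shared between channels."""
    results = {}
    for c in range(len(spectra[0])):
        if spectra[0][c] > threshold:
            decay = [trace[c] for trace in spectra]
            results[c] = fit_single_exponential(k_values, decay)
    return results
```

Because every channel is treated on its own, a channel in which two components overlap yields a single, averaged decay constant, which is the situation the global approach of the present paper is meant to handle.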
In the present paper, we present a new approach to data processing that performs a total fit of a “two-dimensional” raw FT-PGSE data set, i.e., fits every single significant frequency channel of the Fourier transformed half-echoes (typically 500-1000 channels of an 8-16K data set) to the PGSE pulse parameters (typically 16 or 32 incremented values of g²δ²(Δ − δ/3)). The key point is that this is done in a global fashion; i.e., one utilizes the intrinsic common echo attenuation of individual components in the absorption-mode FT-PGSE spectra (vide supra). Therefore, for n discrete components, n global self-diffusion coefficients are sufficient to model the experiment completely, regardless of whether the data fitting is done on one spectral frequency channel or on several thousand channels simultaneously. The only additional fitting parameters required are the amplitudes of the contribution of each component at a particular frequency channel. These amplitudes are, of course, the spin-echo component bandshapes, which are extracted by the fitting process. If desired (by combining that information with the determined self-diffusion coefficients and their error limits) one can generate a 2D-like data display, with the NMR spectrum on one axis and the self-diffusion coefficient on the other. One should note that in the general case the global fitting strategy of CORE would normally fail. The central requirement for this type of analysis to work is the presence of nonoverlapping parts of component bandshapes, or parts that overlap differentially, and to a varying extent, with other component bandshapes throughout the spectrum. For a typical high-resolution NMR spectrum of a multicomponent sample, this is the common situation. This characteristic of FT-PGSE NMR spectra makes the outlined fitting procedure feasible, and is the key element that locks it onto a stable minimization path. Tests, summarized in part in the following, show the minimization strategy to be robust and reliable.
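The global model described above can be written compactly as follows. The notation is ours and is introduced only for illustration (the symbols S, N_c, N_g, and k_j do not appear in the original text); the sketch covers discrete components only.

```latex
% S_{jc}: measured intensity at gradient setting j (j = 1..N_g)
%         and frequency channel c (c = 1..N_c)
% k_j = \gamma^2 g_j^2 \delta_j^2 (\Delta - \delta_j/3): gradient factor of
%         setting j (\gamma is the gyromagnetic ratio)
% D_i: global self-diffusion coefficient of component i (i = 1..n)
% A_{0,i}(c): amplitude (bandshape) of component i at channel c
\[
  S_{jc} \approx \sum_{i=1}^{n} A_{0,i}(c)\, e^{-D_i k_j}
\]
\[
  \chi^2(D_1,\dots,D_n) = \sum_{c=1}^{N_c}\;
  \min_{A_{0,1}(c),\dots,A_{0,n}(c)}\;
  \sum_{j=1}^{N_g} \Bigl[ S_{jc} - \sum_{i=1}^{n} A_{0,i}(c)\, e^{-D_i k_j} \Bigr]^2
\]
```

With n = 2, N_c = 1000, and N_g = 32, for example, 32 000 data points determine only 2 global diffusion coefficients plus 2000 channel amplitudes.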
Just as important is the fact that all of the spectral information is utilized, rather than that of a few frequency data channels, and this may effectively increase the signal/noise of the experiment by a factor of more than 10, using the same raw data that constitute the basis of the traditional data evaluation, based on peak heights alone. To first order, the CORE approach will smooth out the influence of imperfections such as instability in spectral phase in the absorption-mode FT-PGSE spectra. However, one should keep in mind that mixed dispersion/absorption or magnitude bandshapes are not additive in overlapping regions, so for a multicomponent spectrum CORE processing is not a general remedy for badly phased spectra, even in the case of a constant phase error throughout the spectral data set. As in FT-PGSE, Fourier transform electrophoretic NMR (EMR)25-28 has the property that the whole bandshape of a particular component in a multicomponent sample shows a common intensity/phase change upon incrementing an experimental parameter, i.e., the electrophoretic current or voltage. A cosinusoidal peak height variation results in the case of a U-tube sample geometry, for example. Different electrophoretic molecular mobilities thus translate into different oscillation periods for peak amplitudes of a particular molecule. The CORE approach is directly applicable in this case too, as has been experimentally tested by us. Results will be presented in a separate publication.

Experimental Section

All experiments were done on a Bruker AMX-300WB spectrometer, utilizing a custom-built 10 mm diffusion probe from Cryomagnet Systems, Indianapolis. This probe has self-shielded magnetic field gradient coils29-31 and produces a 0.22 T m⁻¹ Z-gradient for a 1 A input current. The maximum gradients used in the present paper were 3.5 T m⁻¹, corresponding to 16 A input current. The gradient generator itself was custom-built by W. S. Woodward, Department of Chemistry, University of North Carolina, but is essentially identical to that described previously.32
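As a quick consistency check of the numbers quoted above, the following Python sketch converts coil current to gradient strength and evaluates the gradient factor γ²g²δ²(Δ − δ/3). The function names are ours; the proton gyromagnetic ratio and the pulse timings in the example (δ = 2 ms, Δ = 120 ms) are assumptions chosen for illustration.

```python
GAMMA_1H = 2.675e8     # proton gyromagnetic ratio, rad s^-1 T^-1 (1H assumed)
COIL_CONSTANT = 0.22   # T m^-1 per ampere, as quoted for the probe

def gradient_strength(current_amperes):
    """Gradient strength g in T m^-1 for a given coil current."""
    return COIL_CONSTANT * current_amperes

def gradient_factor(g, delta, big_delta, gamma=GAMMA_1H):
    """k = gamma^2 g^2 delta^2 (Delta - delta/3), in s m^-2."""
    return (gamma * g * delta) ** 2 * (big_delta - delta / 3.0)

g_max = gradient_strength(16.0)               # 0.22 * 16 = 3.52 T m^-1
k_max = gradient_factor(g_max, 2.0e-3, 0.120) # illustrative delta and Delta
```

The product 0.22 T m⁻¹ A⁻¹ × 16 A gives 3.52 T m⁻¹, which agrees with the rounded maximum of 3.5 T m⁻¹ stated in the text.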


All experiments were based on the three-pulse stimulated-echo11 pulsed-gradient spin-echo sequence,33 using an eight rf pulse phase cycle, and three equally spaced gradient prepulses (to achieve a first-order instrumental steady-state situation). Typically the first rf pulse interval was 20 ms and the second 100 or 200 ms, leading to gradient pulse intervals (Δ) of 120 or 220 ms, depending on the sample studied. These values were kept constant throughout a measurement series. Gradient pulse lengths (δ, 1-7 ms) and/or strengths (g, up to 3.5 T m⁻¹) were varied in the experiment, which consisted of 16 or 32 different gradient parameter combinations and 8 or 64 accumulated spectra per gradient setting. The stimulated-echo sequence is preferable to the simple Hahn echo sequence in this and many other PGSE applications for the following reasons: (i) J-modulation effects are reduced, producing spin-echo spectra that essentially resemble the normal absorption spectra. (ii) The bandshape normally stays positive for macromolecular and colloidal systems; the otherwise dominating T2 effects on echo attenuation are minimized by making spin relaxation longitudinal during most of the sequence (T1 is typically much longer than T2 and is less influenced by polydispersity effects, making quantitative experiments on such systems more meaningful; see the discussion in a recent review paper,34 section 4.1.7). This, of course, applies to CORE-NMR or DOSY as well. (iii) The third reason is a combination of the above two effects: since J-modulation effects on strongly coupled spin systems lead to efficient T2 relaxation, which may attenuate certain regions of a Fourier transformed Hahn spin-echo spectrum enormously, a lot of signal is lost, and the spectral bandshape becomes “distorted” compared to the normal absorption one.

Data Processing

A standard Fortran program (named CORE) was written to achieve the task outlined in the Introduction.
It consists of about 3000 lines of code, 70% of which are two copies of J. P. Chandler’s robust and proven direct-search minimization routine STEPIT (obtainable from Quantum Chemistry Program Exchange (QCPE), Bloomington, IN, as Program No. 307). One copy controls the global (higher-level) minimization (i.e., the actual self-diffusion coefficient optimization), and the other the (lower-level) fitting of component amplitudes of each frequency channel to the Stejskal-Tanner equation for component i (using the common higher-level global self-diffusion coefficient information):

$$A(i) = A_0(i)\,\exp\left(-D(i)\,\gamma^2 g^2 \delta^2 (\Delta - \delta/3)\right) \qquad (1)$$
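The two-level scheme built around eq 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CORE program: the lower level is solved here by linear least squares (normal equations) instead of a second direct search, a simple compass search over log D stands in for STEPIT at the upper level, and all function names and demonstration values are hypothetical.

```python
import math

def solve_amplitudes(basis, decay):
    """Lower level: least-squares amplitudes for one frequency channel.
    basis[i][j] = exp(-D_i * k_j); decay[j] is the intensity at setting j.
    Returns (amplitudes, squared-error sum)."""
    n = len(basis)
    # Augmented normal equations [B B^T | B y].
    m = [[sum(p * q for p, q in zip(basis[r], basis[c])) for c in range(n)]
         + [sum(p * y for p, y in zip(basis[r], decay))] for r in range(n)]
    scale = max(m[r][r] for r in range(n))
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) <= 1e-12 * scale:
            return None, math.inf  # (nearly) identical decay rates: reject
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    amps = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * amps[c] for c in range(r + 1, n))
        amps[r] = (m[r][n] - tail) / m[r][r]
    sse = sum((y - sum(a * b[j] for a, b in zip(amps, basis))) ** 2
              for j, y in enumerate(decay))
    return amps, sse

def global_sse(d_values, k_values, spectra):
    """Sum of the lower-level squared errors over all frequency channels."""
    basis = [[math.exp(-d * k) for k in k_values] for d in d_values]
    return sum(solve_amplitudes(basis, decay)[1] for decay in zip(*spectra))

def core_like_fit(d_start, k_values, spectra, step=0.5, tol=1e-7,
                  max_sweeps=2000):
    """Upper level: compass search over log(D) of the global error sum."""
    log_d = [math.log(d) for d in d_start]
    best = global_sse(d_start, k_values, spectra)
    sweeps = 0
    while step > tol and sweeps < max_sweeps:
        sweeps += 1
        improved = False
        for i in range(len(log_d)):
            for sign in (1.0, -1.0):
                trial = list(log_d)
                trial[i] += sign * step
                sse = global_sse([math.exp(v) for v in trial],
                                 k_values, spectra)
                if sse < best:
                    log_d, best, improved = trial, sse, True
        if not improved:
            step *= 0.5  # refine the search once no neighbor is better
    return [math.exp(v) for v in log_d], best

# Noise-free demonstration: two components, five frequency channels.
true_d = [1.0e-10, 4.0e-10]                  # m^2 s^-1, illustrative values
k_values = [1.0e9 * j for j in range(16)]    # gradient factors, s m^-2
shapes = [[1.0, 0.5, 0.2, 0.0, 0.0],         # component 1 amplitude per channel
          [0.0, 0.3, 1.0, 0.8, 0.1]]         # component 2 amplitude per channel
spectra = [[sum(shapes[i][c] * math.exp(-true_d[i] * k) for i in range(2))
            for c in range(5)] for k in k_values]
fitted_d, sse = core_like_fit([2.0e-10, 8.0e-10], k_values, spectra)
```

The demonstration channels are chosen so that each component has a region free of the other, which mirrors the requirement of differential overlap stated in the Introduction. Once the diffusion coefficients are fixed, calling `solve_amplitudes` channel by channel returns the component bandshapes.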

Of course, in a multicomponent situation, sums of such exponentials are fitted instead. The overall least-squares sum to be minimized comprises the sum of the squared errors pertaining to the lower-level, individual frequency channels. Input data contain information about which fitting model one wishes to apply and the number of exponentials and components expected. Starting values for the iteration can be specified, and will normally speed up the process. Good initial estimates of the self-diffusion coefficients are not a requirement for the minimization to work, however. For polydisperse components we have chosen to model the self-diffusion behavior according to the empirical Kohlrausch-Williams-Watts distribution,35 which leads to a modified Stejskal-Tanner equation of the form36

$$A(i) = A_0(i)\,\exp\left(-\left(D(i)\,\gamma^2 g^2 \delta^2 (\Delta - \delta/3)\right)^{\beta(i)}\right) \qquad (2)$$

where β(i) characterizes the polydispersity of that component. This parameter may assume values between 0 and 1, the latter of which pertains to the limiting, monodisperse case. This equation (named a KWW exponential in the following) normally fits typical experimental data on polydisperse systems excellently.36,37 While this does not mean that a KWW exponential represents the true distribution function, its simple mathematical form and physically realistic representation of a naturally polydisperse system make it a very convenient model equation in the context of FT-PGSE applications to colloidal and macromolecular systems in solution. It normally seems quite pointless to try to proceed beyond the KWW model when the fit of typical experimental data does not show the slightest significant indication of nonrandom fitting residuals. In the majority of applications of CORE-NMR, one knows beforehand the number of components in the sample, and also whether one or more of the components are likely to show “polydisperse” behavior. The present version of CORE is designed to handle either 1-5 discrete exponentials or 1-2 KWW exponentials in the presence of 1-3 discrete exponentials. In total there are 10 different selectable “diffusion” modes that would suffice to mimic any reasonable experimental data. In addition, two cosinusoidal/exponential functions pertinent to the processing of electrophoretic FT-NMR data have been included. CORE was originally developed on a 125 MHz Digital Alpha AXP workstation, under the OpenVMS 6.1 operating system. Its prototype version has routines for reading Bruker UXNMR data sets only (others are being developed). CORE is written in standard Fortran IV (with the exception of the use of some NAMELIST I/O) and also compiles and runs on Silicon Graphics Indy machines (the current Bruker NMR console computer). The present version of CORE is designed for data sets of 32 different gradient settings and 16K points of the Fourier transformed and phased absorption-mode FT-PGSE spectra.
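The KWW exponential of eq 2 is easy to explore numerically. The short Python sketch below uses a hypothetical function name and lumps the gradient terms into a single factor k = γ²g²δ²(Δ − δ/3); it only evaluates the model and is not part of CORE.

```python
import math

def kww_attenuation(d, beta, k):
    """Relative echo amplitude A/A0 = exp(-(D*k)**beta) of eq 2.
    beta = 1 recovers the single exponential of eq 1."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in the interval (0, 1]")
    return math.exp(-((d * k) ** beta))

# Compare a monodisperse decay (beta = 1) with a strongly polydisperse one.
for dk in (0.25, 1.0, 4.0):
    mono = kww_attenuation(dk, 1.0, 1.0)
    poly = kww_attenuation(dk, 0.32, 1.0)
    print(f"D*k = {dk:4.2f}  beta=1: {mono:.3f}  beta=0.32: {poly:.3f}")
```

For β < 1 the curve decays faster than the single exponential while D·k < 1 and more slowly once D·k > 1, with both curves crossing at D·k = 1. This stretched shape is what lets one extra parameter mimic a distribution of diffusion coefficients.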
The program is not particularly memory-demanding by today’s standards, and these constraints can easily be extended. The experimental spectra discussed below were processed with the original single-precision CORE version and the synthetic ones with a newer double-precision version. Computing times do not really differ between single- and double-precision-number representation on 64-bit workstations. Intrinsically, fitting times depend of course on the complexity of the model chosen and the number of frequency channels to be fitted. On the computers mentioned they range from a few seconds (for one exponential, and just a few frequency channels) to an hour or so (for say four exponentials and 1000 frequency channels). CORE contains provisions for automatic selection of significant frequency channels, and for masking out unwanted regions, as well as for making sequential minimizations with increasing complexity, using the previous minimization as a starting point. Output data include a listing of the input data and the fitted parameters of the optimized model, the bandshapes of the components, their projected total contribution to the composite, nonattenuated bandshape, the global fit, and the global difference map. A 2D-like display of the results (with diffusion coefficients on one axis and the component spectra on the other) is being considered as an option. The stacked data presented here are presently generated by copying the output data set in question back to the appropriate UXNMR directory on the INDY and then using the standard display routines of UXNMR. 1D displays are generated through data imported to the program package Matlab (MathWorks Inc., Natick, MA) and processed through its standard plotting routines. More streamlined graphics data processing procedures are being developed. With regard to error analysis in CORE processing, the Monte Carlo approach previously described38 has been considered, but has not yet been implemented. 
We have used that approach for more than a decade for one-dimensional FT-PGSE data, and found it to be convenient and to produce much more realistic results than other statistical procedures. Porting to CORE is straightforward, and the anticipated computing time is not excessive (perhaps a factor of 5-10 longer than a single optimization). Finally, it should be noted that there is no equivalent in CORE to the traditional semilogarithmic “Stejskal-Tanner plot” (based on eq 1 or 2) of the traditional data analysis procedures; global difference spectra like that presented below provide a very good overall picture of the validity of the minimization procedure, however.

Figure 1. (a) Sequence of FT-PGSE spectra for a system of EHEC/SDS in D2O, and for clarity also the first trace separately. Particulars with regard to gradient parameter settings are δ = 2-7 ms, Δ = 220 ms, and g = 0.26-3.52 T m⁻¹. Both δ and g were varied in the experiment, and the irregularity at trace 4 is due to overall gradient settings that happened to be “out of sorted order” at that particular point. Out of 8192 frequency channels 721 were considered. (b) CORE fit to the data set. (c) Difference map at a higher vertical amplification.

Results and Discussion

Simulations on Typical Experimental Data. We have performed a number of tests of the software to date, and found it to be robust and stable with regard to the desired optimization procedure. In the following, a few typical application examples and their fitting results will be discussed and summarized. In general, self-diffusion studies have a wide applicability to binding and aggregation phenomena in solution. With the development of FT-PGSE, numerous physicochemical phenomena of this kind have become accessible for investigation.9,34 Polymer-surfactant interaction is presently the subject of a lot of FT-PGSE work aimed at quantifying binding and aggregation phenomena.36,37,39-50 For several reasons (poor S/N, spectral overlap, and high dynamic range of relative component bandshapes), accurate FT-PGSE results on these slowly diffusing systems are difficult to obtain by standard evaluation procedures. Also, the polymer part of aqueous polymer-surfactant systems often shows “polydisperse” behavior, not because of polydispersity of the polymer itself, but rather because of slow polymer
chemical exchange on the NMR time scale between polydisperse aggregates. Polymer-surfactant interaction is definitely an area where FT-PGSE NMR, especially as processed by CORE, has the potential to contribute significantly to our understanding of the aggregation behavior.

TABLE 1: CORE Results for the EHEC/SDS System in D2O (Overall rms Error = 2.8%)

component    projected, rel total intensity    self-diffusion coeff/(m² s⁻¹)    β-value
EHEC         0.386                             6.99 × 10⁻¹²                     0.32
SDS          0.614                             6.39 × 10⁻¹¹

Figure 2. Expanded views of the fitted EHEC/SDS component bandshapes and the overall calculated and experimental bandshapes. The x-axis corresponds to the respective points of the 8K data set. Note that the “experimental bandshape” (the dotted line) in the figure is the partly attenuated one of the first trace of Figure 1a. The calculated bandshapes and their sum (solid line) are the nonattenuated ones (corresponding to the A0 in eqs 1 and 2). An obvious amplitude “deviation” therefore occurs when comparing the (attenuated) experimental data with the calculated sum of the EHEC trace (dash-dotted line) and the SDS trace (dashed line). (a) Region around the composite EHEC-SDS peak at approximately 3.6 ppm. (b) Region around approximately 0.5-1.5 ppm. The very high quality of the fit is particularly evident around channel 5500 (note the outer wing of the middle EHEC signal that decays toward channel 5600, under the much larger SDS peak).

Figure 1a illustrates a typical proton FT-PGSE run at 25 °C on a polymer-surfactant system: ethyl hydroxyethyl cellulose (EHEC) and the surfactant sodium dodecyl sulfate (SDS) in D2O. Figure 1b shows the spectrum simulated by CORE, and Figure 1c shows the difference spectrum (at an increased amplitude). A fitting model with one KWW exponential (EHEC) and one single exponential (SDS) was used. Water is absent in FT-PGSE spectra at the high field gradient settings used. The overall CORE fit of the EHEC-SDS spectra (Figure 1b) shows no sign of systematic errors in the difference map (Figure 1c), except for the very highest gradient settings. The overall normalized error square sum over all the significant data points is 0.0008, meaning that the root mean square error per significant point in the whole data set is 2.8% (√0.0008 ≈ 0.028). The fitting results are summarized in Table 1. One should note that CORE produces a nice, single exponential fit for the SDS bandshape, while a CONTIN/DOSY approach to the same data produces a self-diffusion coefficient distribution that is unphysically wide; intrinsically, it is a single exponential due to rapid surfactant chemical exchange (cf. Figure 5 of ref 34 for a typical DOSY/CONTIN result on a similar system).

It is illustrative to also examine the quality of the extracted bandshapes. This is done in Figure 2a (peak 1) and 2b (peaks 2-4). There are no irregularities, and the fit appears very sound. Peak 1 from the left has directly overlapping contributions from SDS and EHEC of similar magnitude. Peaks 2 and 3 overlap, but 2 is predominantly SDS and 3 is predominantly EHEC. Peak 4 is entirely SDS. One can conclude that CORE-NMR is essential for a proper data evaluation of this particular data set, due to the extensive peak overlap.

TABLE 2: CORE Results for the DETAB/n-Propanol System in D2O (rms Error = 1.8%)

component     projected, rel total intensity    self-diffusion coeff/(m² s⁻¹)
n-propanol    0.10                              6.32 × 10⁻¹⁰
DETAB         0.80                              1.12 × 10⁻¹⁰
HDO           0.10                              1.90 × 10⁻⁹

Another example of how FT-PGSE has been very successfully applied to colloidal surfactant systems is the study of solubilization. Two such systems relevant for experimental studies of solubilization are (i) a D2O solution of primarily micellar decyltrimethylammonium bromide (DETAB) (8%), into which n-propanol (1.5%) has been solubilized, and (ii) a similar solution, but with micellar SDS (8%) and ethanol (2%) and n-butanol (2%) as solubilizates. These are only partly incorporated into the micelles, meaning that their effective self-diffusion coefficients are weighted averages between that of free alcohol in aqueous solution and that of the micelles, according to34,51,52

$$D(\mathrm{obs}) = p\,D(\mathrm{micelle}) + (1 - p)\,D(\mathrm{free}) \qquad (3)$$

where p denotes the degree of micellar incorporation (0 < p < 1). In both systems one anticipates single FT-PGSE exponentials for all components, due to rapid free-bound chemical exchange on the NMR time scale. The data sets were acquired at a temperature of 25 °C and at constant Δ (120 ms) and δ (2 ms), at magnetic field gradients ranging from 0.09 to 3.52 T m⁻¹. The data sets were fitted to sums of three or four exponentials, respectively (thus including that of residual water). Table 2 summarizes the DETAB/n-propanol results.

In the SDS-ethanol-butanol case, tests were also made with direct fitting to four exponentials, and by sequential fitting to one, two, three, and finally four exponentials, using the intermediate results as starting points for the next iteration. The direct fit worked, but arrived at a solution with a somewhat larger error square sum than the sequential one. All fitted self-diffusion coefficient information agreed within 10% between these direct and sequential data processing runs, however. The results of the sequential fit are summarized in Table 3.

Figure 3. (a) Experimental FT-PGSE data set, for clarity together with the first trace on a separate graph. (b) and (c) FT-PGSE bandshape contributions extracted by CORE from SDS (dash-dotted), ethanol (solid), and n-butanol (dashed) for two regions of the spectral data set. The actual water (solid, heavier dots) region around 4.8 ppm is not included, but some artifacts that have been assigned to water by CORE (probably as a consequence of baseline errors in the spectra (see text)) are obvious around data channels 3320-3350, and around channel 5380. (b) The -CH2- region around approximately 3.5-4.1 ppm. (c) The -CH2- and -CH3 region around approximately 0.6-1.8 ppm.

TABLE 3: CORE Results for the SDS/Ethanol/n-Butanol System in D2O (rms Error = 1.5%)

component    projected, rel total intensity    self-diffusion coeff/(m² s⁻¹)
SDS          0.53                              3.75 × 10⁻¹¹
ethanol      0.13                              6.82 × 10⁻¹⁰
n-butanol    0.19                              3.35 × 10⁻¹⁰
HDO          0.15                              1.67 × 10⁻⁹

It is reassuring to see that the overall projected spectral intensities of the components do agree within experimental error with the known composition of the solution (vide supra). Figures 3 and 4 illustrate graphically the fits for the SDS-ethanol-butanol system. It is noteworthy that even minor significant details, like the alcohol -CH2- multiplets to the left of and below the main SDS -(CH2)n- band, are surprisingly well extracted from the composite bandshape. It must be remembered that these are spin-echo spectra, which are affected by some J-modulation effects, and therefore individual FT-PGSE spectra should
resemble, but not exactly mimic, the absorption spectra of the same compound. One fitting artifact is obvious: a minor fraction of the fast “water diffusion” component has been assigned by the CORE minimization to areas where it does not belong (frequency channels 3300-3350 and 5300-5400). A reader with sharp eyes may already have noticed that all experimental spectra of the present paper suffer from a baseline distortion that manifests itself in a very broad, negative, mirror image of the spectrum itself; i.e., it decays to zero for high gradient values, where the spin-echo signal has attenuated to zero, too. This affects the fitting more strongly the more the exponentials differ in time constants and intensities. This definitely is the case in the SDS-ethanol-butanol-water system, but far less so in the EHEC-SDS system previously discussed. The origin of the baseline distortion is not known at present. It is reassuring, however, that the CORE procedures do work so well with “typical real data” like the somewhat noisy data sets of Figure 1, and that they can cope with significant baseline errors, too. Later work used the standard Bruker baseline correction routines with a good improvement in the baseline and spectral appearance.

Figure 4. (a) Partial view of a synthetic data set of series 1, having a noise level of 0.02, corresponding to a spectral S/N of about 9 (see text) and (b) its CORE-processed counterpart.

Simulations on Synthetic Data. As a final test of the procedures, a large number of simulated PGSE data sets were systematically generated and processed with CORE. Series 1
comprized 25 data sets, each containing two Lorentzian bands of equal area, but of unequal widths (16 and 32 frequency channels wide at half-height, respectively, and a peak height ratio of 2/1), and characterized by diffusion coefficients that differ by a factor of 2 (corresponding to 100 and 200 of some arbitrary units in D, respectively). The band-shape components will in the following be referred to as the narrow/slow and the


Figure 5. Results of CORE processing of the data sets of series 1 (see text), displayed as a function of signal separation and spectral noise level. (a) Component 1 (the broader and more rapidly diffusing component, having a nominal D of 200 units). (b) Component 2 (the narrower component, having a nominal D of 100 units). (c) Percentage deviation of the fitted relative integrated areas of the two components, expressed as the positive deviation of the narrower peak from a 1:1 ratio. Please note that “noise level 0.10”, for example, corresponds to extremely poor S/N ratios, of the order of 2 or less (see text). The typical experimental situation would correspond to a “noise level” of less than 0.02 (= S/N better than 10, cf. Figure 4a).

broad/fast one, respectively. Each “spectrum” consisted of 1024 data points. Each bandshape is thus represented by about 100 points or so in each trace. Thirty-two different gradient factors were subsequently applied, so as to provide a fairly “optimal” attenuation range within the complete 32 × 1024 size data set. The maximum peak height of the first trace was normalized (arbitrarily to 10 000), and then Gaussian noise of various levels (0.0, 0.01, 0.02, 0.05, and 0.10 times a random Gaussian of standard deviation 1 and intensity 10 000) was added to each of the 32 × 1024 frequency channels, corresponding to maximum S/N ratios of approximately infinity, 18, 9, 4, and 2, as compared to the largest peak in the composite spectrum. The broader/faster diffusing component had only half of these S/N values, and the data sets are consequently very much more noisy than those encountered in typical experimental investigations. The S/N, of course, decreases further with “diffusional” attenuation of the FT-PGSE data set. One should also note that the normalization and noise addition pertain to the composite bandshape, which effectively corresponds to a 50% higher noise level in the case of totally overlapping peaks, as compared to isolated ones. Each point of the whole data set was then CORE processed. The process was repeated 25 times for various noise levels and signal separations. The latter were 0, 12, 50, 100, and 200 frequency channels, corresponding to complete overlap, a slight “bump” on the bandshape, 50% overlap, slight overlap of the spectral bases, and no overlap, respectively. Figure 4 illustrates a synthetic data set having noise level 0.02 (S/N = 9) and a signal separation of 100 channels, and the result of its CORE processing. Figure 5 summarizes the results of all 25 minimizations of series 1. Above an S/N of 10, CORE excellently extracts the component bandshapes, their integrated intensity, and the diffusion coefficients.
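The construction of a series 1 data set, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the function names `lorentzian` and `make_series1` are hypothetical, the first trace is taken at zero gradient factor, and the range of gradient factors (chosen here so that the slow component decays to exp(-3) in the last trace) is a guess, since the text only calls the range fairly “optimal”. The S/N values quoted in the text follow the authors' own definition and are not recomputed here.

```python
import math
import random


def lorentzian(x, center, fwhm):
    """Unit-area Lorentzian with full width at half-height `fwhm`."""
    hw = fwhm / 2.0
    return (hw / math.pi) / ((x - center) ** 2 + hw ** 2)


def make_series1(separation, noise_level, n_points=1024, n_grad=32,
                 d_slow=100.0, d_fast=200.0, seed=0):
    """Return (ks, data): gradient factors and an n_grad x n_points data set."""
    rng = random.Random(seed)
    mid = n_points / 2.0
    # Equal areas, widths 16 and 32 channels, hence a 2:1 peak height ratio.
    narrow = [lorentzian(i, mid - separation / 2.0, 16.0)
              for i in range(n_points)]
    broad = [lorentzian(i, mid + separation / 2.0, 32.0)
             for i in range(n_points)]
    # Normalize the maximum of the composite first trace to 10 000.
    scale = 10000.0 / max(a + b for a, b in zip(narrow, broad))
    # Assumed attenuation range: k * d_slow runs from 0 to 3.
    k_max = 3.0 / d_slow
    ks = [k_max * j / (n_grad - 1) for j in range(n_grad)]
    sigma = noise_level * 10000.0  # e.g. noise_level 0.02 gives sigma 200
    data = []
    for k in ks:
        att_slow = math.exp(-k * d_slow)
        att_fast = math.exp(-k * d_fast)
        data.append([scale * (a * att_slow + b * att_fast)
                     + rng.gauss(0.0, sigma)
                     for a, b in zip(narrow, broad)])
    return ks, data


# Example: the case of Figure 4 (noise level 0.02, separation 100 channels).
ks, data = make_series1(separation=100, noise_level=0.02)
```

Looping `separation` over 0, 12, 50, 100, and 200 and `noise_level` over 0.0, 0.01, 0.02, 0.05, and 0.10 would reproduce the 5 × 5 grid of the 25 data sets in this sketch.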
At higher noise levels things gradually get worse, in particular with increasing band-shape overlap. The evaluated diffusion coefficient for the narrow/slow component is quite good, however, even at a terrible noise level and complete spectral overlap. It is also interesting to note that the fitting procedure does in fact suppress “insignificant data” as the noise level becomes increasingly worse. CORE/STEPIT systematically assigns a lower overall integrated intensity and a higher relative diffusional attenuation to a broad/fast peak. Also, all very noisy data sets generally result in diffusion coefficients which are systematically “too high”, meaning that CORE/STEPIT processing forces the fitted data for those frequency channels to decay more quickly into the noise level. All fitted bandshapes did correspond well to the original synthetic data. No spectral artifacts appeared, even at the highest noise levels, and bandshapes did not “mix” to any significant extent either. However, the key feature of CORE processing is not used in series 1. For a multisignal FT-PGSE bandshape, the presence of an isolated peak makes possible a detailed unraveling of overlapping parts of the spectrum as well, and the overall quality of the fitting very much improves. Series 2 thus comprises the same two peaks and noise levels as previously, at complete overlap. To this bandshape was added an isolated peak, with a width at half-height of eight frequency channels and a diffusion coefficient of 200 units. CORE processing will thus regard this peak as belonging to the broad/fast component described previously. Figure 6 illustrates the results of the 25 minimizations of series 2, where the “additional peak” had relative contributions of 0, 4.7, 9.1, 20, and 33% of the total integrated intensity of the first trace of the 32 × 1024 point data sets.
Figure 6a, in particular, shows the very significant quality improvement of the fitting that results from an isolated single-component band-shape contribution.
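The two-level structure of the minimization, applied to a small series-2-like case, may be illustrated by the following sketch. It is not the CORE/STEPIT code: the outer level here is a crude coordinate search standing in for STEPIT, the data set is a reduced toy version (64 channels, 16 gradient factors, low noise), and all function names, peak positions, widths, and the range of gradient factors are assumptions made for the example. The model assumed is a sum of two exponentials, exp(-k D), with a separate pair of amplitudes for every frequency channel.

```python
import math
import random


def lorentzian(x, center, fwhm):
    """Unit-area Lorentzian with full width at half-height `fwhm`."""
    hw = fwhm / 2.0
    return (hw / math.pi) / ((x - center) ** 2 + hw ** 2)


def make_demo_data(noise_sd=2e-4, n_chan=64, n_grad=16, seed=1):
    """Toy set: two fully overlapping bands (D = 100 and 200) plus an
    isolated narrow band that shares the fast diffusion coefficient."""
    rng = random.Random(seed)
    slow = [lorentzian(i, 20, 4.0) for i in range(n_chan)]
    fast = [lorentzian(i, 20, 8.0) + 0.5 * lorentzian(i, 48, 2.0)
            for i in range(n_chan)]
    scale = 1.0 / max(s + f for s, f in zip(slow, fast))
    ks = [0.03 * j / (n_grad - 1) for j in range(n_grad)]  # assumed range
    data = [[scale * (s * math.exp(-k * 100.0) + f * math.exp(-k * 200.0))
             + rng.gauss(0.0, noise_sd) for s, f in zip(slow, fast)]
            for k in ks]
    return ks, data


def inner_amplitudes(ks, data, ds):
    """Lower level: for fixed D values, solve the linear least-squares
    problem for the two amplitudes of every channel (2 x 2 normal equations).
    Returns (amplitudes per channel, total residual sum of squares)."""
    e1 = [math.exp(-k * ds[0]) for k in ks]
    e2 = [math.exp(-k * ds[1]) for k in ks]
    s11 = sum(u * u for u in e1)
    s22 = sum(v * v for v in e2)
    s12 = sum(u * v for u, v in zip(e1, e2))
    det = s11 * s22 - s12 * s12
    if det <= 1e-12 * s11 * s22:  # nearly identical D values: reject
        return None, float("inf")
    amps, rss = [], 0.0
    for i in range(len(data[0])):
        y = [row[i] for row in data]
        b1 = sum(u * yy for u, yy in zip(e1, y))
        b2 = sum(v * yy for v, yy in zip(e2, y))
        a1 = (s22 * b1 - s12 * b2) / det
        a2 = (s11 * b2 - s12 * b1) / det
        rss += sum((yy - a1 * u - a2 * v) ** 2
                   for yy, u, v in zip(y, e1, e2))
        amps.append((a1, a2))
    return amps, rss


def fit_two_level(ks, data, d_start, step=0.3, tol=1e-4, max_evals=4000):
    """Upper level: coordinate search on the global D values, scoring each
    trial by the residual left after the lower-level amplitude fit."""
    ds = list(d_start)
    best = inner_amplitudes(ks, data, ds)[1]
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for c in range(2):
            for factor in (1.0 + step, 1.0 / (1.0 + step)):
                trial = ds[:]
                trial[c] *= factor
                rss = inner_amplitudes(ks, data, trial)[1]
                evals += 1
                if rss < best:
                    ds, best, improved = trial, rss, True
        if not improved:
            step /= 2.0  # refine the search once no move helps
    amps, rss = inner_amplitudes(ks, data, ds)
    return ds, amps, rss


ks, data = make_demo_data()
d_fit, amps, rss = fit_two_level(ks, data, [70.0, 300.0])
```

Only the two diffusion coefficients are searched nonlinearly; the per-channel amplitudes, which make up the extracted component bandshapes, follow from linear algebra at every trial. In this sketch the channels of the isolated band constrain the fast diffusion coefficient, which in turn helps to separate the two contributions in the overlapping region.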


Figure 6. Results as in Figure 5, but for series 2 (see text): two exactly overlapping peaks and a minor isolated bandshape contribution to the broader, more quickly diffusing species.

Conclusions

The global least-squares minimization of FT-PGSE data sets by the CORE strategy is stable and robust, as an intrinsic consequence of its direct approach to the problem in question. Computing time is not excessive on present-day computers, so an automated CORE processing scheme could be applied as a routine procedure. Including functional forms for the data modeling other than Stejskal-Tanner or KWW exponentials or cosinusoidal EMR functions is a trivial matter. The global minimization procedures increase the effective signal/noise ratio by a very significant amount (typically by a factor of 10) by effectively adding together the information content of several hundred frequency data channels per bandshape, which saves a lot of acquisition time and makes possible studies on spectra of poor S/N. The approach bypasses the typical annoying deficiencies of current spectrometer data processing software by fitting each frequency channel in an unbiased way, rather than looking for “the strongest peak” around a data channel selected by the data reduction routines (which frequently cannot find the peak at all). For the same reason, spectral noise itself is treated in an unbiased way; random noise is treated as “random” rather than artificially introduced as a systematic (positive) error through peak-maxima-finding routines. This is particularly important for noisy FT-PGSE data sets, at gradient settings that have attenuated the echo toward the noise level of the data set. Fitting “computer-evaluated peak integrals”, rather than “peak heights”, might be argued to be a limiting case of the CORE approach, in the rare situation that there is no signal overlap whatsoever in an FT-PGSE data set. Provided such “peak integrals” could actually be evaluated properly (through curve fitting to a known bandshape, for example), an equally good S/N gain of the experiment should in principle result. According to experience, however, typical automated spectrometer software procedures fail badly in such an application, even in a high S/N, single-component situation. In summary, CORE processing makes possible proper data evaluation of FT-PGSE or FT-EMR data sets for very complex systems.
It is meaningful to apply the procedures in a routine manner, also in the single-component situation, since the processing will improve the evaluation of any such data set quite significantly by effectively increasing the S/N ratio and avoiding biased treatment of spectral information. Most significantly, however, it makes possible FT-PGSE experiments on more complex systems than previously considered feasible, in particular studies that entail a monitoring of minor peaks that overlap with much more dominant bandshapes, even at low S/N ratios.

Acknowledgment. We thank C. S. Johnson, Jr., K. F. Morris, and D. Wu of the University of North Carolina, Chapel Hill, for many stimulating discussions with regard to the analysis of FT-PGSE data, and for providing their source code and procedures for DOSY processing for inspection. This work has been supported by the Swedish Natural Sciences Research Council (NFR). FRN kindly provided the funding for the AMX300WB spectrometer.

References and Notes

(1) Stilbs, P. Anal. Chem. 1981, 53, 2135.
(2) Vold, R. L.; Waugh, J. S.; Klein, M. P.; Phelps, D. E. J. Chem. Phys. 1968, 48, 3831.
(3) James, T. L.; McDonald, G. G. J. Magn. Reson. 1973, 11, 58.
(4) Stilbs, P.; Moseley, M. E. Chem. Scr. 1979, 13, 26.
(5) Stilbs, P.; Moseley, M. E. Chem. Scr. 1980, 15, 176.
(6) Kida, J.; Uedaira, H. J. Magn. Reson. 1977, 27, 253.
(7) Callaghan, P. T.; Trotter, C. M.; Jolley, K. W. J. Magn. Reson. 1980, 37, 247.
(8) Callaghan, P. T. Aust. J. Phys. 1984, 37, 359.
(9) Stilbs, P. Prog. Nucl. Magn. Reson. Spectrosc. 1987, 19, 1.
(10) Stejskal, E. O.; Tanner, J. E. J. Chem. Phys. 1965, 42, 288.
(11) Hahn, E. L. Phys. Rev. 1950, 80, 580.
(12) Provencher, S. W. Comput. Phys. Commun. 1982, 27, 229.
(13) Provencher, S. W.; Vogel, R. H. In Numerical Treatment of Inverse Problems in Differential and Integral Equations; Deuflhard, P., Hairer, E., Eds.; Birkhauser: Boston, 1983; pp 304-319.
(14) Vogel, R. H. SPLMOD Users Manual (Ver. 3); Data Analysis Group, EMBL: Heidelberg, 1988.
(15) Provencher, S. W. J. Chem. Phys. 1976, 64, 2772.
(16) Provencher, S. W. Biophys. J. 1976, 16, 27.
(17) Morris, K. F.; Johnson, C. S., Jr. J. Am. Chem. Soc. 1992, 114, 3139.
(18) Morris, K. F.; Johnson, C. S., Jr. J. Am. Chem. Soc. 1993, 115, 4291.
(19) Johnson, C. S., Jr. In NMR Probes of Molecular Dynamics; Tycko, R., Ed.; Kluwer Acad. Publ.: Dordrecht, 1994; pp 455-488.
(20) Barjat, H.; Morris, G. A.; Smart, S.; Swanson, A. G.; Williams, S. C. R. J. Magn. Reson. Ser. B 1995, 108, 170.
(21) Schulze, D.; Stilbs, P. J. Magn. Reson. Ser. A 1993, 105, 54.
(22) Kubista, M. Chem. Intell. Lab. Syst. 1990, 7, 273.
(23) Kubista, M.; Sjöback, R.; Albinsson, B. Anal. Chem. 1993, 65, 994.
(24) Scarminio, I.; Kubista, M. Anal. Chem. 1993, 65, 409.
(25) Saarinen, T. R.; Johnson, C. S., Jr. J. Am. Chem. Soc. 1988, 110, 3332.
(26) He, Q.; Johnson, C. S., Jr. J. Magn. Reson. 1989, 81, 435.
(27) Johnson, C. S., Jr.; He, Q. In Advances in Magnetic Resonance; Warren, W. S., Ed.; Academic Press: San Diego, 1989; p 133.
(28) Morris, K. F.; Johnson, C. S., Jr. J. Am. Chem. Soc. 1992, 114, 776.
(29) Mansfield, P.; Chapman, B. J. Magn. Reson. 1986, 66, 573.
(30) Gibbs, S. J.; Morris, K. F.; Johnson, C. S., Jr. J. Magn. Reson. 1991, 94, 165.
(31) Crozier, S.; Doddrell, D. M. J. Magn. Reson. Ser. A 1993, 103, 354.
(32) Boerner, R. M.; Woodward, W. S. J. Magn. Reson. Ser. A 1994, 106, 195.
(33) Tanner, J. E. J. Chem. Phys. 1970, 52, 2523.
(34) Söderman, O.; Stilbs, P. Prog. Nucl. Magn. Reson. Spectrosc. 1994, 26, 445.
(35) Williams, G.; Watts, D. C. Trans. Faraday Soc. 1970, 66, 80.
(36) Walderhaug, H.; Hansen, F. K.; Abrahmsén, S.; Persson, K.; Stilbs, P. J. Phys. Chem. 1993, 97, 8336.
(37) Abrahmsén-Alami, S.; Stilbs, P. J. Phys. Chem. 1994, 98, 6359.
(38) Stilbs, P.; Moseley, M. E. J. Magn. Reson. 1978, 31, 55.
(39) Almgren, M.; Van Stam, J.; Lindblad, C.; Li, P.; Stilbs, P.; Bahadur, P. J. Phys. Chem. 1991, 95, 5677.
(40) Thalberg, K.; Van Stam, J.; Lindblad, C.; Almgren, M.; Lindman, B. J. Phys. Chem. 1991, 95, 8975.
(41) Persson, K.; Abrahmsén, S.; Stilbs, P.; Hansen, F. K.; Walderhaug, H. Colloid Polym. Sci. 1992, 270, 465.
(42) Bahadur, P.; Pandya, K.; Almgren, M.; Li, P.; Stilbs, P. Colloid Polym. Sci. 1993, 271, 657.
(43) Hammarström, A.; Sundelöf, L. O. Colloid Polym. Sci. 1993, 271, 1129.
(44) Persson, K.; Wang, G.; Olofsson, G. J. Chem. Soc., Faraday. Trans. 1994, 90, 3555.
(45) Zhang, K.; Jonströmer, M.; Lindman, B. J. Phys. Chem. 1994, 98, 2459.
(46) Björling, M.; Herslöf-Björling, Å.; Stilbs, P. Macromolecules 1995, 28, 6970.
(47) Persson, K.; Griffiths, P. C.; Stilbs, P. Polymer 1996, 37, 253.
(48) Persson, K. Thesis, The Royal Institute of Technology, Stockholm, 1995.
(49) Veggeland, K.; Nilsson, S. Langmuir 1995, 11, 1885.
(50) Walderhaug, H.; Nyström, B.; Hansen, F. K.; Lindman, B. J. Phys. Chem. 1995, 99, 4672.
(51) Stilbs, P. J. Colloid Interface Sci. 1982, 87, 385.
(52) Stilbs, P. In Solubility in Surfactant Aggregates (Surfactant Science Series); Christian, D., Scamehorn, J. F., Eds.; Marcel Dekker, Inc.: New York, 1995; pp 367-381.
