Combining Spectral Ordering with Peak Fitting for One-Dimensional

Mar 25, 2013 - ... and Jacob G. Bundy. Department of Surgery and Cancer, Imperial College London, Sir Alexander Fleming Building, London SW7 2AZ, U.K...
0 downloads 0 Views 1MB Size
Article pubs.acs.org/ac

Combining Spectral Ordering with Peak Fitting for One-Dimensional NMR Quantitative Metabolomics Manuel Liebeke,* Jie Hao, Timothy M. D. Ebbels, and Jacob G. Bundy Department of Surgery and Cancer, Imperial College London, Sir Alexander Fleming Building, London SW7 2AZ, U.K. ABSTRACT: One-dimensional 1H NMR spectra are widely used for metabolic profiling. Such data sets often contain hundreds or thousands of spectra, which typically have variation in their sample chemistry, which leads to chemical shift variation “positional noise”. This is a severe problem for metabolite quantification and data analysis, as peak integrals do not necessarily correspond across all spectra in a set. Various alignment algorithms have been developed to address this problem, but different studies have taken different approaches to evaluating the performance of NMR alignment routines and can be subjective or rely on an arbitrary cutoff. Furthermore, most alignment approaches completely fail to deal with peaks that overlap. We adopt the simple and robust method of ordering spectra with respect to an internally varying peak and use this to compare different alignment algorithms. Furthermore, we use the information from this procedure to help improve a Bayesian approach to automated peak deconvolution by restricting the prior probability distribution of the peak position in a model-free manner and compare the performance to manual peak deconvolution and to binning. This combination of spectral ordering and compound deconvolution improved the quality of the data for quantitative metabolomics.

M

to confusion between which metabolites change between samples. Sample preparation protocols can improve sample quality and stability, e.g., by use of pH buffers, ultrafiltration to remove proteins, chelating agents, etc.4−6 This is still not sufficient in all cases to avoid positional noise in 1H NMR spectra (and lengthy preparation steps risk losing the simplicity and hence lack of bias that is the particular advantage of NMR for metabolite profiling). Chemometric tools for spectral processing are essential to gain the most-interpretable results and are common steps in the pipeline for processing large metabolomic data sets before statistical analysis.7 Peak alignment is often used to improve the correspondence of peaks across spectra, both with data-reduced (binned) or full-resolution data,8,9 and a variety of alignment tools exist.10−15 Early peak alignment methods like correlation optimized warping (COW)16 and dynamic time warping17 were computationally complex and were replaced by faster methods using full-resolution data and fast Fourier transformation (FFT) and recursive FFT.18 Recent advances include recursive segment-wise peak alignment (RSPA)10 and algorithms using intercorrelation shifting14 and variable reference alignment,19 as well as procedures based on spectral clustering.13,15 All of them showed clear improvement of peak alignment for the majority of peaks in each data set presented in the original studies. However, the evaluation of alignment performance is often relatively subjective, generally including a

etabolic profiling by chromatographic or nuclear magnetic resonance analysis of all kinds of biofluids, tissue extracts, and cell extracts has become an essential tool for phenotyping biological systems. Chromatographic approaches try to separate the vast number of metabolites present in a sample, but NMR spectroscopy uses the unique signals derived from nuclei in different chemical environments to identify individual metabolites without physical separation. This has both advantages and disadvantages: the lack of sample preparation reduces many sources of analytical bias, and because NMR is a near-universal detector, its great strength as a metabolomics platform is the ability to detect metabolites from all kinds of chemical classes with high precision and reproducibility.1,2 The commonest approach to profiling is still acquiring one-dimensional (1D) proton spectra, as these are both robust and rapid to acquire (especially important for large epidemiological studies that have thousands or tens of thousands of samples). On the other hand, peak overlap inevitably reduces data quality. Unambiguous assignment of peaks becomes more difficult, and detection limits effectively increase for highly overlapped peaks. Furthermore, the samples are typically subject to the variable correspondence problem:3 the chemical shift of individual resonances can change between spectra. This shift variation is generally because of natural variability in sample matrix composition (e.g., sample pH, concentration of other ions, metabolites, proteins) and makes tracking peaks across hundreds of spectra difficult. This socalled “positional noise” contributes to errors in metabolite quantification by 1H NMR and, in the worst case scenario, leads © 2013 American Chemical Society

Received: January 23, 2013 Accepted: March 24, 2013 Published: March 25, 2013 4605

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

Figure 1. Spectral sorting greatly improves the interpretability of NMR spectra for metabolite profiling. Selected spectral regions: upper panel shows spectra in order of acquisition, and lower panel shows the same spectra after sorting. (a) Bacterial supernatants, sorting based on histidine signal (δ 7.0−7.2 ppm). (b) Urine samples, sorting based on alanine peak at δ 1.47 ppm. (c−e) Tissue extracts, sorting based on Nα,Nα-dimethylhistidine signal at around δ 2.9 ppm.

the earthworm Lumbricus rubellus, taken from different field populations (unpublished data); the tissue extraction protocol was described in detail by Liebeke and Bundy;22 (3) an environmental health study, with 178 human urine spectra from different individuals;23 (4) human serum samples (unpublished data), with 150 spectra prepared and processed using standard protocols.4 Software Tools. We used iNMR 4.1.2c (Mestrelab Research) for processing, ordering, and export of spectral data. The spectra were referenced and normalized in all cases to the internal standard [earthworm tissue extract data set, 2,2dimethyl-2-silapentane-d6-5-sulfonic acid (DSS); urine, serum, and bacterial supernatants, 3-trimethylsilylpropionate-2,3,3,3-d4 sodium salt (TSP)]. The spectra were sorted using a Lua script in iNMR (downloadable from http://inmr.net/library/Lua/ SortObyShift.lua). The different alignment algorithms tested (COW, RSPA, icoshift, and the hierarchical alignment) were run in Matlab R2011a (Mathworks, Cambridge, U.K.).10,13,14 We tested the different algorithms with different parameter settings to try and ensure they were giving optimal results. For RSPA, we tested different values for the maximum allowable peak shift parameter. For icoshift, the spectra are divided into intervals of user-selectable widths. We tested different widths (ranging from whole-spectrum to single-peak range) and selected appropriate widths for the regions considered for the different sets of spectra. The hierarchical alignment approach was run using RSPA as the basis alignment algorithm, and we used the same parameter setting here as the optimal setting for RSPA alone. The COW algorithm was used with default parameters. For comparing different metabolite quantification strategies, we focused on a subset of 30 spectra of the bacterial supernatant samples, as described above. We quantified metabolites by computer-assisted manual fitting with Chenomx NMR Suite 7.0 (Chenomx, Edmonton, Canada), in order to generate concentration data to compare against, as we regard this as the closest current equivalent to a gold standard for realworld samples. Peak integration of specific areas (i.e., targeted binning) was done within Matlab using in-house code. The

step based on simple inspection of aligned compared to unaligned spectra; these methods may not work for more extreme cases. Furthermore, most alignment methods fail when peaks cross over in chemical shift between samples (i.e., when peak A appears upfield of peak B in spectrum 1, but downfield in spectrum 2). The generalized fuzzy Hough transform (GFHT) is the only approach reported so far with the potential to successfully align such crossing-over peaks. The GFHT approach is somewhat different to the “pure” alignment techniques: the spectra are first ordered with respect to an internally shifting peak, the set of ordered spectra is treated as an image, and feature detection is carried out across this “image”.12 This has been extended to consider multiple internally shifting peaks, using a chemometric approach.11 Here we use such a sorting approach, although without the GFHT feature-detection step, to compare, first, the success of different published alignment tools based on COW, RSPA, interval correlated shifting, and hierarchical alignment. All of the tools tested had problems with overlapping peaks and generated spectral artifacts, which was only apparent following the spectral sorting. To overcome these problems, we used metabolite fitting approaches. We combined the latent information available from spectral sorting with a Bayesian approach to automated metabolite deconvolution and quantitation and updated an existing R package (BATMAN)20 with an additional module to make use of this information. We analyzed an exemplar data set with large peak frequency shifts and compared it to peak binning (in conjunction with the above-named alignment methods) and to computer-assisted manual peak fitting by an expert spectroscopist.



MATERIALS AND METHODS Data Sets Used. We tested peak ordering and different alignment tools on four different 1D 1H NMR data sets: (1) a bacterial cell culture study, with 470 spectra of supernatants of Pseudomonas aeruginosa grown on synthetic cystic fibrosis medium (a defined medium designed to represent the nutritional content of cystic fibrosis sputum);21 (2) an environmental study, with 340 spectra of tissue extracts from 4606

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

upgraded R package, BATMAN (version 1.1.0), was developed within this study. Instead of assigning one chemical shift for each multiplet across all spectra, this latest version includes an option to input chemical shift information for any multiplet on a per-spectrum basis in batch processing mode, using information gained from the spectral sorting procedure. Furthermore, this version also incorporates the option to model complex multiplets using templates from spectra of authentic standards; the previous version only allowed modeling of combinations of Lorentzians. The latest version of BATMAN (including a Matlab tool for fitting spline functions to peaks from ordered sets of spectra) is freely available from http://batman.r-forge.r-project.org.



RESULTS AND DISCUSSION For the first aim of our study, evaluating how alignment approaches can affect NMR data quality, we reanalyzed existing 1D 1H NMR spectra from four very different sample matrices: growth medium supernatants, tissue extracts, and human urine and serum. They exhibited different kinds and degrees of spectral “positional noise”. We ordered the spectra within each set with respect to the chemical shift of an internally varying peak (or peaks) and then plotted the ordered spectra as a pseudo-2D or stacked spectra plot. This visualization made it much easier to distinguish shifting peaks in crowded areas: specific peaks could easily be tracked throughout the complete set of spectra, even with large chemical shift changes (>0.5 ppm), or for peaks crossing over in frequency (Figure 1). Furthermore, the ordered data revealed “hidden” peaks (Figures 1 and 2), i.e., metabolite signals which were completely obscured by other peaks in normal spectra, and so appeared to derive from a single metabolite, but were shown to belong to two metabolites when considered across a whole set of spectra. The best results were gained from pH-dependent signals, e.g., the aromatic imidazole protons from histidine for the bacterial supernatants24 or the aliphatic dimethyl peak from Nα,Nαdimethylhistidine for the tissue extracts, implying that pH was the major source of frequency shifting for these sample sets. The serum and urine spectra, though, showed different outcomes depending on which peak was chosen as a sorting reference; for instance, the serum spectra were largely ordered correctly by sorting with respect to an unassigned singlet at δ 3.51 ppm. However, sorting by the alanine peak (δ 1.47 ppm) revealed some metabolite peaks which were not sorted properly otherwise, e.g., glycine (Figure 2). Previously, a single factor was thought to be sufficient for ordering serum spectra.12 For the urine spectra, sorting by pH-sensitive signals led to clear improvements, with a majority of peaks ordered correctly, but some metabolites showed a different pattern introduced by a second or even third factor in addition to pH, e.g., the citric acid peaks at around δ 2.6 ppm showed no significant ordering effect in pH-sorted urine spectra (data not shown). It has long been known that pH differences affect spectra of metabolites in typical biological samples such as biofluids.25,26 Other sample properties, e.g., the concentration of other cations such as Mg2+ and Ca2+, can also affect chemical shifts, as can interactions with other metabolites or proteins. Presumably, these multiple factors were not highly correlated in the set of urine sample spectra that we used here. Some metabolites, e.g., citrate, are subject to chemical shift effects from pH and from different cations,27,28 and the responses for such metabolites are likely to

Figure 2. Sorting by different metabolite peaks uncovers multiple sources of chemical shift variation in batches of spectra, demonstrated for a set of serum spectra: (a) all spectra in order of acquisition, (b) spectra sorted by an unassigned peak at δ 3.51 ppm, and (c) spectra sorted by alanine peak at δ 1.47 ppm. The single resonance at around δ 3.54 ppm (glycine) is well-aligned by the alanine-based sorting approach (c).

be complex and may well not be amenable to the simple sorting approach used in this study. The positional information gained from sorting transforms the user’s ability to judge the success of different alignment tools. Previous studies have used simple visual inspection to judge alignment algorithm success, sometimes in combination with other statistical metrics. We do not think there is anything wrong per se with relying on visual assessmentin fact, this can be extremely powerful as it takes into account the experience of skilled usersbut assessing unsorted spectra fails to make use of the information latent in the whole set of spectra. We compared existing alignment tools and used sorted spectra to help us assess their performance. The COW, RSPA 4607

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

Figure 3. (a) Spectral sorting demonstrates that alignment algorithms incorrectly align peaks from some metabolites, selected spectral regions shown. Serum sample spectra: top panel, unsorted and unaligned spectra; middle panel, sorted and unaligned spectra; bottom panel, RSPA-aligned spectra (sorting based on the peak at δ 3.51 ppm). (b) Alignment can introduce spectral artifacts. Tissue extract spectra: top panel, sorted but unaligned spectra; middle panel, sorted RSPA-aligned spectra; bottom panel, sorted hierarchical aligned spectra. (c) Alignment can impute additional signals. (d) Alignment tools are not consistent in shifting a pair of peaks in the same direction; the overlapping signals of threonine and lactate at δ 1.32 ppm were aligned to different reference shifts between samples A and B. Direction of alignment is indicated by an arrow. (e) Alignment with the missing value imputation can generate additional signal. The top panel shows a selected region from three unaligned spectra, and the bottom panel shows the result from icoshift alignment on these. The arrow indicates the outcome of using the missing value calculation. The bacterial supernatant data set was used for data shown in panels c−e.

algorithm,10 hierarchical clustering approach,13 and icoshift14 all aligned the majority of peaks and minimized peak shift variation. However, we also observed many unaligned signals, particularly for lower-concentration compounds (Figure 3a). The RSPA and hierarchical alignment approaches also led to the creation of spectral artifacts (Figure 3, parts b and c), due to filling in missing values. This problem was addressed by the use of data interpolation using missing values in the icoshift tool,14 but this requires symmetrical peaks to interpolate the values, so it will still fail with overlapping peaks, which do not show symmetrical peak shapes (Figure 3e). Summarizing, the existing alignment approaches do an excellent job at aligning highconcentration peaks that do not overlap; using the pH (or other ion)-sorted spectra reveals that it can actually worsen the situation for low-concentration peaks and/or introduce spectral artifacts. Overlapping peaks are not tractable using the standard alignment tools. Spectral ordering provides a simple method for “tracking” peaks across multiple spectra, whether this is carried out in an automated fashion (e.g., by using the GFHT or another algorithmic technique) or by visual inspection. Comparison to Previous Approaches. The spectral ordering approach was developed in a series of pioneering papers, in the context of combining this with image-processing tools for extracting information from NMR spectra of complex real-world mixtures.12,29,30 How does our current study differ from this previous research? These earlier studies used peak detection combined with the GFHT for feature extraction across spectra. The simplest form just uses a single shape function f(i, α, s) = siα, where si is the ith element of a shape

vector s describing the change in chemical shift of a peak across a set of ordered spectra and α is an expansion factor (described as ranging from 0 ≤ α, although in practice α could also take negative values).12 This was then extended to cope with multiple sources of peak variation (e.g., pH, other cations, overall concentration, etc.)11 and used in combination with multivariate calibration to automatically extract metabolite concentrations.29 However, there are some limitations with this approach. In particular, the assumption that a single shape function describes all resonance shifts caused by a single factor is too simplistic. This can be demonstrated by simple consideration of, for example, pH-related shifts: metabolites differ in pKa and in the extent and direction of resonance shifts.31 This is clearly demonstrated for real-world metabolite mixtures by consideration of, for example, the tissue extract spectra: here, pH is the main driving factor (as shown by the fact that all peaks are well-aligned by sorting with respect to the pH-sensitive imidazole-containing compound Nα,Nα-dimethylhistidine), but many resonances have different shape vectors s (i.e., such that they could not simply be described by different values of α). Our approach relies more on operator input to define shifting peak trajectories across a set of ordered spectra: this is therefore less automated and, hence, has a greater degree of subjectivity than the GFHT approach. A series of points are selected across the course of a metabolite, and a spline function is fitted to thesein effect, we are defining s for each resonance in a way that does not make any assumptions beyond smoothness of the shift curve. We then use this information 4608

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

Figure 4. Automated Bayesian fitting of spectra with chemical shift variation is improved by using information gained from sorting across a whole set of spectra (bacterial supernatant data): (a) unsorted spectra, selected chemical shift range; (b) spectra sorted with respect to pH (histidine aromatic peak). The orange, purple, blue, and green dashed lines represent the intervals for calculating peak integrals for individual metabolites (valine, threonine, glycine, and glucose, respectively); these also represent the chemical shift ranges input as prior per-metabolite information into the Bayesian deconvolution software (BATMAN). Solid orange, purple, blue, and green lines show the spline fit of the peaks to generate per-spectrum chemical shift information. (c) Peak fits from BATMAN without per-spectrum chemical shift information for three example spectra; individual metabolite fits are represented by different colors. The red line represents wavelet fit, i.e., residual signal from unfitted peaks. Arrows indicate peak fitting problems by this automated approach. (d) Peak fits from BATMAN with per-spectrum chemical shift information included: accuracy is improved and residual wavelet fit decreased. (e) Automated fitting using per-spectrum chemical shift information is a clear improvement on taking simple peak integrals for metabolites with overlapping resonances. Concentrations from manual fitting (Chenomx) are shown on the abscissa, and concentrations from taking simple peak integrals (upper panel) and automated fitting (lower panel) are shown on the ordinate; data are for 31 bacterial supernatant spectra. Solid lines represent linear regression with 95% CI for spectra without peak overlaps (filled boxes); spectra with significant peak overlap are not included in the regression and are shown as unfilled circles. For more detailed evaluation of the presented data see Table 1. 4609

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

Table 1. Comparison of Quantitation of Selected Metabolites from 31 Spectra of Bacterial Supernatantsa BATMAN

no alignment

icoshift

RSPA

COW

metabolite

bin center (ppm)

R2

Eno

Eov

R2

Eno

Eov

R2

Eno

Eov

R2

Eno

Eov

R2

Eno

Eov

glycine glucose glucose threonine threonine methionine isoleucine

(3.54) (5.22) (3.53) (4.23) (3.56) (2.63) (1.00)

0.712 0.769 0.769 0.680 0.680 0.797 0.668

8.4 6.5 6.6 14.6 14.6 3.0 4.6

5.7 9.8 9.8 22.5 22.5 4.4 4.2

0.327 0.921 0.094 0.678 0.908 0.780 0.590

7.8 3.1 38.7 12.2 7.1 4.2 7.4

17.0 6.4 38.7 13.3 7.1 16.0 19.7

0.150 0.921 0.519 0.676 0.899 0.752 0.623

6.0 3.1 15.8 12.0 8.2 4.2 6.5

22.9 6.3 13.6 13.1 6.1 20.1 14.9

0.166 0.921 0.561 0.678 0.853 0.742 0.590

5.7 3.1 14.0 12.2 10.2 4.2 7.4

22.4 6.4 5.5 13.3 6.7 20.7 19.7

0.093 0.921 0.092 0.678 0.204 0.523 0.590

83.5 3.1 41.1 12.2 92.1 8.1 7.4

64.3 6.4 37.0 13.3 74.9 8.1 19.7

For all methods, the correlation of the data against manually fitted concentrations (Chenomx NMR Suite) is given (R2). In addition, regressions were calculated for each resonance against the Chenomx data, using only spectra with no resonance overlap. These were then used to calculate root mean square relative error values (%) for spectra with (Eov) and without (Eno) resonance overlap. The best-performing method for each metabolite has R2 shown in bold and underlined. a

In this case, however, binning nonaligned data gave an equally good result. In comparison, the BATMAN-fitted values were generally good and outperformed the alignment methods for peaks with overlap. For some metabolites, such as glucose, this meant that BATMAN performed better than the aligned data for some resonances (with overlap, e.g., δ 3.53) but worse for others (without overlap, e.g., δ 5.22). However, this meant that for metabolites with no nonoverlapped resonances, such as glycine, the BATMAN fitting was clearly the best performing method, as judged by correlation to manually deconvolved data (Table 1). In addition, the calculated absolute relative error for glycine peak values from overlapping peaks in the aligned data sets were in all cases higher (17−64%) than those from automated fitting with BATMAN (5.7%) (Table 1). In more complex mixtures, we expect there would be a considerable fraction of metabolites with no isolated resonances. We also compared the computational time needed for the different methods. The manual fitting with Chenomx required approximately 10 min per spectrum (seven metabolites fitted for the bacterial supernatant data). The alignment methods were all fast in comparison (e.g., the RSPA algorithm required less than 1 min to align the whole set of spectra). Automatic metabolite fitting (BATMAN) required approximately 30 min per spectrum on a single core, and parallelization of processors decreased the overall time per spectrum to 7.5 min (Intel Core i5, 2.8 GHz, 16 GB memory, four cores). In addition, there is of course the time required for sorting the spectra and fitting splines, which we have not allowed for here, but would require an investment of time for each set of spectra (e.g., from less than one hour, up to several hours, depending on the complexity of the spectra). Hence, our proposed workflow for using BATMAN in combination with the ordering approach would be as follows: (a) Order the spectra based on a peak with high chemical shift variability and inspect the pseudo-2D ordered plot for success in aligning peaks. (b) Apply spline-fitting to clearly ordered peaks and save the chemical shift information into BATMAN input file. (c) If needed, repeat the ordering of the data set according to peaks previously not successfully ordered (i.e., when there may be multiple sources of chemical shift variation within a set of spectra, as is typically the case for urine), fit splines to the additional peaks, and transfer the chemical shift information into BATMAN input file. (d) Run BATMAN on data set with per-spectrum chemical shift information. In addition, although we have not included it explicitly as a workflow step, in practice it may well be necessary to inspect

to help parametrize the deconvolution software BATMAN, which uses a Markov-chain Monte Carlo Bayesian sampling approach to fit metabolite signatures in parallel to NMR spectra.20 Fully automated BATMAN-fitted data of complex cell extract samples showed robust quantification in comparison to manual NMR Suite fitting.32 In principle, BATMAN should be able to quantify metabolites even where all resonances overlap with other signals in the spectrum, and it also has a function built in to prevent overestimation of concentrations by penalizing negative residuals. In contrast, the GFHT/PCA approach is most suitable for metabolites which have (at least one) nonoverlapping resonance: Alm et al.29 describe an evaluation step to remove resonances that do not correlate well with the others in a PCA model, e.g., because of overlap. For the existing BATMAN package, the prior probability distributions for chemical shift for each metabolite can be set individually, so peaks which are known to shift can be given wider priors. However, by default all spectra are treated independently, and so the approach does not take into account any prior information gained from considering trends across a set of spectra. We added functionality by providing an option to set chemical shift ranges for individual metabolite resonances on a “per spectrum” rather than “per batch” basis, with the information taken from the fitted splines. We used the bacterial supernatant data set as a test case and compared BATMAN metabolite quantification using per-spectrum chemical shift information to simple peak integrals (binning) and to expert manual fitting using the Chenomx NMR Suite software (Figure 4). We analyzed seven metabolites across multiple spectra, chosen to include metabolites which did (threonine, valine, isoleucine, glycine, methionine) and did not (glucose, aspartate) show pH variation. They were also selected to highlight the peak correspondence problem, with regions with overlapping shifting peaks from metabolite combinations (glycine−threonine, aspartate−methionine, glucose−threonine/glycine). The fitting approach was clearly superior to simple peak integrals for signals that had overlapping peaks from other metabolites (Figure 4). Even the worst-case scenario of crossing-over peaks could be accurately fitted (e.g., glycine singlet overlapping threonine doublet). We also calculated summary statistics for the different approaches (i.e., binned data of unaligned spectra, binned data of aligned spectra, and fitted metabolite data) in comparison to the manually fitted data (Table 1). Unsurprisingly, the alignment approaches tended to give excellent results for isolated peaks, e.g., the α-glucose anomeric proton resonance. 4610

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

automated fitting (time required for optimizing parameters, checking quality of results, etc.). Second, and more importantly, these approaches are most suitable when metabolite identities are known in advancea chemometric approach is more appropriate if unknown or unsuspected metabolites are present. In this case, using an alignment algorithm could well be a preferred approach. Furthermore, for cases where the betweenspectrum peak shift is relatively small, alignment approaches may perform well for the large majority of peaks in a spectrum and are quick to run.

the quality of the spectral fits (e.g., by examining the wavelet function-fitted residuals) and adjust the prior parameter data where necessary (e.g., it can be sensitive to exact values of J coupling constants). What other options exist for automated metabolite fitting from 1D NMR spectra of complex mixtures? The Chenomx database encodes pH-dependent chemical shift information, which has been used to extend the approach to automated metabolite fitting via the AutoFit program.33 Although we have not tested this in the current study, it is reported to work well for serum and cerebrospinal fluid, but not urinewhere, of course, pH is not the only or even main factor affecting peak shifts. What implications do these results have for 1D NMR-based metabolic profiling? We believe that insufficient attention has been given to the potential problems caused by alignment algorithms. This is surely at least partly down to the powerful psychological effect of inspecting a set of aligned spectrathey look so much more reproducible and interpretable than the original nonaligned spectra that one forgets that they might also be introducing errors. For cases where the alignment algorithms work well (isolated peaks with no overlap with other signals), simple “targeted binning” approaches (i.e., choosing peak integral boundaries that cover the frequency range of a shifting peak across an entire set of spectra) are likely to perform equally well (Table 1). We believe that where studies use alignment algorithms, the onus should be on the authors to demonstrate that this does not introduce errors in lowconcentration peaks in crowded regionsand simple inspection of stacked or overlaid unordered spectra is unlikely to be sufficient to achieve this. We recommend that sorting spectra with respect to internal variation is a far more powerful approach for assessing alignment performance. Clearly, though, not all sets of spectra will be equally amenable to this approach: urine has multiple sources of peak variation and may need to be sorted according to several different peaks [e.g., for the set of urine spectra that we examined here, ordering spectra by the resonances of alanine (δ 1.47 ppm), succinate (δ 2.40 ppm), and trimethylamine-N-oxide (δ 3.25 ppm) all gave different sorting effects]. The other spectral sets that we used here (bacterial supernatants and tissue extracts) had two advantages: (a) pH was the dominant source of peak variation, and (b) they possessed internal pH-sensitive metabolites with well-resolved peaks (histidine and dimethylhistidine, respectively). Not all sets of spectra will possess suitable endogenous signalsfor these, an appropriate pH-sensitive internal standard such as difluorotrimethylsilanylphosphonic acid (DFTMP) could be added to the sample buffer.34 The peak-fitting approach avoids the problems introduced by computational alignment (while being subject to problems of its own). For spectra with internal peak shifting, spectral ordering clearly improves the power of this approach. We have demonstrated this with the Bayesian package BATMAN, but the principle applies more generally. For instance, the GFHT/ PCA approach is also clearly a powerful method for automated metabolite fitting across different sample types.29 Previous studies have concluded that, when individual metabolites can be fitted, it improves the quality of the data analysis (e.g., more biologically interpretable clustering observed, simplification of data analyses, and greater compatibility with pathway-based bioinformatics approaches),11,35−37 but this does not mean that fitting approaches should be used universally. There are additional costs in terms of person-time, both for manual and



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We thank Volker Behrends (Imperial College London) for access to the bacterial supernatant data set and Deepti Sood and Hector Keun (Imperial College London) for access to the human serum data set. We thank Giuseppe Balacco (Nucleomatica, Italy) for providing the script for sorting the spectra within iNMR. We thank Kirill Veselkov (Imperial College London) for access to a software implementation of the RSPA alignment tool and for helpful discussion of the manuscript. We thank Greg Tredwell (Imperial College London) for providing the Matlab code for fitting splines to individual peaks. This study was supported by the Natural Environment Research Council (NERC), under Grant NE/ H009973/1.



REFERENCES

(1) Lindon, J. C.; Nicholson, J. K. TrAC, Trends Anal. Chem. 2008, 27, 194−204. (2) Malet-Martino, M.; Holzgrabe, U. J. Pharm. Biomed. Anal. 2011, 55, 1−15. (3) Aberg, K. M.; Alm, E.; Torgrip, R. J. O. Anal. Bioanal. Chem. 2009, 394, 151−162. (4) Beckonert, O.; Keun, H. C.; Ebbels, T. M.; Bundy, J.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Nat. Protoc. 2007, 2, 2692−2703. (5) Stolzenburg, S.; Lauridsen, M. B.; Toft, H.; Zalloua, P. A.; Baunsgaard, D. Metabolomics 2011, 7, 270−277. (6) Jiang, L. M.; Huang, J.; Wang, Y. L.; Tang, H. R. Analyst 2012, 137, 4209−4219. (7) Ebbels, T. M.; Lindon, J. C.; Coen, M. Methods Mol. Biol. 2011, 708, 365−388. (8) Halouska, S.; Powers, R. J. Magn. Reson. 2006, 178, 88−95. (9) Craig, A.; Cloarec, O.; Holmes, E.; Nicholson, J. K.; Lindon, J. C. Anal. Chem. 2006, 78, 2262−2267. (10) Veselkov, K. A.; Lindon, J. C.; Ebbels, T. M.; Crockford, D.; Volynkin, V. V.; Holmes, E.; Davies, D. B.; Nicholson, J. K. Anal. Chem. 2009, 81, 56−66. (11) Alm, E.; Torgrip, R. J. O.; Aberg, K. M.; Schuppe-Koistinen, I.; Lindberg, J. Anal. Bioanal. Chem. 2009, 395, 213−223. (12) Csenki, L.; Alm, E.; Torgrip, R. J. O.; Aberg, K. M.; Nord, L. I.; Schuppe-Koistinen, I.; Lindberg, J. Anal. Bioanal. Chem. 2007, 389, 875−885. (13) Robinette, S. L.; Ajredini, R.; Rasheed, H.; Zeinomar, A.; Schroeder, F. C.; Dossey, A. T.; Edison, A. S. Anal. Chem. 2011, 83, 1649−1657. (14) Savorani, F.; Tomasi, G.; Engelsen, S. B. J. Magn. Reson. 2010, 202, 190−202.

4611

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612

Analytical Chemistry

Article

(15) Vu, T. N.; Valkenborg, D.; Smets, K.; Verwaest, K. A.; Dommisse, R.; Lemiere, F.; Verschoren, A.; Goethals, B.; Laukens, K. BMC Bioinf. 2011, 12, 405. (16) Nielsen, N. P. V.; Carstensen, J. M.; Smedsgaard, J. J. Chromatogr., A 1998, 805, 17−35. (17) Kassidas, A.; MacGregor, J. F.; Taylor, P. A. AIChE J. 1998, 44, 864−875. (18) Wong, J. W. H.; Durante, C.; Cartwright, H. M. Anal. Chem. 2005, 77, 5655−5661. (19) MacKinnon, N.; Ge, W.; Khan, A. P.; Somashekar, B. S.; Tripathi, P.; Siddiqui, J.; Wei, J. T.; Chinnaiyan, A. M.; Rajendiran, T. M.; Ramamoorthy, A. Anal. Chem. 2012, 84, 5372−5379. (20) Hao, J.; Astle, W.; De Iorio, M.; Ebbels, T. M. Bioinformatics 2012, 28, 2088−2090. (21) Behrends, V.; Bell, T. J.; Liebeke, M.; Cordes-Blauert, A.; Nair, C.; Williams, H. D.; Bundy, J. G. Unpublished work. (22) Liebeke, M.; Bundy, J. G. Metabolomics 2012, 8, 819−830. (23) Ellis, J. K.; Athersuch, T. J.; Thomas, L. D.; Teichert, F.; PerezTrujillo, M.; Svendsen, C.; Spurgeon, D. J.; Singh, R.; Jarup, L.; Bundy, J. G.; Keun, H. C. BMC Med. 2012, 10, 61. (24) Baryshnikova, O. K.; Williams, T. C.; Sykes, B. D. J. Biomol. NMR 2008, 41, 5−7. (25) Bales, J. R.; Higham, D. P.; Howe, I.; Nicholson, J. K.; Sadler, P. J. Clin. Chem. 1984, 30, 426−432. (26) Nicholson, J. K.; Buckingham, M. J.; Sadler, P. J. Biochem. J. 1983, 211, 605−615. (27) Moore, G. J.; Sillerud, L. O. J. Magn. Reson., Ser. B 1994, 103, 87−88. (28) van der Graaf, M.; Heerschap, A. J. Magn. Reson., Ser. B 1996, 112, 58−62. (29) Alm, E.; Slagbrand, T.; Aberg, K. M.; Wahlstrom, E.; Gustafsson, I.; Lindberg, J. Anal. Bioanal. Chem. 2012, 403, 443−455. (30) Alm, E.; Torgrip, R. J. O.; Aberg, K. M.; Schuppe-Koistinen, I.; Lindberg, J. Anal. Bioanal. Chem. 2010, 396, 1681−1689. (31) Fan, W. M. T. Prog. Nucl. Magn. Reson. Spectrosc. 1996, 28, 161− 219. (32) Astle, W.; De Iorio, M.; Richardson, S.; Stephens, D.; Ebbels, T. M. J. Am. Stat. Assoc. 2012, 107, 1259−1271. (33) Mercier, P.; Lewis, M. J.; Chang, D.; Baker, D.; Wishart, D. S. J. Biomol. NMR 2011, 49, 307−323. (34) Reily, M. D.; Robosky, L. C.; Manning, M. L.; Butler, A.; Baker, J. D.; Winters, R. T. J. Am. Chem. Soc. 2006, 128, 12360−12361. (35) Rubtsov, D. V.; Waterman, C.; Currie, R. A.; Waterfield, C.; Salazar, J. D.; Wright, J.; Griffin, J. L. Anal. Chem. 2010, 82, 4479− 4485. (36) Weljie, A. M.; Newton, J.; Mercier, P.; Carlson, E.; Slupsky, C. M. Anal. Chem. 2006, 78, 4430−4442. (37) Wishart, D. S. TrAC, Trends Anal. Chem. 2008, 27, 228−237.

4612

dx.doi.org/10.1021/ac400237w | Anal. Chem. 2013, 85, 4605−4612