Comparison of 1D and 2D NMR Spectroscopy for ... - ACS Publications

Comparison of 1D and 2D NMR Spectroscopy for Metabolic Profiling

Que N. Van,*,† Haleem J. Issaq,† Qiujie Jiang,§ Qiaoli Li,§ Gary M. Muschik,† Timothy J. Waybright,† Hong Lou,‡ Michael Dean,‡ Jouni Uitto,§ and Timothy D. Veenstra†

Laboratory of Proteomics and Analytical Technologies, Advanced Technology Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland 21702, Department of Dermatology and Cutaneous Biology, Thomas Jefferson University, Philadelphia, Pennsylvania 19107, and Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland 21702

Received September 11, 2007

High-resolution, liquid state nuclear magnetic resonance (NMR) spectroscopy is a popular platform for metabolic profiling because the technique is nondestructive, quantitative, and reproducible, and the spectra contain a wealth of biochemical information. Because of the large dynamic range of metabolite concentrations in biofluids, statistical analyses of one-dimensional (1D) proton NMR data tend to be biased toward selecting changes in more abundant metabolites. Although two-dimensional (2D) proton–proton experiments can alleviate spectral crowding, they have been mainly used for structural determination. In this study, 2D total correlation spectroscopy NMR was used to compare the global metabolic profiles of urine obtained from wild-type and Abcc6-knockout mice. The 2D data were compared to an improved 1D experiment in which signal contributions from macromolecules and the urea peak have been spectroscopically removed for more accurate quantitation of low-abundance metabolites. Although statistical models from both 1D and 2D data could differentiate samples acquired from the two groups of mice, only the 2D spectra allowed the characterization of statistically relevant changes in the low-abundance metabolites. While acquisition of the 2D data requires more time, the data obtained resulted in a more meaningful and comprehensive metabolic profile, aided in metabolite identifications, and minimized ambiguities in peak assignments.

Keywords: metabolic profiling • metabonomics • metabolomics • NMR spectroscopy • TOCSY

Introduction

Many human diseases result in specific and characteristic changes in the chemical and biochemical composition profiles of biological fluids and tissues. In the search for the causes of disease, toxicological progression, or recovery, metabolic profiling is a powerful complementary technique to other technologies such as genomics, transcriptomics, and proteomics. The primary aim of global profiling of the endogenous and exogenous small molecules in biological samples, referred to as metabolomics1 or metabonomics,2 is to discover biomarker(s) and cellular pathways that aid in early diagnosis, especially before the onset of symptoms, to allow for timely therapeutic interventions. The two most commonly used instrumental platforms for metabolic profiling are nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS).3–5 One of the first examples of metabolic profiling was conducted by Linus Pauling’s group in 1971 using gas chromatography (GC) to analyze volatile small molecules in urine vapor and in breath.6

* Corresponding author: Laboratory of Proteomics and Analytical Technologies, Advanced Technology Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland, 21702. Phone: 301-8467192. Fax: 301-846-6037. E-mail: [email protected].
† SAIC-Frederick, Inc., National Cancer Institute at Frederick.
§ Thomas Jefferson University.
‡ Laboratory of Genomic Diversity, National Cancer Institute at Frederick.

630 Journal of Proteome Research 2008, 7, 630–639 Published on Web 12/15/2007

10.1021/pr700594s CCC: $40.75 © 2008 American Chemical Society

Jellum et al. used GC-MS for the metabolic profiling of approximately 100 different metabolic diseases, eventually resulting in the discovery of 25 inborn errors of metabolism.7,8

Metabolic profiling using biological fluids from human subjects presents many challenges. These challenges stem from extensive biological variations that result from diet, diverse genetic makeup, and overall lifestyle choices. Confounding factors, such as contributions from gut microflora and the metabolic composition in human subjects, were recognized by Pauling et al., and in subsequent metabolic studies on human and mouse urine.9–11

Utilizing NMR to study biofluids is by no means a new concept. The potential of NMR spectroscopy to study living systems was recognized in the early 1980s12–14 and one-dimensional (1D) proton (1H) NMR has been extensively used since 1985 in the field of inborn errors of metabolism.15,16 Many different biofluids and tissue types have been studied, the most popular being those that can be obtained noninvasively, including urine, serum, and plasma. Tissues are studied intact, or the aqueous/organic extracts are analyzed. In the case of intact tissues, high-resolution magic angle spinning NMR spectroscopy or imaging is used.17

A majority of published metabolic NMR profiling studies use 1D proton spectra followed by statistical pattern recognition methods. The most popular statistical models are principal component analysis (PCA),18 partial least-squares discriminant analysis (PLS-DA),19,20 and orthogonal PLS (OPLS).21 Monitoring metabolites via 1H NMR has been the method of choice because of this nucleus's high natural abundance (99.9%), sensitivity, and prevalence in endogenous metabolites relative to other nuclei, such as 13C, 15N, and 31P. While a 1D 1H spectrum can be obtained in less than 5 min, obtaining spectra with high signal-to-noise ratios requires accumulating a larger number of scans, making the total experimental time for a typical experiment about 30 min. With the use of a flow-based or tube-based automatic sample changer, studies involving hundreds or thousands of samples have been successfully completed within reasonable time frames.22 In general, many groups have successfully applied metabolic profiling by 1D 1H NMR to study the pathology and progression of various hepatic and renal toxins, cancer, age-related changes, and different mouse strains.23

The large dynamic range in metabolite concentrations found in biofluids and tissues, coupled with the fact that each metabolite has a range of concentrations that is considered physiologically normal,24,25 makes statistical analysis of 1D NMR data biased toward detecting changes in the more abundant metabolites. More often, the less abundant metabolites are not observed simply because they are hidden beneath NMR peaks originating from more concentrated metabolites.
Attempts to overcome spectral crowding in 1D spectra have included (a) skyline projection of 2D J-resolved spectroscopy to obtain a broadband proton-decoupled 1D spectrum,26 (b) “targeted” metabolic profiling by spectral fitting of a select set of known metabolites in a 1D proton NMR database that has been carefully collected so that the pH and sample matrix match (in this case, human urine),27 (c) use of isotopically enriched metabolites,28 and (d) on- or off-line sample fractionation.29 All of these techniques alleviate peak congestion, but do not completely solve the problem. For instance, skyline projection of a 2D J-resolved spectrum collapses the multiplicity structure of coupled protons to give singlet peaks, but spectral crowding is still problematic due to the sheer volume of metabolites present, especially in the upfield region of the NMR spectrum (0–5 ppm). In addition, strong coupling artifacts will give rise to additional peaks that are not part of the sample.26 Target- or metabolite-specific studies depend upon a priori knowledge of the biological system or pathway utilized.27 Fan et al. successfully used a combination of metabolic profiling with the aid of 13C-labeled glucose and transcriptomic analysis to elucidate the mechanism of selenium anticancer action in human lung cancer cells.28 However, isotope labeling is not widely used in human or animal studies. Lastly, off-line sample fractionation would result in fractions of a less complex mixture that would give more meaningful data; however, this could easily lead to a prohibitive number of samples to analyze. Online fractionation, such as LC-NMR, is more amenable to automated, high-throughput analysis.29 Cloarec et al. combined cryoflow LC-NMR with statistical total correlation spectroscopy (STOCSY) of 1D spectra to help resolve coeluting metabolites with partially overlapping resonances.30 However, at least one reasonably well-resolved peak from each metabolite is needed to generate and drive STOCSY models.31,32

Multidimensional approaches have been used in gel electrophoresis, chromatography, and in NMR spectroscopy to increase resolution. Until recently, the traditional suite of 2D homo- and heteronuclear NMR experiments for high-resolution liquid state NMR spectroscopy, such as correlation spectroscopy (COSY), total correlation spectroscopy (TOCSY), heteronuclear single quantum correlation (HSQC), and heteronuclear multiple bond correlation (HMBC), was performed on a few select samples for structure elucidation purposes only. Welch et al. recently showed how localized 2D-COSY-based experiments can be used for in vivo monitoring of rat brain metabolites during vigabatrin treatment.33 The 2D image-selected in vivo (ISI)-COSY experiment allowed the resolution of overlapping resonances of γ-aminobutyric acid, glutamate, glutamine, and taurine, as well as resolution of these key brain metabolites from the more intense peaks of other metabolites. Dumas et al. used a 2D 1H-13C HMBC NMR experiment to study the urine of cattle treated with anabolic steroids.34 Compared to other NMR-active nuclei, the proton chemical shift range, 0–12 ppm, is quite narrow. Natural products have a carbon chemical shift range of over 200 ppm, and using carbon as the second dimension in the indirect detection 2D experiments can greatly help with peak dispersion. However, 13C only has a 1.1% natural abundance, making the 2D HMBC a very insensitive experiment. Separation of metabolites using diffusion-ordered spectroscopy (DOSY) alone, or concatenated with a 2D 1H-1H experiment, has been successfully applied to simple mixtures, but has not been used for global profiling of biofluids.35–37

To obtain a comprehensive metabolic profile of urine and, at the same time, overcome the crowding in 1D spectra, we used the 2D 1H-1H TOCSY experiment with the zero-quantum filter technique developed by Thrippleton et al. to obtain in-phase peaks.38 The 2D TOCSY experiment alleviated the peak crowding that plagues 1D NMR spectra and allowed more accurate quantitation of low-abundance metabolites.
This method was used to analyze and compare the metabolic profiles of urine obtained from wild-type and Abcc6-knockout mice using both 1D proton and 2D TOCSY NMR data.

Pseudoxanthoma elasticum (PXE) is a heritable recessive connective tissue disorder with an incidence of 1 in 75 000.39 The disease is characterized by the progressive calcification of the elastic structures in the skin and eyes, as well as the gastrointestinal and cardiovascular systems.40 The cause of the disease has been associated with mutations of the ABCC6 gene on chromosome 16p13.1.41,42 The ABCC6 gene encodes the multidrug resistance-associated protein 6 (MRP6), a member of the ATP-binding cassette (ABC) family C. This protein is predicted to contain three transmembrane domains and two nucleotide-binding folds that are critical for its function as a transmembrane transporter.43 The functional relationship(s) between ABCC6 gene mutations and elastic fiber calcification, as well as the endogenous substrate(s) of MRP6, remain unknown. The 13 members of the human ABCC (CFTR/MRP) family display different numbers of transmembrane domains and are responsible for protection against toxic compounds and the secretion of organic anions.44 Other ABC genes have been implicated in a number of diseases. For example, Tangier disease, Harlequin ichthyosis, immune deficiency, Dubin-Johnson syndrome, and cystic fibrosis are caused by mutations in ABCA1, ABCA12, ABCB2, ABCC2, and ABCC7, respectively.45,46 The discovery that MRP6 is expressed primarily in the liver and kidneys and that PXE patients display normal hepatic and renal function suggests that PXE may be primarily a metabolic disorder with secondary connective tissue manifestations.46 This theory was further supported by the work of Le Saux et al., who showed that when normal and PXE fibroblasts were maintained in the presence of serum from PXE-affected individuals, abnormal aggregates of elastic fibers were deposited.47

Recently, a mouse model for PXE (Abcc6-/-) was developed.48 These null mice exhibited clinical features similar to PXE-affected patients, for example, mineralization of soft tissues in the skin, arterial blood vessels, and the eyes, yet appeared clinically healthy up to 22 months. The slow progression and delayed onset of the disease in the mouse model mimics PXE-affected patients, who can have a normal life span. Mineralization was noted in Abcc6-/- mice as early as 5 weeks of age and was progressive with age. By the age of 3 months, the mineralization of connective tissues can be clearly detected by a total body computerized tomography scan, histopathology, and chemical assay of calcium and phosphate product.49

Journal of Proteome Research • Vol. 7, No. 2, 2008 631

Table 1. Starting Mouse Urine Volume, Starting pH, and Concentration of NaOH Required To Obtain a Final pH of 7.4

mouse   starting vol. (µL)   starting pH   adjusted pH   NaOH (mM)
WT1     115                  6.14          7.37          58.1
WT2     127                  6.16          7.38          62.9
WT3     80                   6.54          7.38          9.7
WT4     92                   6.22          7.39          41.9
KO1     107                  6.55          7.39          35.5
KO2     145                  6.64          7.41          22.6
KO3     105                  6.56          7.39          22.6
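The Results section reports that no linear relationship was observed between the starting urine volume and the amount of NaOH needed. As a rough, illustrative check (not part of the original analysis), the sketch below computes the Pearson correlation coefficient from the Table 1 values. The helper name `pearson_r` is hypothetical, and with only seven samples the coefficient is indicative at best.

```python
import math

# Values transcribed from Table 1 (WT1-WT4 followed by KO1-KO3).
start_volume_ul = [115, 127, 80, 92, 107, 145, 105]
naoh_mm = [58.1, 62.9, 9.7, 41.9, 35.5, 22.6, 22.6]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(start_volume_ul, naoh_mm)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A small r² here would mean that starting volume explains little of the variation in NaOH required, which is in line with the authors' observation.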

Experimental Section

Animal Husbandry and Urine Collection. The PXE mouse model, Abcc6-/- knockout (KO) mouse, was developed by targeted ablation of the Abcc6 gene.48 The mice were maintained in the animal facility of the Thomas Jefferson University in a temperature- and humidity-controlled environment under 12-h light/dark cycles. Mice were fed a standard rodent diet (laboratory diet 5010; PMI Nutrition, Brentwood, MO) and had free access to water. To collect fresh and uncontaminated urine, the mice were held to urinate into a 1.5 mL low-adhesion microcentrifuge tube (USA Scientific, catalog No. 1415–2600) outside of the animal cage. Urine samples were always collected at 2:00 p.m. from the age- and gender-matched wild-type (WT) and KO mice, put on dry ice, and then stored at -80 °C until use.

Sample Preparation. A set of four WT and three Abcc6-/- KO urine samples from 3-month-old male mice were analyzed by NMR. Raw mouse urine was diluted to 155 µL as necessary with deionized water to which 0.5 µL of sodium azide stock solution (0.1 mg/mL) had been added. The starting volumes for the raw mouse urine ranged from 80 to 145 µL (Table 1). The urine pH ranged from 6.14 to 6.64 and was manually adjusted, using a micro pH electrode (Thermo Scientific, Beverly, MA), to pH 7.4 by addition of dilute NaOH made up in D2O (Cambridge Isotope, Andover, MA). The samples were then centrifuged for 10 min, and 148.5 µL of urine was mixed with 8.25 µL of a phosphate buffer stock solution (0.5 M pD 7.4 in D2O), to help maintain a stable pH, and 8.25 µL of a 3-trimethylsilyl-propionate sodium salt (TSP) (Isotec, St. Louis, MO) stock (10 mM in D2O) as an internal reference. The solution was centrifuged for 5 min at 10 000g in an Eppendorf 5415D centrifuge, and transferred to a 3 mm NMR tube (Wilmad, Buena, NJ) for analysis.

NMR Experiments. All NMR experiments were performed at 25 °C using a Varian INOVA 500 MHz spectrometer (Palo Alto, CA) equipped with a z-gradient, triple resonance HCN cryogenic probe. The tune and match were manually readjusted for each sample. The 1D pulse sequence used was based on the first increment of the NOESYPRESAT experiment, modified to contain an 80 ms Carr-Purcell-Meiboom-Gill (CPMG) pulse train of fast spin–echoes between the first two 90° pulses.50–52 The spectra were acquired with a relaxation delay of 5 s, during which the water was saturated, 256 transients, a 5 s acquisition time, and a spectral width of 6000 Hz. The residual water and urea peaks were simultaneously saturated during the 106 ms mixing period. Two-dimensional TOCSY data were acquired with a 60 ms decoupling in the presence of scalar interactions (DIPSI-2)53 mixing period at 6 kHz field strength, 2048 × 512 complex points, 80 scans per increment, 5500 Hz sweep width for both dimensions, and an equilibrium delay of 1.3 s, during which water was saturated. The zero-quantum filters were placed at the beginning and end of the mixing period.

Data Preparation and Statistical Analysis. The 1D proton spectra were zero-filled once to double the number of points, a 0.5 Hz exponential window function was applied, and phasing was done manually. The region from 0.5 to 9.6 ppm was binned using 0.04 and 0.02 ppm bin width intervals and integrated using the Varian VNMR6.1C software. 1D variable bin bucketing, also referred to as Intelligent Bucketing, was performed with ACDLabs’ software v10, with the bin size set at 0.04 ppm and a bin looseness factor set to 50%. Bin widths were automatically varied from 0.02 to 0.06 ppm. Bins containing residual water (4.64–4.96 ppm) and a very limited region at the site of urea saturation (5.76–5.84 ppm) were removed from statistical analysis. The binning methods and the numbers of resulting bins are as follows: (a) 1D 0.04 ppm constant bin width, 245 bins; (b) 1D 0.02 ppm constant bin width, 490 bins; (c) 1D variable bin width (0.02–0.06 ppm), 224 bins.
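The constant-width binning described above can be sketched in a few lines. This is a minimal illustration assuming NumPy, not the VNMR or ACDLabs procedure: the function name `bin_spectrum` is hypothetical, intensities are simply summed per bin rather than integrated, and the bin grid is anchored at 0.5 ppm, so the number of bins it returns need not match the counts reported in the paper.

```python
import numpy as np


def bin_spectrum(ppm, intensity, lo=0.5, hi=9.6, width=0.04,
                 exclude=((4.64, 4.96), (5.76, 5.84))):
    """Sum a 1D spectrum into constant-width bins between lo and hi ppm.

    Bins whose centers fall inside an excluded region (residual water,
    urea saturation site) are dropped. Returns (bin centers, bin sums).
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Number of bins needed to cover [lo, hi]; the last bin may extend
    # slightly past hi when the range is not a multiple of the width.
    n_bins = int(np.ceil((hi - lo) / width - 1e-9))
    edges = lo + width * np.arange(n_bins + 1)
    # Weighted histogram: adds up the intensities of points in each bin.
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = np.ones(n_bins, dtype=bool)
    for start, stop in exclude:
        keep &= ~((centers >= start) & (centers <= stop))
    return centers[keep], sums[keep]


# Synthetic flat "spectrum", used only to exercise the function.
demo_ppm = np.linspace(0.5, 9.6, 10000, endpoint=False)
demo_centers, demo_sums = bin_spectrum(demo_ppm, np.ones_like(demo_ppm))
print(len(demo_centers), "bins retained")
```

Calling the same function with `width=0.02` gives the finer constant-width grid; variable-width ("intelligent") bucketing needs peak-aware bin edges and is not covered by this sketch.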
The 2D TOCSY NMR spectra were processed with NMRPipe software using a shifted sine-bell window function and zero-filled to 4 × 2K data points.54 The 2D spectra were prepared in two ways: (a) binned using a constant box size of 0.04 × 0.08 ppm and integrated using ACDLabs software, v10, 8840 bins; and (b) manually integrated to obtain a variable bin size using JEOL’s Delta processing software, v4.3.5, 1341 bins. For the constant bin size bucketing with ACDLabs’ SpecManager software (ACDLabs, Toronto, Canada), the residual water and urea diagonal peaks and the urea/water cross-peaks were removed from the data analysis. Diagonal peaks were removed from the 2D constant bin bucketing list by using the criterion that a bin is removed if its center is less than 0.09 ppm from the diagonal. This criterion resulted in the removal of 558 bins. 2D variable bin bucketing was performed manually. All of the spectra were overlaid using JEOL’s Delta software, the appropriate integration box sizes were manually drawn, and diagonal peaks were not integrated; the bin size ranged from 0.0072 × 0.0164 ppm to 0.3414 × 0.1156 ppm. The nonexcluded region between 0.5 and 9.6 ppm was normalized to the total integration, and multivariate statistical analyses were performed using Umetrics’ SIMCA-P11.5+ software. Three different models, PCA, PLS-DA, and OPLS, were generated using Pareto scaling and with or without log transformation of the data.
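The three preprocessing steps named above (diagonal-bin removal, normalization to the total integral, and Pareto scaling) can be sketched as follows, assuming NumPy. All function names are hypothetical. The distance from the diagonal is taken here as the absolute difference between the two chemical shifts of a bin center, which is one possible reading of the 0.09 ppm criterion, and Pareto scaling uses its usual definition (mean-center each variable, then divide by the square root of its standard deviation). None of this reproduces the ACDLabs or SIMCA implementations.

```python
import numpy as np


def off_diagonal_mask(f2_centers, f1_centers, min_offset=0.09):
    """True for 2D bins whose center lies at least min_offset ppm off the diagonal.

    Assumption: the distance is measured as |F1 - F2| in ppm.
    """
    f2 = np.asarray(f2_centers, dtype=float)
    f1 = np.asarray(f1_centers, dtype=float)
    return np.abs(f1 - f2) >= min_offset


def normalize_to_total(X):
    """Scale each sample (row) so that its retained bins sum to 1."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def pareto_scale(X):
    """Mean-center each bin (column) and divide by sqrt of its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    # Constant columns stay at zero instead of triggering a division by zero.
    scale = np.where(sd > 0, np.sqrt(sd), 1.0)
    return centered / scale


# Toy example: three bins, the first of which sits on the diagonal.
mask = off_diagonal_mask([1.00, 1.00, 3.00], [1.05, 2.00, 3.20])
X_demo = np.array([[10.0, 2.0, 1.0],
                   [12.0, 4.0, 1.0],
                   [11.0, 3.0, 4.0]])
X_scaled = pareto_scale(normalize_to_total(X_demo[:, mask]))
```

Pareto scaling sits between no scaling and unit-variance scaling: large peaks are damped, but not so strongly that baseline noise is inflated to the same weight as real signals.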

Results and Discussion

Sample Preparation. Metabolic profiling for epidemiological studies and tracing toxicological or disease progression involving samples from different time points can quickly lead to a large number of samples. The desire for a robust method for high-throughput measurement of biofluids has led many research groups to use 1D 1H NMR spectra from urine samples diluted using a phosphate stock solution to minimize pH variation in the sample set. The most popular sample preparation is a 2:1 dilution of the raw urine sample with a 100 or 200 mM phosphate buffer stock solution, pH 7.4, resulting in a final phosphate buffer concentration of 33.3 and 66.7 mM, respectively. Using this method, the samples’ final pH ranges from 6.7 to 7.6.55 Only molecules with a high sensitivity to pH, for example, histidine or citrate, are expected to undergo significant shifts in peak resonance frequencies in this pH range. Furthermore, binning the spectra using 0.04 ppm bin width (20 Hz) not only reduces data complexity for statistical analysis, but can also correct for any minor peak shifts due to pH differences since the peak is likely to remain within the same bin number. Thus, a direct comparison can be performed across the entire sample set. It is important to note that pH is logarithmically scaled, and thus, a 0.9 unit pH range is, in fact, quite a large change in H+ concentration. The simple addition of a fixed amount of buffer to save time has the unwanted effect of diluting the sample, making it more difficult to quantitate low-abundance metabolites. For this study, the sample pH was manually adjusted to 7.4 using NaOH to minimize dilution effects and salt concentration, thereby increasing sensitivity. Miyataka et al. indicated that special attention should be paid to mouse urine pH values in metabonomic studies.56 The starting mouse urine volumes, pH, and the amount of NaOH required to adjust the samples to pH 7.4 ± 0.05 are listed in Table 1. No linear relationship between the starting urine volume and the amount of NaOH required was observed.

Figure 1. (A) A 1D proton NMR spectrum of urine obtained from Abcc6 KO3 mouse using the NOESYPRESAT pulse sequence with water saturation during the equilibrium delay and mixing period. (B) Spectrum A plotted at 20× the vertical scale to show the impact of the urea peak on nearby resonances and broad peaks near the baseline from macromolecules, such as proteins, in the sample. (C) A 1D proton spectrum of the same sample, plotted at the same vertical scale as panel B, using a modified pulse sequence to spectroscopically remove both the urea peak and contribution from protein signals for better quantitation of small metabolites.

One-Dimensional Pulse Sequence Modifications for Improved Quantitation. The 1D 1H NMR spectrum of a urine sample obtained from an Abcc6 knockout mouse (KO3) is shown in Figure 1A. The spectrum shows the high dynamic range of signals present, and a few of the major metabolites are labeled. The water peak was suppressed by on-resonance saturation during the equilibrium delay period and the 106 ms mixing period of the NOESYPRESAT experiment. The large urea peak at 5.81 ppm is also attenuated due to saturation exchange with water and its signal strength will vary with its concentration in the urine; therefore, the entire urea region, along with the residual water region (4.50–6.0 ppm), is usually removed from all subsequent statistical data analysis. In Figure 1B, the vertical scale of Figure 1A is increased by 20×, revealing a number of small peaks that are present on the tail end of the urea peak (6.0–6.1 ppm). Their position relative to the urea resonance makes quantitation of these small peaks inaccurate. Other broad peaks near the baseline are seen in Figure 1B from proteins present in the sample, especially in the 0–1 and 7–10 ppm region.

Figure 2. The downfield spectral expansions of (A) WT2 1D proton NMR spectrum, (B) WT2 2D TOCSY, and (C) Abcc6 KO3 2D TOCSY are shown. The 2D data show a wealth of information not obtainable from the 1D data. Statistically significant peaks from the PLS-DA models without log transformation are circled in orange, and those from log transformed data are colored green for increase in WT2 and purple for increase in KO3. The open circles indicate corresponding regions for better visualization. The combined use of 2D NMR data and log transformation enabled us to pick up contributions from the less abundant metabolites responsible for group separation, most of which were not detectable using 1D proton data.

In the downfield region of the spectrum (5.0–10.0 ppm), the NMR spectrum is sparsely populated compared to the upfield region. This sparseness makes small perturbations in metabolite concentrations readily detectable. The anomeric protons of various sugars and those from the ribose of nucleic acids, for example, are well-resolved in the 6–7 ppm region compared to other protons from the rest of the molecule, which lie in the crowded region between 3.2 and 4.5 ppm. In another example, α-glucose and β-glucose can be easily distinguished from each other solely based on the chemical shift of the anomeric proton, 5.23 and 4.64 ppm, respectively. If the broad urea peak is removed, quantitation of neighboring peaks is improved and additional peaks that were previously completely obscured by the urea peak will be uncovered and may be statistically significant. Both the urea and protein peaks in the 1D 1H NMR spectrum can be spectroscopically removed. The urea peak was removed by simultaneous saturation with the water peak during the mixing period, revealing a number of new small resonances. A recent study by Xu et al. removed protein signals from the 1D NMR spectrum by fitting a diffusion-edited spectrum of the sample containing NMR peaks from only the macromolecules using singular value decomposition to improve quantitation of the metabolites.57 On the basis of our earlier work with serum samples, we found that the simple insertion of an 80 ms CPMG pulse train of fast spin–echoes, (τ-180-τ)n with τ = 200 µs and n = 200, between the first two 90° pulses in the NOESYPRESAT experiment was quite adequate to spectroscopically remove the macromolecular signal contribution from the NMR spectrum.58 This technique exploits the difference in the transverse relaxation time of macromolecules and small molecules. The resulting protein- and urea-depleted spectrum is shown in Figure 1C, plotted at the same vertical scale as Figure 1B. Using this simple pulse sequence modification, additional sample preparation steps, such as protein precipitation, are avoided.
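The CPMG timing quoted above is easy to verify, and a short calculation shows why the filter discriminates by transverse relaxation time. In the sketch below the delay τ and echo count n come from the text, while the two T2 values are hypothetical, chosen only to contrast a fast-relaxing macromolecule with a slowly relaxing small metabolite. The finite width of the 180° pulses is ignored and simple mono-exponential T2 decay is assumed.

```python
import math

tau_s = 200e-6   # delay on each side of the 180° pulse (from the text)
n_echoes = 200   # number of (tau-180-tau) repeats (from the text)

# Each echo lasts 2*tau, so the whole train lasts n * 2 * tau = 80 ms.
total_s = 2 * tau_s * n_echoes


def surviving_fraction(t2_s, t_s=total_s):
    """Fraction of transverse magnetization left after t_s, assuming exp(-t/T2) decay."""
    return math.exp(-t_s / t2_s)


# Hypothetical T2 values, for illustration only.
t2_protein_s = 0.010      # fast-relaxing macromolecule
t2_metabolite_s = 1.0     # slowly relaxing small molecule

print(f"CPMG train length: {total_s * 1e3:.0f} ms")
print(f"protein signal remaining:    {surviving_fraction(t2_protein_s):.4f}")
print(f"metabolite signal remaining: {surviving_fraction(t2_metabolite_s):.4f}")
```

Under these assumed T2 values the macromolecular signal is almost entirely dephased while the metabolite signal is nearly untouched, which is the behavior the modified pulse sequence relies on.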
Small resonances in the downfield region can now be more accurately quantified.

Two-Dimensional TOCSY for Metabolic Profiling. The 2D TOCSY experiment was chosen for comparison to the traditional 1D method because it has the best potential to resolve resonances off the diagonal. During the mixing period of the TOCSY experiment, magnetization is allowed to transfer throughout the coupling network and cross-peaks far off the diagonal can be observed. The strength of this spin–spin coupling is dependent on the distance and the relative geometric orientation of the two protons. Historically, the TOCSY experiment has suffered from the presence of unwanted zero-quantum coherences that cause multiplets to appear distorted with antiphase characteristics. These peak distortions make it difficult to accurately quantitate species within complex mixtures containing a high dynamic range of metabolite concentrations. Thrippleton and Keeler reported a very efficient zero-quantum filter using a 180° adiabatic sweep in conjunction with a gradient pulse.38 Applying this deceptively simple filter enabled us to acquire 2D TOCSY spectra with in-phase cross-peaks for all but the strongest coupled protons, such as the two strongly coupled citrate doublets that have a coupling value of 16.2 Hz. To illustrate the wealth of additional information obtainable using 2D TOCSY (Figure 2), the aromatic region of the 1D spectrum from WT2 mouse is shown along with the corresponding 2D region from WT2 and Abcc6 KO3 mice. In the 2D spectra (Figure 2B,C), many peaks from low-abundance metabolites are well-resolved from neighboring peaks belonging to more abundant metabolites; the color coding is explained in the legend of Figure 2. In 1D data analysis, perturbations of these small peaks have a high probability of being missed because they are likely to be binned into the same bin number as the larger peaks. The ability to resolve peaks from low-abundance metabolites allows a richer global metabolic profile of each sample.

When samples for metabolic profiling, especially urine, remain at room temperature for any length of time, sample degradation is always of great concern.
Sodium azide is added to minimize sample degradation from potential bacterial growth, but it need not be relied on alone: if the urine samples are carefully collected and prepared, they can remain stable for multiple days. This is demonstrated in Figure 3, where three 1D proton spectra were collected on the same KO2 urine sample at three different time points, (A) day 1, (B) day 43, and (C) day 58. After the first and second 1D proton spectra were collected, the sample remained in the spectrometer for an additional 65 and 159 h, respectively, while more NMR data were being collected. When the sample was not being used, it was stored at -20 °C. Even after two freeze–thaw cycles, there was very little noticeable change among the three spectra.

Figure 3. Three 1D spectra were collected on the same urine sample, KO2, at different times to demonstrate sample stability at (A) day 1, (B) day 43, and (C) day 58. Note that spectral shimming is superior in panel A, and thus, some very narrow peaks appear higher in intensity compared to those in panels B and C. When not being analyzed, the sample was stored at -20 °C.

Advantages of 2D Metabolic Profiling. Three multivariate models, PCA, PLS-DA, and OPLS, were generated for the 1D and 2D NMR data that were binned using constant and variable bin widths. The PCA models from the 2D data, but not the 1D data, showed clear group separation between the wild-type and knockout mice using the first two principal components (Figure 4A). As seen in the PCA score plot of the 1D 0.04 ppm binned data (Figure 4A), group separation can sometimes lie along two or more principal components. In this case, orthogonal signal corrections, such as those in OPLS-DA, are needed to rotate the axis so that group separation lies along the first component of the model. This process simplifies the subsequent analysis of loading coefficients for features responsible for intragroup variability; intergroup variability is now separated from intragroup variability. PLS-DA models of both the 1D and 2D data showed complete group separation, and serendipitously, group separation essentially lay along the first component (Figure 4B). OPLS-DA models gave nearly identical loading coefficients to the PLS-DA models and are thus excluded from further analyses.
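For readers unfamiliar with score plots, the sketch below shows how PCA scores and loadings can be obtained from a binned data matrix with a singular value decomposition, assuming NumPy. It is not the SIMCA implementation and does not cover PLS-DA or OPLS. The function name `pca_scores` is hypothetical and the data are synthetic (four "WT" rows and three "KO" rows with a made-up shift in a few bins).

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA by SVD of the mean-centered matrix X (rows = samples, columns = bins).

    Returns (scores, loadings, explained-variance fractions).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # bin contributions
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Synthetic data only: 7 samples x 50 bins, values are made up.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(7, 50))
X_demo[4:, :10] += 5.0   # shift ten bins in the last three ("KO") rows

scores, loadings, explained = pca_scores(X_demo)
print("PC1 scores:", np.round(scores[:, 0], 2))
print("explained variance:", np.round(explained, 2))
```

The rows of `scores` are what a score plot such as Figure 4A displays, and the columns of `loadings` indicate which bins drive each component. When group separation is spread over several components, supervised methods such as those named in the text are used instead.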

A cursory examination of the scoring plots might indicate that any PLS-DA score plot, whether from the 1D or 2D data, would be sufficient and that there is no need for 2D NMR data. However, careful examination of the PLS-DA loading coefficients indicates otherwise. Figure 4C shows the top 20 loading coefficients for the bins showing increases in the wild-type mice, along with the confidence limits automatically calculated by the SIMCA software; the smaller the bar, the higher the confidence limit. A good potential biomarker would have both a high loading coefficient value and a high confidence limit. The loading coefficients of the 1D and 2D models are of similar value, but the 1D data have extremely poor confidence intervals and no obvious bins stand out as potential biomarkers. The loading coefficients from the 2D data, on the other hand, showed a number of statistically significant bins with very high confidence limits. Automated 1D variable bin size bucketing with the ACDLabs software, called Intelligent Bucketing, which attempts to group peaks that may have shifted slightly from one sample to the next into the same bin, did not fare much better than constant bin width bucketing using 0.04 or 0.02 ppm. In our study, the pH was carefully controlled, and there were negligible peak shifts across the small sample set. Therefore, it was not surprising to observe similar statistical results for the three 1D binning methods, even when using half the constant bin width of 0.04 ppm typically used in the literature. The loading coefficients from the 1D and 2D PLS-DA models, with and without log transformation, were sorted, and the top 10 WT and Abcc6 KO bins that showed an increase are listed in Table 2. Bins have been colored according to metabolite identifications to show similarities and differences among the various models.
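Why log transformation changes which bins dominate a model can be seen with a two-bin toy example, assuming NumPy. The intensities below are invented: one abundant bin that varies by a few percent and one low-abundance bin that roughly doubles between the groups. The hypothetical helper `variance_share` reports each bin's share of the total variance, a crude stand-in for its influence on a variance-driven model.

```python
import numpy as np

# Made-up intensities for 4 "WT" samples followed by 3 "KO" samples.
abundant = np.array([1000.0, 1040.0, 960.0, 1020.0, 990.0, 1030.0, 970.0])
minor = np.array([1.0, 1.1, 0.9, 1.0, 2.0, 2.2, 1.9])


def variance_share(X):
    """Fraction of the total variance carried by each column of X."""
    v = np.asarray(X, dtype=float).var(axis=0)
    return v / v.sum()


raw = np.column_stack([abundant, minor])
raw_share = variance_share(raw)
log_share = variance_share(np.log10(raw))

print("raw data, share of variance (abundant, minor):", np.round(raw_share, 4))
print("log10 data, share of variance (abundant, minor):", np.round(log_share, 4))
```

In the raw data the abundant bin carries essentially all of the variance even though it barely changes between groups; after the log transform, the relative (fold) change of the minor bin dominates instead.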
Figure 4. (A) PCA and (B) PLS-DA score plots of 1D and 2D NMR data binned using constant and variable bin widths. (C) PLS-DA loading coefficients sorted to show the top 20 loading coefficients with confidence limits for the wild-type mice.

In general, bins from the upfield region of the NMR spectrum, where the majority of the abundant metabolites are observed, dominated the loading coefficients in the 1D data. Similarly, PCA and PLS-DA models of the 2D constant bin width bucketing were completely biased toward the large diagonal peaks. In fact, for the 2D full-spectrum constant bin width bucketing, 8 out of the top 10 loading coefficients are diagonal peaks. Removing the diagonal peaks and/or log transformation of the 2D data before multivariate statistical analysis helped resolve contributions from the less abundant peaks and generated better models, as illustrated in Figure 2B,C for the aromatic region of the spectra. Statistically significant peaks from the 2D variable bin size bucketing without log transformation are circled in orange, and those from log-transformed data are colored purple and green.

Just as multiple bins belonging to citrate, taurine, and hippurate appeared in the same 1D models, multiple cross-peaks from the same metabolites appeared in the same 2D models. This redundancy in information is quite useful because it increases the certainty that a particular metabolite is statistically significant and because not all 2D cross-peaks are well-resolved. For the wild-type mice, increases were seen for citrate, trimethylamine (TMA), taurine, and creatine/creatinine in the 1D models with and without log scaling, as well as the 2D data without log scaling. For the knockout mice, the 1D and 2D data without log transformation showed increases in succinate, methylamine, and hippurate. However, with the exception of methylamine, these increases for the knockouts were not statistically significant when the data were log-transformed. Instead, cross-peaks in the (6.38, 7.14) and (7.14, 6.38) ppm regions dominated the models for the knockouts. In the 2D TOCSY expansion of the aromatic region from WT2 and Abcc6 KO3 (Figures 2B,C), cross-peaks in these two regions are barely noticeable in the WT2 mice. The 6.38 ppm region was not very crowded, and the 1D log-transformed model was able to pick up the small changes in this region. Corresponding small changes for peaks in the 7.14 ppm region, however, could not be observed due to spectral crowding.

Removal of the urea peak from the 1D spectra allowed better quantitation of small peaks in the downfield region. Log transformation of the data showed that differences observed in the intensity of the small Abcc6 KO peak at 6.38 ppm (unknown-2, U2) and the WT peak at 5.76 ppm were statistically significant in 5 of the 6 1D and all 6 of the 2D log-transformed models (Table 2B,D). Without log transformation, instead of being in the top 5 loading coefficients, the WT 5.76 ppm peak appeared as number 19 in both the 1D 0.04 ppm constant bin bucketing and the Intelligent Bucketing models. As another example, the very small cross-peaks at (3.43, 2.85) and (2.86, 3.46) ppm, labeled as unknown-1 (U1), were only picked up in the 2D log-transformed models. In the 1D data, a change at 3.43 and 2.85 ppm might have been mistakenly attributed to changes in taurine and TMA individually. The 5.76 ppm peak is the anomeric proton of a ribose fragment. Identification and validation of these and other statistically significant peaks are currently underway and will be reported elsewhere.

Table 2. The Top 10 Loading Coefficients of (A and B) 1D and (C and D) 2D PLS-DA Models with and without Log Transformation^a

^a The numbers, 1–10, on the left of each model denote the order of statistical importance. Common bins among the models are color coded by metabolite identification. (U1, U2, and U3 stand for unknowns 1–3.)

Log transformation reduced the dynamic range of the 1D and 2D data and allowed perturbations in low-abundance metabolites to be picked up in the PLS-DA models. This procedure, however, has the potential to scale up unwanted noise. In this study, log10 was used, but perhaps a less severe transformation using log2 or natural log (ln) would minimize noise contribution. A comparison of various log transformations on a model data set is shown in Table 3.

The main disadvantage of removing the diagonal peaks from statistical analysis is the loss of information from compounds that do not contain protons coupled to each other and, therefore, do not show cross-peaks. Such compounds, which include dimethylamine (DMA) and trimethylamine oxide (TMAO), are fortunately among the more abundant metabolites, and therefore, their changes can easily be detected in the 1D ¹H spectrum. We suggest that both 1D and 2D NMR data, along with log transformation, be used for global metabolic profiling so that perturbations in both the less and more abundant metabolites are observed.

Table 3. Log Transformation on a Model Data Set

Xvar        log10     log2       ln
0.00001        -5   -16.61   -11.51
0.0001         -4   -13.29    -9.21
0.001          -3    -9.97    -6.91
0.01           -2    -6.64    -4.61
0.1            -1    -3.32    -2.30
1               0     0        0
10              1     3.32     2.30
100             2     6.64     4.61
1000            3     9.97     6.91
10000           4    13.29     9.21
100000          5    16.61    11.51
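The values in Table 3 can be reproduced with a few lines of code. This is a small sketch using only the Python standard library; the function name `log_table` and the printed column layout are choices made for this example, and the input values are the model data set from Table 3 rather than measured intensities.

```python
import math

def log_table(values):
    """Return (x, log10 x, log2 x, ln x) for each positive value."""
    return [(x, math.log10(x), math.log2(x), math.log(x)) for x in values]

# Model data set spanning ten orders of magnitude, as in Table 3.
xvar = [10.0 ** k for k in range(-5, 6)]
table = log_table(xvar)

for x, l10, l2, ln in table:
    print(f"{x:>10g} {l10:>7.2f} {l2:>8.2f} {ln:>8.2f}")
```

One arithmetic point worth keeping in mind when reading the table: the three columns differ only by a constant factor (for example, log2 x = log10 x / log10 2), so whether the choice of base affects a given model depends on any scaling applied after the transformation. This is an observation about the arithmetic, not a result from the study.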

Conclusions

The collection of 2D NMR data is necessary for studies where the differences in metabolic profiles involve changes in low-abundance metabolites. Although acquiring 2D data is time-consuming, the data provide a more comprehensive, global metabolic profile and offer a better chance of finding meaningful differences between control and affected groups than 1D NMR data do. Additionally, we observed that metabolite identifications were much easier to achieve using 2D NMR data, and ambiguities in peak identifications were minimized due to better signal dispersion.

Acknowledgment. This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. Q.N.V. thanks Bruce Adams (Merck) for assistance in making the selective saturation pulse, Frank Delaglio (National Institute of Diabetes and Digestive and Kidney Diseases, NIH) for help with data processing in NMRPipe, and Ashok Krishnaswami (JEOL USA) for help with JEOL’s Delta software. The Abcc6-/- mice were developed with support from the National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Grant R01AR28450.

Journal of Proteome Research • Vol. 7, No. 2, 2008