Article pubs.acs.org/ac
Evaluation of 1H NMR Metabolic Profiling Using Biofluid Mixture Design
Toby J. Athersuch,*,†,‡ Shahid Malik,§ Aalim Weljie,∥,⊥ Jack Newton,∥ and Hector C. Keun†
† Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Sir Alexander Fleming Building, Imperial College London, South Kensington, SW7 2AZ, U.K.
‡ MRC-PHE Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, Norfolk Place, London, W2 1PG, U.K.
§ Chenomx Inc., Suite 800, 10050 112 Street, Edmonton, Alberta, T5K 2J1, Canada
∥ Department of Biological Sciences, Bio-NMR Center, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
⊥ Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, 10-113 Translational Research Center, 3400 Civic Center Boulevard, Building 421, Philadelphia, Pennsylvania 19104, United States
ABSTRACT: A strategy for evaluating the performance of quantitative spectral analysis tools in conditions that better approximate background variation in a metabonomics experiment is presented. Three different urine samples were mixed in known proportions according to a {3, 3} simplex lattice experimental design and analyzed in triplicate by 1D 1H NMR spectroscopy. Fifty-four urinary metabolites were subsequently quantified from the sample spectra using two methods common in metabolic profiling studies: (1) targeted spectral fitting and (2) targeted spectral integration. Multivariate analysis using partial least-squares (PLS) regression showed the latent structure of the spectral set recapitulated the experimental mixture design. The goodness-of-prediction statistic (Q2) of each metabolite variable in a PLS model was calculated as a metric for the reliability of measurement, across the sample compositional space. Several metabolites were observed to have low Q2 values, largely as a consequence of their spectral resonances having low s/n or strong overlap with other sample components. This strategy has the potential to allow evaluation of spectral features obtained from metabolic profiling platforms in the context of the compositional background found in real biological sample sets, which may be subject to considerable variation. We suggest that it be incorporated into metabolic profiling studies to improve the estimation of matrix effects that confound accurate metabolite measurement. This novel method provides a rational basis for exploiting information from several samples in an efficient manner and avoids the use of multiple spike-in authentic standards, which may be difficult to obtain.
Received: February 10, 2013 Accepted: June 3, 2013
© XXXX American Chemical Society
Analytical Chemistry
dx.doi.org/10.1021/ac400449f | Anal. Chem. XXXX, XXX, XXX−XXX

■ INTRODUCTION
Metabolic Profiling. Metabolic profiling (metabonomics/metabolomics) has become a key platform in systems biology; the application of spectroscopy or spectrometry to biological samples provides a multicomponent metabolic phenotype that reflects a large number of interacting upstream processes including gene expression, cellular status, and organism function.1−3 Metabolite profiles are currently being used in a wide variety of contexts including “bench-to-bedside” translational medicine,4,5 real-time profiling for enhanced biomarker-based decision tools for clinicians/surgeons,6 and high-throughput metabolic phenotyping in large-scale molecular epidemiological studies aimed at understanding chronic disease risk and etiology.7,8 Nuclear magnetic resonance (NMR) spectroscopy is a core analytical platform used to characterize biological matrices in metabolic profiling studies as it provides quantitative spectra that capture concentration information on multiple metabolites simultaneously, in a highly robust and reproducible manner.9 A large number of studies have used NMR as the primary analytical tool for exploring a wide range of research questions, including those in drug toxicity testing,10 efficacy assessment,11 and population studies.12,13 Metabolic profiles obtained by NMR spectroscopy typically contain hundreds or thousands of real spectral features, along with confounding signals (including noise), that span a wide dynamic range (>8 orders of magnitude). Accurate quantification of metabolites using biofluid spectra obtained by NMR spectroscopy can be difficult because signal overlap, resulting from insufficient spectral resolution, may reduce the accuracy of integral measurements used as an estimator of concentration. In addition to peak-alignment based methods of spectral deconvolution,14 spectral fitting approaches, which match individual template spectra for each metabolite, are commonly used in an attempt to reduce the influence of signal overlap on quantification.15 By virtue of the spectral matching process (which typically uses multiple peak fits in the spectrum to provide a best estimate of concentration), spectral fitting has the advantage over simple spectral integration in that it is inherently less affected by background variation arising from the sample matrix, and from spectral artifacts such as the residual water peak in aqueous sample spectra. However, spectral fitting is more time-consuming and may be prone to user error or subjectivity. A common approach used to assess the quantification accuracy in biofluid spectra is the use of traditional “spike-in” experiments, whereby authentic standards are added to the sample in known concentration. However, these experiments are typically conducted in the context of an invariant background, which is often not representative of the “real world” scenario where baseline signals from different samples vary due to numerous matrix effects. Additional measures of the spectral quality and reliability of individual measurements made in metabolic profiling studies, that characterize performance in real sample sets, are therefore of potential utility to the metabolic profiling community.

Mixture Design. Mixture design experiments are routinely used for the selection of optimal criteria for production processes, formulation, and more generally in the characterization of relationships between response and system composition. There are numerous designs that can be used, depending on the constraints placed on the mixture components; a simplex-lattice design reflects one of the simplest designs, and is described as follows: “A {q, m} simplex-lattice design for q components consists of points defined by the following coordinate settings: the proportions assumed by each component take the m+1 equally spaced values from 0 to 1

x_i = 0, 1/m, 2/m, ..., 1 for i = 1, 2, ..., q

...and all possible combinations (mixtures) of the proportions from this equation are used.”16 The proportions must sum to unity. For example, a {3,3} simplex lattice design represents three components (q = 3), each of which has four (m + 1 = 4) equally spaced, different possible levels (0, 1/3, 2/3, 1), and therefore will have ten possible mixture combinations. We propose that mixing different biofluid samples in known proportions according to a mixture design (such as a simplex lattice) will produce a sample set that enables metabolite behavior across the sample compositional space to be characterized by regression of the design against the metabolite response. In an ideal situation, the observed response of an individual metabolite will exactly follow the mixture design, and a perfect fit will be achieved. In reality, matrix effects and confounding signal overlap may reduce the accuracy of metabolite responses and reduce the correspondence with the mixture design. Thus, this approach allows the reproducibility of individual metabolites to be assessed, and those that are adversely affected by matrix effects or signal overlap to be identified. Here we have applied this strategy of mixing intact biofluids, according to a predetermined experimental mixture design, to compare the performance of two commonly used metabolite quantification methods in the context of “real world” 1H NMR metabonomic analysis. The potential benefits of incorporating a designed mixture component in metabonomic analyses, as a method of assessing the accuracy of metabolite quantification, are discussed. We suggest that this strategy may have general benefits and applicability in metabolic profiling studies.

■ MATERIALS AND METHODS
Chemicals. D2O was obtained from Goss Scientific (Nantwich, U.K.). All other reagents were of analytical grade and obtained from Sigma-Aldrich (Poole, U.K.).

Experimental Design. A schematic of the experimental design is shown in Figure 1, with details of each step discussed in turn below.

Figure 1. Schematic showing the overall approach described. Three different urines were mixed in known proportions according to a mixture design (1, 2). Concentrations of metabolites were determined by 1H NMR spectroscopy (3). Spectral fitting and spectral integration were both used for quantification (4). The mixture design data (Y block) were used in a PLS regression against the metabolite concentration data (X block) to generate model metrics (5).

Table 1. Sample Composition for Designed Biofluid Mixtures Used in This Study Following a {3,3} Simplex-Lattice Mixture Design
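All volumes are in μL.

| sample number | rat urine, 0−8 h | rat urine, 8−24 h | human urine, spot sample | sodium phosphate buffer |
|---|---|---|---|---|
| 1 | 300 | 0 | 0 | 300 |
| 2 | 0 | 300 | 0 | 300 |
| 3 | 0 | 0 | 300 | 300 |
| 4 | 200 | 100 | 0 | 300 |
| 5 | 200 | 0 | 100 | 300 |
| 6 | 100 | 200 | 0 | 300 |
| 7 | 0 | 200 | 100 | 300 |
| 8 | 100 | 0 | 200 | 300 |
| 9 | 0 | 100 | 200 | 300 |
| 10 | 100 | 100 | 100 | 300 |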
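The ten mixtures of a {3,3} simplex-lattice design can be enumerated directly from its definition. The following is a minimal sketch, not part of the original study: the function names `simplex_lattice` and `to_volumes` are hypothetical, and the 300 μL total is taken from the mixed-sample volume used here. Exact fractions are used so that the sum-to-unity check is not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(q, m):
    """Enumerate the points of a {q, m} simplex-lattice design.

    Each of the q proportions takes one of the m + 1 levels 0, 1/m, ..., 1,
    and only combinations whose proportions sum to exactly 1 are kept.
    """
    levels = [Fraction(k, m) for k in range(m + 1)]
    return [point for point in product(levels, repeat=q) if sum(point) == 1]


def to_volumes(point, total_ul=300):
    """Scale mixture proportions to component volumes in microlitres."""
    return tuple(int(x * total_ul) for x in point)


points = simplex_lattice(3, 3)
volumes = [to_volumes(p) for p in points]

# Print one line per mixture: (rat 0-8 h, rat 8-24 h, human spot) volumes.
for i, v in enumerate(volumes, start=1):
    print(i, v)
```

The enumeration order differs from the sample numbering of the design table, but the set of mixtures is the same: three pure samples, six binary 2:1 mixtures, and one equal-parts ternary mixture.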
Sample Collection and Preparation. Urine samples were obtained from an existing large-scale toxicological study resource.10,17,18 Sprague−Dawley rats (n = 7) were individually housed in standard metabolism cages (21 ± 3 °C, relative humidity 55 ± 15%) and acclimatized for six days prior to the start of the study (t = 0 h). A standard diet (Purina chow 5002) and fresh water (acidified to pH 2.5 using HCl to prevent microbial growth) were available to each animal ad libitum. Urine samples used in the current study were collected during two periods (0−8 h, 8−24 h) from control animals. Urine voided by the animals was collected in the metabolism cage into a container cooled by dry ice. Samples were subsequently stored at −40 °C. Each sample underwent two freeze−thaw cycles before use in this work as a consequence of re-aliquoting. Further study information has previously been published.17 Additionally, a spot urine sample (5 mL) was obtained from a healthy human volunteer, according to established protocols, including filtration to remove cellular material (0.2 μm Minisart 16534K, Sartorius, Germany), and immediate storage at −40 °C until required for analysis. Urine samples were prepared following established protocols for NMR metabolome analysis.19 Urine samples were defrosted, vortex mixed (30 s, RT), and centrifuged (16000 g, RT, 10 min) to remove particulate matter. To provide sufficient total sample, for each collection period, 450 μL of each rat urine sample was pooled (total volume 3150 μL per collection period). The three urines (two pooled rat urines, one human spot urine) were mixed according to a {3,3} simplex-lattice mixture design (Table 1), with each mixed sample having a volume of 300 μL. These mixed samples were then buffered by the addition of 300 μL sodium phosphate buffer (pH 7.4, 0.2 M, 80:20 H2O:D2O (v/v)) containing sodium 3-(trimethylsilyl)[2,2,3,3-2H4]propionate (TSP, 1 mM). Samples were vortex mixed (30 s, RT), and a 550 μL aliquot transferred to a 96-well autosampler plate. The mixed samples were prepared in triplicate from the pooled rat samples and the human spot urine. The preparation order was randomized.

Figure 2. 1H NMR spectra of urine samples used in this study: (A) Pooled rat urine 0−8 h collection (n = 7), (B) pooled rat urine 8−24 h collection (n = 7), and (C) human urine spot collection. Spectra were acquired at an observation frequency of 600 MHz using a standard 1D pulse sequence with water presaturation.

NMR Spectral Acquisition and Processing. 1H NMR spectra were acquired on a Bruker AVANCE DRX600 NMR spectrometer (Bruker Biospin, Rheinstetten, Germany) operating at 14.1 T (600.29 MHz 1H NMR frequency) using a PH FI TXI 600SB 5 mm probe maintained at 300 K. Samples were introduced to the probe using a BEST flow-injection system (Bruker) in a randomized order. Gradient shimming was used immediately prior to spectral acquisition to ensure high field homogeneity. Spectral acquisition was made using a standard 1D pulse sequence (RD-90°-t1-90°-tm-90°-AQ).20 The t1 delay and the mixing time (tm) were set to 3 μs and 100 ms, respectively. Each spectrum was collected as the sum of 128 free induction decays (FIDs) into 32K complex data points. The spectral width was 12019.23 Hz (20 ppm), giving the FID a native resolution of 0.366 Hz/pt and an acquisition time (AQ) of 1.36 s. A 2 s relaxation delay (RD) was used between pulses. A presaturation pulse was applied to the water resonance (δH = 4.7 ppm) during RD and tm. Processing of the raw NMR data for analysis using a targeted integration approach was carried out using XWINNMR software (Bruker Biospin, Rheinstetten, Germany), with each FID being multiplied by an exponential weighting function equivalent to a line broadening of 1 Hz prior to Fourier transformation. Resulting frequency-domain spectra were referenced to TSP (δH = 0.00 ppm) and interpolated from 32K to ∼42K data points using a cubic spline function to regularize the abscissa and improve calibration accuracy (final resolution 0.29 Hz/pt) prior to analysis using in-house scripts running in the Matlab (The Mathworks, Natick) computing environment.

Table 2. 1H NMR Spectral Regions Used for Integration, Estimated Metabolite Concentrations Derived from Spectral Fitting Using Chenomx Software, and PLS Model Metrics

| ID | metabolite | notes | region high (ppm) | region low (ppm) | region width (ppm) | conc. min (μM) | conc. max (μM) | conc. mean (μM) | conc. STDEV (μM) | Q2 (A) | Q2 (B) | Q2 (C) | Q2 (D) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1-methylnicotinamide | | 4.47 | 4.46 | 0.01 | 11 | 182 | 115 | 55 | 0.99 | 0.20 | 0.91 | −0.10 |
| 2 | 1,3-dimethylurate | c,f | 3.29 | 3.28 | 0.01 | 24 | 312 | 149 | 93 | 0.98 | 0.99 | 0.90 | 0.73 |
| 3 | 1,6-anhydro-β-D-glucose | f | 5.45 | 5.44 | 0.01 | 24 | 70 | 44 | 15 | 0.89 | 0.01 | 0.25 | −0.11 |
| 4 | 2-hydroxyisobutyrate | a,c,f | 1.35 | 1.34 | 0.01 | 0 | 36 | 16 | 15 | 0.42 | 0.98 | 0.39 | 0.61 |
| 5 | 2-oxoglutarate | | 3.02 | 2.98 | 0.04 | 104 | 4759 | 2737 | 1491 | 0.99 | 0.99 | 0.89 | 0.84 |
| 6 | 2-oxoisocaproate | b,c,f | 0.94 | 0.92 | 0.02 | 0 | 76 | 41 | 23 | 0.94 | 0.99 | 0.82 | 0.83 |
| 7 | 3-hydroxyisovalerate | | 1.26 | 1.25 | 0.01 | 20 | 32 | 26 | 4 | 0.80 | 0.98 | 0.85 | 0.63 |
| 8 | 3-indoxylsulfate | | 7.51 | 7.48 | 0.03 | 75 | 288 | 175 | 64 | 0.98 | 0.96 | −0.01 | 0.65 |
| 9 | acetate | | 1.91 | 1.91 | 0.01 | 10 | 107 | 73 | 32 | 0.98 | 0.99 | 0.89 | 0.72 |
| 10 | alanine | | 1.48 | 1.46 | 0.02 | 65 | 106 | 92 | 14 | 0.94 | 0.98 | 0.80 | 0.56 |
| 11 | allantoin | c | 5.42 | 5.34 | 0.08 | 19 | 3929 | 2530 | 1162 | 0.73 | 0.96 | −0.06 | 0.70 |
| 12 | arginine | | 1.95 | 1.87 | 0.08 | 129 | 184 | 153 | 17 | 0.55 | 0.98 | 0.66 | 0.25 |
| 13 | betaine | c | 3.90 | 3.88 | 0.02 | 29 | 462 | 233 | 133 | 0.92 | 0.99 | 0.69 | 0.84 |
| 14 | choline | | 3.18 | 3.18 | 0.01 | 15 | 50 | 30 | 13 | 0.88 | 0.95 | 0.87 | 0.80 |
| 15 | cis-aconitate | | 3.11 | 3.09 | 0.02 | 161 | 459 | 321 | 89 | 0.51 | 0.99 | 0.66 | 0.23 |
| 16 | citrate | | 2.56 | 2.50 | 0.06 | 1775 | 14420 | 8951 | 4025 | 0.94 | 0.99 | 0.88 | 0.85 |
| 17 | creatinine | | 3.05 | 3.02 | 0.02 | 1873 | 5154 | 3388 | 1028 | 0.98 | 0.98 | 0.85 | 0.79 |
| 18 | dimethylamine | | 2.72 | 2.70 | 0.02 | 125 | 452 | 285 | 100 | 0.98 | 0.98 | −0.05 | 0.59 |
| 19 | ethanol | c,d,f | 1.19 | 1.16 | 0.03 | 9 | 36 | 20 | 9 | 0.97 | 0.98 | −0.07 | 0.45 |
| 20 | ethanolamine | | 3.15 | 3.12 | 0.03 | 68 | 204 | 144 | 41 | 0.84 | 0.98 | 0.83 | 0.66 |
| 21 | formate | | 8.45 | 8.44 | 0.01 | 30 | 321 | 184 | 88 | 0.83 | 0.98 | 0.76 | 0.87 |
| 22 | fucose | f | 1.25 | 1.23 | 0.02 | 53 | 197 | 135 | 42 | 0.97 | 0.97 | 0.68 | 0.64 |
| 23 | fumarate | b,d | 6.51 | 6.51 | 0.01 | 0 | 42 | 26 | 13 | 0.96 | 0.97 | 0.71 | 0.80 |
| 24 | glucose | d,e | 4.65 | 4.63 | 0.03 | 198 | 641 | 385 | 140 | 0.90 | −0.04 | 0.01 | −0.01 |
| 25 | glycine | | 3.56 | 3.55 | 0.01 | 149 | 458 | 270 | 95 | 0.95 | 0.96 | 0.81 | 0.74 |
| 26 | glycolate | d | 3.94 | 3.93 | 0.01 | 104 | 198 | 152 | 29 | 0.97 | 0.98 | 0.82 | 0.61 |
| 27 | guanidinoacetate | d | 3.79 | 3.78 | 0.01 | 96 | 332 | 190 | 79 | 0.87 | 0.99 | −0.07 | 0.36 |
| 28 | hippurate | | 7.85 | 7.81 | 0.04 | 256 | 3696 | 2158 | 1060 | 0.81 | 0.99 | 0.88 | 0.86 |
| 29 | isoleucine | | 1.01 | 0.99 | 0.02 | 9 | 15 | 12 | 2 | 0.99 | 0.97 | 0.73 | 0.56 |
| 30 | lactate | c | 1.33 | 1.31 | 0.03 | 65 | 106 | 81 | 15 | 0.62 | 0.98 | 0.75 | 0.45 |
| 31 | leucine | d | 0.97 | 0.94 | 0.03 | 11 | 16 | 14 | 1 | 0.87 | 0.98 | 0.89 | −0.11 |
| 32 | malonate | d,f | 3.12 | 3.11 | 0.01 | 8 | 43 | 26 | 11 | 0.25 | 0.98 | 0.03 | 0.43 |
| 33 | methanol | | 3.35 | 3.34 | 0.01 | 11 | 87 | 50 | 27 | 0.70 | 0.92 | 0.49 | 0.78 |
| 34 | methylmalonate | | 1.24 | 1.22 | 0.02 | 58 | 156 | 95 | 27 | 0.56 | 0.94 | 0.72 | 0.77 |
| 35 | methylsuccinate | f | 1.08 | 1.06 | 0.02 | 14 | 52 | 35 | 12 | 0.81 | 0.97 | 0.03 | 0.46 |
| 36 | N,N-dimethylformamide | a,f | 2.86 | 2.85 | 0.01 | 14 | 94 | 48 | 25 | 0.99 | 0.98 | 0.74 | 0.44 |
| 37 | N,N-dimethylglycine | f | 2.92 | 2.91 | 0.01 | 0 | 59 | 11 | 23 | 0.75 | 0.94 | 0.01 | 0.76 |
| 38 | N-acetylglycine | e | 2.03 | 2.03 | 0.01 | 27 | 266 | 131 | 78 | −0.15 | 0.99 | 0.93 | 0.90 |
| 39 | O-phosphocholine | e,f | 3.19 | 3.18 | 0.01 | 7 | 42 | 25 | 11 | 0.96 | 0.98 | 0.82 | 0.65 |
| 40 | oxaloacetate | a,c,d | 3.67 | 3.66 | 0.01 | 0 | 308 | 76 | 125 | 0.95 | 0.98 | −0.21 | 0.29 |
| 41 | phenylacetylglycine | f | 7.43 | 7.39 | 0.04 | 103 | 235 | 160 | 41 | 0.52 | 0.98 | 0.34 | 0.06 |
| 42 | pyruvate | d | 2.37 | 2.36 | 0.01 | 11 | 58 | 32 | 15 | 0.87 | 0.99 | 0.25 | 0.57 |
| 43 | succinate | | 2.40 | 2.39 | 0.02 | 17 | 1100 | 537 | 335 | 0.77 | 0.99 | 0.90 | 0.89 |
| 44 | taurine | | 3.43 | 3.39 | 0.04 | 466 | 1843 | 1277 | 448 | 0.99 | 0.99 | 0.85 | 0.24 |
| 45 | threonine | c | 1.33 | 1.31 | 0.02 | 9 | 62 | 43 | 19 | 0.98 | 0.98 | 0.71 | 0.48 |
| 46 | trans-aconitate | | 6.59 | 6.56 | 0.03 | 15 | 1834 | 845 | 566 | 0.96 | 0.99 | 0.91 | 0.90 |
| 47 | trigonelline | | 4.43 | 4.42 | 0.01 | 54 | 294 | 168 | 75 | 0.88 | 0.72 | 0.76 | −0.09 |
| 48 | trimethylamine N-oxide | c | 3.26 | 3.25 | 0.01 | 124 | 481 | 280 | 116 | 1.00 | 0.99 | 0.30 | 0.64 |
| 49 | tryptophan | a,d | 7.76 | 7.73 | 0.03 | 0 | 41 | 24 | 17 | 0.98 | 0.92 | 0.89 | 0.62 |
| 50 | tyrosine | | 6.90 | 6.87 | 0.02 | 34 | 48 | 39 | 5 | 0.96 | 0.94 | 0.68 | −0.01 |
| 51 | uracil | b,e | 7.54 | 7.52 | 0.03 | 0 | 123 | 66 | 36 | 0.14 | 0.98 | 0.67 | 0.82 |
| 52 | urea | e | 5.95 | 5.61 | 0.34 | 90366 | 207739 | 152730 | 44253 | 0.70 | 0.91 | 0.45 | 0.65 |
| 53 | valine | | 0.99 | 0.97 | 0.03 | 17 | 24 | 20 | 3 | 0.91 | 0.97 | 0.69 | 0.54 |
| 54 | xylose | | 4.59 | 4.56 | 0.03 | 102 | 561 | 287 | 151 | 0.66 | 0.00 | 0.85 | 0.11 |

The Q2 statistic is given for each of four models: (A) spectral fitting data set normalized to TSP, (B) spectral integration data set normalized to TSP, (C) spectral fitting data set normalized using PQN, and (D) spectral integration data set normalized using PQN. Notes: a, metabolite present in only one of the component samples; b, metabolite absent in one of the component samples; c, overlapped signal; d, low s/n; e, spectral artifact present in region; f, tentative assignment.

Metabolite Quantification. Fifty-four metabolites were quantified using both the spectral fitting approach and a targeted spectral integration approach. Spectral fitting was performed using Chenomx NMR Suite 4.6 (Chenomx Inc., Edmonton, Canada); reference spectra from the Chenomx 600 MHz library were combined so as to best approximate each acquired urine spectrum, and the relative concentrations of each metabolite present determined by reference to the internal TSP standard15 (Table 2). For the targeted spectral integration approach, spectral regions were defined for each of the metabolites of interest (Table 2), with a width sufficient to encapsulate the majority of the peak across the entire set of spectra (determined manually by spectral overlay). The integral area of these regions was calculated using an in-house routine in Matlab. Probabilistic quotient normalization21 (PQN) was applied to remove variation originating from intersample differences in urinary dilution. Chemometric analysis of metabolite concentration data was completed using Simca P+12 (Umetrics, Umeå, Sweden). Principal component analysis (PCA, using metabolite concentration data) was conducted. Partial least-squares regression (PLS, using metabolite concentration data and the experimental design matrix) allowed goodness-of-fit (R2) and goodness-of-prediction (Q2) estimates to be made for each metabolite.22
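The two numerical steps described above, region integration and PQN, can be illustrated with a short sketch. This is not the in-house Matlab routine used in the study: the function names `integrate_region` and `pqn` are hypothetical, the data are synthetic, the median profile is assumed as the PQN reference, and the preliminary total-area normalization of the published PQN procedure is omitted for brevity.

```python
import numpy as np


def integrate_region(ppm, spectra, high, low):
    """Sum the intensities of each spectrum (row) between low and high ppm."""
    mask = (ppm >= low) & (ppm <= high)
    return spectra[:, mask].sum(axis=1)


def pqn(X):
    """Probabilistic quotient normalization of a samples x variables matrix.

    Returns the normalized matrix and the per-sample dilution factors.
    Assumption: the median profile across samples serves as the reference.
    """
    X = np.asarray(X, dtype=float)
    reference = np.median(X, axis=0)          # median profile as reference
    usable = reference > 0                    # skip variables that would divide by zero
    quotients = X[:, usable] / reference[usable]
    factors = np.median(quotients, axis=1)    # most probable dilution factor per sample
    return X / factors[:, None], factors


# Synthetic check of PQN: one concentration profile measured at three dilutions.
profile = np.array([10.0, 40.0, 5.0, 120.0, 60.0])
dilution = np.array([1.0, 0.5, 2.0])
X = np.outer(dilution, profile)
X_norm, factors = pqn(X)

# Synthetic check of integration: a flat peak inside the 3.02-3.05 ppm region,
# twice as intense in the second spectrum as in the first.
ppm = np.linspace(10.0, 0.0, 10001)
spectra = np.zeros((2, ppm.size))
peak = (ppm > 3.02) & (ppm < 3.05)
spectra[0, peak] = 1.0
spectra[1, peak] = 2.0
areas = integrate_region(ppm, spectra, high=3.05, low=3.02)
```

In this synthetic case the recovered factors equal the imposed dilutions, so every normalized row collapses onto the same profile. Real spectra also contain sample-specific variation, which is why the median quotient rather than the mean is used as the dilution estimate.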
■ RESULTS AND DISCUSSION
Quantification of Metabolites Using NMR. As detailed in the Materials and Methods section, three different urines were mixed according to a {3,3} simplex lattice design, and analyzed in triplicate by 1H NMR spectroscopy. Representative spectra are shown in Figure 2. A total of 54 metabolites were successfully quantified in these samples using both a targeted spectral fitting approach (involving the fitting of individual reference metabolite spectra to the spectra acquired for each sample mixture), and a targeted spectral integration approach (involving the integration of a representative spectral region). Metabolite data are given in Table 2. Comparison of metabolite concentrations across the three samples containing only one of the three urine components (i.e., the corners of the simplex lattice design) showed considerable variation between the rat and human samples (Supporting Information Figure S1). Some metabolites were present or absent in only one of the samples (2-hydroxyisobutyrate, 2-oxoisocaproate, fumarate, N,N-dimethylglycine, oxaloacetate, tryptophan). Other metabolites spanned up to 2 orders of magnitude in their absolute concentration in these samples; those with the greatest variation included 1-methylnicotinamide, 1,3-dimethylurate, betaine, and allantoin.

Figure 3. Partial least-squares regression analysis scores plots indicating latent structure of the TSP normalized data: (A) spectral integration data set and (B) spectral fitting data set. It can be seen that the samples recapitulate the {3,3} simplex-lattice mixture design in the score space. Samples (triplicates) are colored according to proportions of the three component urines (Table 1).

PCA/PLS Modeling and Effect of Normalization. Principal component analysis (PCA) and partial least-squares (PLS) regression are widely used multivariate analysis tools based on latent variable methods.23,24 For each quantification approach, the metabolite concentrations (X-matrix) were modeled by PCA in an unsupervised manner, and also modeled against the experimental mixture design (Y-matrix) using PLS. PCA of the data set before and after PQN was conducted and showed differences in the variation captured by the two methods of quantification (Supporting Information, Figure S2). Prior to normalization, the largest variation in each data set was attributable to the sample dilution. Upon normalization, the next biggest variation in the spectral fitting data set was revealed to be the mixture design, whereas in the spectral integration data set, it was related to nuisance variation, driven by outlying samples, and attributable to inferior water suppression. PLS analysis was used to assess the fit of the relative metabolite variation to the mixture design. The component scores for the PLS models (Figure 3) clearly showed the experimental design as anticipated. Switching the X and Y matrices of the PLS model also allowed a calculation of the goodness-of-fit (R2) and goodness-of-prediction (Q2) values for each metabolite against the {3,3} simplex-lattice design (Figure 3 and Table 2). It can be seen that a large proportion have high Q2 values, indicating that the latent structure of these data follows that of the mixture proportions. It was recognized that in these models, overall urinary dilution would have the effect of artificially enhancing some of these values as a consequence of introducing structure into the concentration profiles. PQN concentrations were subsequently modeled and indicated that once the global dilution factor was removed from the concentration data, several metabolites displayed greatly reduced Q2 values. Removing the dominating dilution difference means that the Q2 statistic reported for each metabolite gives a more realistic representation of the fit to the design of the metabolite response, in the presence of a variable background. Several factors can explain a low Q2 value, including (a) changes in chemical shift as a consequence of pH variation, (b) the absence of a metabolite in one or more of the component samples (in this case the rat and human urines), (c) low s/n in the measurement, and (d) overlap of spectral features. Where identified during analysis, these influences are indicated (Table 2).
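The idea behind a per-metabolite Q2 can be sketched with a simplified stand-in for the PLS models used here. The example below swaps in ordinary least squares on the mixture proportions with leave-one-out cross-validation, and computes Q2 = 1 − PRESS/SS; it is not the cross-validation scheme of the chemometrics software, and the function name `q2_loo`, the pure-component concentrations, and the noise parameters are all hypothetical.

```python
import numpy as np

# Mixture proportions of the ten design points (design volumes / 300 uL).
DESIGN = np.array([
    [300, 0, 0], [0, 300, 0], [0, 0, 300], [200, 100, 0], [200, 0, 100],
    [100, 200, 0], [0, 200, 100], [100, 0, 200], [0, 100, 200], [100, 100, 100],
]) / 300.0


def q2_loo(design, y):
    """Leave-one-out Q2 = 1 - PRESS / SS for a least-squares mixture model.

    Assumption: a linear model without intercept (proportions sum to one)
    stands in for the PLS regression described in the text.
    """
    design = np.asarray(design, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i                      # hold out sample i
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        press += (y[i] - design[i] @ coef) ** 2        # squared prediction error
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total


design = np.tile(DESIGN, (3, 1))                       # triplicate preparation
pure = np.array([100.0, 50.0, 10.0])                   # hypothetical concentrations in the three urines
rng = np.random.default_rng(0)

y_ideal = design @ pure                                # response follows the design exactly
y_noise = rng.normal(50.0, 10.0, size=len(design))     # response unrelated to the design

q2_ideal = q2_loo(design, y_ideal)
q2_noise = q2_loo(design, y_noise)
print(q2_ideal, q2_noise)
```

A response that tracks the mixture proportions gives a Q2 close to 1, whereas a response dominated by noise or by signal unrelated to the design gives a Q2 near or below zero, which is the pattern used here to flag unreliable measurements.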
Comparison of the PQN data revealed that several metabolites exhibited high Q2 values in PLS models from both quantification approaches, as shown in Figure 4. Of these, the highest were N,N-dimethylglycine, succinate, trans-aconitate, and 2-oxoglutarate. The Q2 value for several metabolites was substantially different when comparing the two methods. Those performing well only in the targeted spectral fitting approach included 1-methylnicotinamide, trigonelline, and tyrosine. Conversely, those performing well only in the targeted integration approach included N,N-dimethylformamide, allantoin, 3-indoxylsulfate, dimethylamine, malonate, guanidinoacetate, and ethanol. Oxaloacetate, phenylacetylglycine, 1,6-anhydro-β-D-glucose, and glucose exhibited low or subzero Q2 values in both models.

Figure 4. Goodness-of-fit (R2, green bars) and goodness-of-prediction (Q2, blue bars) metrics generated across the {3,3} simplex-lattice mixture design data for 54 metabolites for (A) spectral fitting data set normalized to TSP, (B) spectral integration data set normalized to TSP, (C) spectral fitting data set normalized using PQN, and (D) spectral integration data set normalized using PQN.

In summary, metabolites artificially well modeled as a consequence of the urinary dilution factor may be shown to be poorly modeled following normalization (e.g., an overlapped peak on a variable background). Conversely, metabolites apparently poorly modeled may have their concentration structure across the experimental design revealed (e.g., a small peak on a variable background). Metabolite resonances that are well resolved and lie in spectral regions with little background variation should be well modeled and exhibit a high Q2 value. It should be noted
that metabolites that are invariant across all the component samples might artificially appear to be poorly quantified as a consequence of their low difference in signal relative to the background noise. In this study, we deliberately included samples of the same type (urine), but of varying similarity (rat (day) vs rat (night) vs human) to produce a spectral set with contrasting metabolite concentrations and background matrix effects. We chose a simple mixture design as an exemplar of the approach, but other designs are possible. For example, this might have particular value when substantial changes in the background/matrix are expected between the samples in each class (e.g., toxicological interventions that result in proteinuria, clinical samples containing high concentrations of treatment excipients). Samples pooled according to class and titrated to give linear combinations with known proportions would characterize the sample compositional space between these classes. In practice, the approach described could be adapted for use in larger sample sets; post hoc selection of samples that are identified by their profiles as being at contrasting extremes could be used in this way to characterize the individual metabolite behavior (linearity and matrix effect in relation to a varying background) across the sample compositional space.

■ CONCLUSION
The approach we report combines the use of sample mixing to encode sample spectra according to a known experimental design, with multivariate analysis that allows the theoretical and observed responses to be compared. We propose that a Q2 statistic is a suitable index with which to make this comparison. This statistic provides an unbiased estimate of how reliable the quantification of a particular spectral feature is across the sample compositional space, and thus which features can be safely interpreted from the urinary data. We found PQN suitable to remove nuisance variation attributable to gross sample dilution, and this procedure helped reveal the variation of interest, namely that related to the experimental design. Broad agreement between targeted spectral fitting and targeted spectral integration approaches was observed, but differences were seen in the response of metabolites with peaks in overlapped or baseline-dominated spectral regions. This approach, which efficiently exploits the information contained in several samples simultaneously, has general applicability and can be used as an additional metric for profile quality assessment when conducting biomarker discovery research using spectroscopic platforms. We suggest that the method offers good complementarity to measures of analytical reproducibility obtained by replicate analysis of individual samples.

■ ASSOCIATED CONTENT
Supporting Information
Additional material as described in the text. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
The authors wish to acknowledge the Consortium of Metabonomic Toxicology (COMET), comprising Bristol-Myers-Squibb, Hoffman-La Roche Pharmaceuticals, Pfizer Inc., Eli Lilly & Co., and NovoNordisk, for the provision of rat urine samples.

■ REFERENCES
(1) Nicholson, J. K.; Lindon, J. C.; Holmes, E. Xenobiotica 1999, 29, 1181−1189.
(2) Nicholson, J. K.; Connelly, J.; Lindon, J. C.; Holmes, E. Nat. Rev. Drug Discovery 2002, 1, 153−161.
(3) Nicholson, J. K.; Lindon, J. C. Nature 2008, 455, 1054−1056.
(4) Keun, H. C. Biomarker discovery for drug development and translational medicine using metabonomics. Oncogenes Meet Metabolism: From Deregulated Genes to a Broader Understanding of Tumour Physiology; Kroemer, G., Mumberg, D., Keun, K., Riefke, B., Steger-Hartman, T., Petersen, K., Eds.; Springer: Berlin, 2008; Vol. 4, pp 79−98.
(5) Keun, H. C.; Athersuch, T. J. Pharmacogenomics 2007, 8, 731−741.
(6) Nicholson, J. K.; Holmes, E.; Kinross, J. M.; Darzi, A. W.; Takats, Z.; Lindon, J. C. Nature 2012, 491, 384−392.
(7) Holmes, E.; Loo, R. L.; Stamler, J.; Bictash, M.; Yap, I. K.; Chan, Q.; Ebbels, T.; De Iorio, M.; Brown, I. J.; Veselkov, K. A.; Daviglus, M. L.; Kesteloot, H.; Ueshima, H.; Zhao, L.; Nicholson, J. K.; Elliott, P. Nature 2008, 453, 396−400.
(8) Athersuch, T. J. Bioanalysis 2012, 4, 2207−2212.
(9) Keun, H. C.; Ebbels, T. M.; Antti, H.; Bollard, M. E.; Beckonert, O.; Schlotterbeck, G.; Senn, H.; Niederhauser, U.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Chem. Res. Toxicol. 2002, 15, 1380−1386.
(10) Lindon, J. C.; Keun, H. C.; Ebbels, T. M. D.; Pearce, J. M. T.; Holmes, E.; Nicholson, J. K. Pharmacogenomics 2005, 6, 691−699.
(11) Forgue, P.; Halouska, S.; Werth, M.; Xu, K.; Harris, S.; Powers, R. J. Proteome Res. 2006, 5, 1916−1923.
(12) Holmes, E.; Nicholson, J. K. Ernst Schering Found. Symp. Proc. 2007, 227−249.
(13) Ellis, J. K.; Athersuch, T. J.; Thomas, L. D.; Teichert, F.; Perez-Trujillo, M.; Svendsen, C.; Spurgeon, D. J.; Singh, R.; Jarup, L.; Bundy, J. G.; Keun, H. C. BMC Med. 2012, 10, 61.
(14) Veselkov, K. A.; Lindon, J. C.; Ebbels, T. M.; Crockford, D.; Volynkin, V. V.; Holmes, E.; Davies, D. B.; Nicholson, J. K. Anal. Chem. 2009, 81, 56−66.
(15) Weljie, A. M.; Newton, J.; Mercier, P.; Carlson, E.; Slupsky, C. M. Anal. Chem. 2006, 78, 4430−4442.
(16) accessed January 2013.
(17) Athersuch, T. J.; Keun, H.; Tang, H.; Nicholson, J. K. J. Pharm. Biomed Anal. 2006, 40, 410−416.
(18) Ebbels, T. M. D.; Keun, H. C.; Beckonert, O. P.; Bollard, M. E.; Lindon, J. C.; Holmes, E.; Nicholson, J. K. J. Proteome Res. 2007, 6, 4407−4422.
(19) Beckonert, O.; Keun, H. C.; Ebbels, T. M.; Bundy, J.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Nat. Protoc. 2007, 2, 2692−2703.
(20) Neuhaus, D.; Ismail, I. M.; Chung, C. W. J. Magn. Reson. Series A 1996, 118, 256−263.
(21) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Anal. Chem. 2006, 78, 4281−4290.
(22) Keun, H. C.; Ebbels, T. M. D.; Antti, H.; Bollard, M. E.; Beckonert, O.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Anal. Chim. Acta 2003, 490, 265−276.
(23) Wold, S.; Esbensen, K.; Geladi, P. Chemometr. Intell. Lab. 1987, 2, 37−52.
(24) Wold, S.; Ruhe, A.; Wold, H.; Dunn, W. J. Siam J. Sci. Comput. 1984, 5, 735−743.