ARTICLE pubs.acs.org/ac

Exploring Matrix Effects and Quantification Performance in Metabolomics Experiments Using Artificial Biological Gradients

Henning Redestig,*,†,‡ Makoto Kobayashi,† Kazuki Saito,§,† and Miyako Kusano*,†

† RIKEN Plant Science Center, Tsurumi-ku, Suehiro-cho, 1-7-22 Yokohama, Kanagawa, 230-0045, Japan
§ Graduate School of Pharmaceutical Sciences, Chiba University, Yayoi-cho 1-33, Inage-ku, Chiba, 263-8522 Japan

ABSTRACT: Metabolomics has become an integral part of many life-science applications but is technically still very challenging. Numerous analytical approaches are needed as metabolites have very broad concentration ranges and extremely diverse chemical properties. Configuring a metabolomics pipeline and exploring its merits is a complex task that depends on effective and transparent evaluation procedures. Unfortunately, there are no widely applicable methods to evaluate how well acquired data can approximate actual concentration differences. Here, we introduce a powerful approach that provides semiquantitative calibration curves over a biologically defined concentration range for all detected compounds. By performing metabolomics on a stepwise gradient between two biological specimens, we obtain a data set where each peak would ideally show a linear dependency on the mixture ratio. An example gradient between extracts of tomato leaf and fruit demonstrates good calibration statistics for a large proportion of the peaks but also highlights cases with strong background-dependent signal interference. Analysis of artificial biological gradients is a general and inexpensive tool for calibration that greatly facilitates data interpretation, quality control, and method comparisons.

Received: March 29, 2011. Accepted: June 1, 2011. Published: June 1, 2011. © 2011 American Chemical Society. dx.doi.org/10.1021/ac200786y | Anal. Chem. 2011, 83, 5645–5651

Metabolomics aims to quantify highly diverse molecules in extremely broad concentration ranges; this is a very difficult task¹ that requires a wide array of analytical approaches.²,³ Modern metabolomics platforms, such as mass spectrometry (MS) coupled to separation techniques, can be used to quantify a wide variety of molecules. However, as is the case for all instruments, the concentration ranges where quantification is meaningful are limited. Outside these dynamic ranges, readings are either merely noise or locked at a saturated level. For single compounds, dynamic ranges can be estimated by calibration using authentic standards. Unfortunately, the signals from many metabolites can be influenced by other interfering metabolites through several different mechanisms, collectively known as matrix effects.⁴,⁵ The gravity of this problem is easily realized when considering a situation where the signal of a given metabolite is affected by the concentration of a different metabolite with fluctuating concentration. In this scenario, the first metabolite will appear to change even if its concentration is stable. The presence of matrix effects is an important reason why quantification of an authentic standard is very different from untargeted analysis of metabolites in an actual biological sample.² Matrix effects are present on all major platforms including gas chromatography-mass spectrometry (GC-MS),⁶⁻⁸ liquid chromatography-mass spectrometry,⁵,⁹ and capillary electrophoresis-mass spectrometry,⁴ raising concerns over how well calibration using authentic standards can generalize to the metabolomics scenario, particularly in cases where the metabolome undergoes massive changes such as during fruit ripening¹⁰ or severe stress responses.¹¹

For this reason, it is highly desirable to evaluate quantification performance in as realistic a setting as possible. In the absence of reference biological samples with known metabolite concentrations, performance evaluation is often done using mixtures of hundreds of standards in known concentrations.¹²,¹³ Although this approach provides an accurate "gold standard" to compare with, the mixture complexity and concentration ranges can never match those of actual biological samples, and matrix effects are therefore underestimated. A complementary approach that addresses this problem is the analysis of standard mixtures in the presence of a background biological sample.¹⁴ Nevertheless, both these approaches require the laborious and expensive preparation of standard mixtures and only evaluate the quantification performance of a select set of known metabolites. Furthermore, they do not consider matrix effects in a changing metabolome, as is the case in many biological studies. The technical complexity and limited conclusiveness of reference mixture evaluation have led many authors to resort to qualitative evaluation by examining how well data from actual experiments overlap with expected patterns.¹⁵,¹⁶ However, as we show here, clustering of biological replicates does not always imply accurate data.

To summarize, the standing problem is that to evaluate performance, realistic gold standards are necessary but not currently available. However, if relative quantification is sufficient, then expressing concentration as the percentage difference between two biological specimens provides a way around this obstacle. Comparing two strictly different specimens, most metabolites will not be present in identical concentrations in both. Preparing a stepwise gradient between the two specimens, we thus obtain a set of samples for which the resulting abundance estimates should monotonically decrease or increase relative to the mixture ratios (Figure 1a). By analyzing such an artificial biological gradient, we obtain calibration data over the relative concentration difference between the two specimens for all peaks. The stepwise concentration changes will affect a vast proportion of the metabolites, and relevant matrix effects will cause signal diminishments, amplification, or possibly complex up-down patterns. Quantification performance can thereby be evaluated by investigating how clearly a monotonic calibration curve becomes visible or, in other words, how well the concentration differences between the end-point samples are captured (Figure 1b–d).

Here, we demonstrate our approach by analyzing an artificial gradient between leaf and fruit samples from tomato as a case example. We show that quantification performance becomes highly transparent, enabling peaks to be categorized as well-detected or problematic by fitting a standard calibration curve model over the gradient. Quantification performance for different categories of peaks can be compared and data preprocessing methods evaluated by studying their influence on calibration curve statistics. Analysis of artificial biological gradients does not require expensive standards or isotope labeling experiments and is therefore a practical and highly accessible tool for biologically defined calibration for all matrices and platforms. Our results open up a wide range of new opportunities for quality control, method improvement, method comparison, data integration, and data interpretation with broad utility in metabolomics applications.

Figure 1. Artificial biological gradients and their uses for evaluating quantification performance. (a) The fruit (X) samples are mixed with increasing proportions of the leaf (Y) sample. (b) Peaks from well-detected metabolites that are more abundant in fruit than in leaf (X > Y) will decrease, and vice versa (X < Y); peaks from equally abundant metabolites (X = Y) do not change. (c) Modifications to experimental protocols and data preprocessing can be evaluated by studying how they influence peak behavior over the gradient. (d) Peaks that are severely affected by matrix effects can be identified by testing for complex patterns.

■ EXPERIMENTAL SECTION

Plant Materials. Seeds from tomatoes (Solanum lycopersicum, cv. Reiyo) were sown in pots (volume, 2 L) with rockwool (Nittobo, Tokyo, Japan) and grown in a hydroponics system with a nutrient solution containing N, P, and K at 122, 21, and 156.6 mg/L, respectively (Otsuka Chemical, Osaka, Japan), in a growth chamber at 25 °C/20 °C (light/dark) and 900 ppm CO₂ concentration with a light/dark cycle of 16 h/8 h at Chiba University, Matsudo, Japan. The photosynthetic photon flux (PPF) level in the growth chamber was adjusted to 450–500 μmol m⁻² s⁻¹, measured at the meristem of each tomato plant (light source: ceramic metal halide lamps). Subirrigation was applied twice a day with the nutrient solution, and plant material was harvested three weekdays after flowering in December 2009.

Extraction and Derivatization for GC-TOF-MS. We used GC-electron ionization-time-of-flight-MS (GC-TOF-MS) for metabolomics analysis. Similar to our previously described procedure,¹⁷ each sample was extracted at a concentration of 2.5 mg dry weight (DW) of tissue per mL of extraction medium [methanol/chloroform/water (3:1:1 (v/v/v))] containing 10 stable isotope reference compounds: [²H₄]-succinic acid, [¹³C₅,¹⁵N]-glutamic acid, [²H₇]-cholesterol, [¹³C₃]-myristic acid, [¹³C₅]-proline, [¹³C₁₂]-sucrose, [¹³C₄]-hexadecanoic acid, [²H₄]-1,4-butanediamine, [²H₆]-2-hydroxybenzoic acid, and [¹³C₆]-glucose. These internal standards were used to normalize the data using cross-contribution compensating multiple standard normalization (CCMN).¹⁸ Each isotope compound was adjusted to a final concentration of 15 ng/μL for each 1 μL injection. After centrifugation, a 200 μL aliquot of the supernatant (∼0.5 mg of DW of each sample) was drawn and transferred into a glass insert vial for a pilot experiment. We mixed leaf extracts (at the second internode of the second truss) and fruit extracts (mixture of pericarp and jelly/seed) for a gradient experiment.
The percentages of leaf:fruit mixture extracts are given in Supporting Information Table 1. The extracts were evaporated to dryness in an SPD2010 SpeedVac concentrator from ThermoSavant (Thermo Electron Corporation, Waltham, MA, USA). For methoximation, 30 μL of methoxyamine hydrochloride (20 mg/mL in pyridine) was added to the sample. After 24 h derivatization at room temperature, the sample was trimethylsilylated for 1 h using 30 μL of MSTFA (Tokyo Chemical Industry, Tokyo, Japan) at 37 °C with shaking. A 30 μL aliquot of n-heptane was added following silylation. All derivatization steps were performed in a VSC-100 vacuum glovebox (Sanplatec, Japan) filled with 99.9995% (G3 grade) dry nitrogen.

GC-TOF-MS Conditions. For metabolome analysis, 1 μL of extract (∼5.6 μg of each sample) was injected in the splitless mode by a CTC CombiPAL autosampler (CTC Analytics, Zwingen, Switzerland) into an Agilent 6890N gas chromatograph (Agilent Technologies, Wilmington, DE, USA) equipped with a 30 m × 0.25 mm inner diameter fused-silica capillary column with a chemically bound 0.25 μm film Rtx-5 Sil MS stationary phase (Restek, Bellefonte, PA, USA). Helium was used as the carrier gas at a constant flow rate of 1 mL min⁻¹. The temperature program for metabolome analysis started with a 2 min isothermal step at 80 °C, followed by temperature ramping at 30 °C min⁻¹ to a final temperature of 320 °C, which was maintained for 3.5 min. The transfer line and the ion source temperatures were 250 and 200 °C, respectively. Ions were generated by a 70 eV electron beam at an ionization current of 2.0 mA. The acceleration voltage was turned on after a solvent delay of 237 s. Data acquisition was performed on a Pegasus IV TOF mass spectrometer (LECO, St. Joseph, MI, USA) with an acquisition rate of 30 spectra s⁻¹ in the mass range 60 ≤ m/z ≤ 800. Alkane standard mixtures (C8–C20 and C21–C40) were purchased from Sigma-Aldrich (Tokyo, Japan) and were used for calculating the retention index (RI).¹⁹,²⁰ The normalized response used to calculate the signal intensity of each metabolite was obtained from a selected ion current that was unique to that metabolite's mass spectrum. For quality control, we injected methylstearate into every sixth sample.

Data Preprocessing. Nonprocessed MS data from GC-TOF-MS analysis were exported in NetCDF format generated by chromatography processing and mass spectral deconvolution software, Leco ChromaTOF version 3.22 (LECO, St.
Joseph, MI, USA) to MATLAB 7.0 (Mathworks, Natick, MA, USA), where all data pretreatment procedures, such as smoothing, alignment, time-window setting, and H-MCR, were carried out.¹² The resolved mass spectra were matched against reference mass spectra using the NIST mass spectral search program for the NIST/EPA/NIH mass spectral library (version 2.0) and our custom software for peak annotation written in Java. Peaks were identified or annotated based on RIs and the reference mass spectra comparison from the Golm Metabolome Database (GMD, http://csbdb.mpimp-golm.mpg.de/csbdb/gmd/msri/gmd_msri.html) released from CSB.DB²¹ and our in-house spectral library. The metabolites were identified and defined as annotated metabolites by comparison with RIs from the library databases (GMD and our own library) and with those of authentic standards and mass spectra from these two libraries. Data were normalized using the CCMN algorithm¹⁸ and metabolite identifiers were organized using MetMask.²²

Data Analysis. When designing the gradients, we aimed to find a pair of samples that were balanced in the sense that one set of metabolites was higher in sample a and another set was higher in sample b. We estimated how differentially large a metabolite peak was between two plain samples as follows. Standard deviations and averages of peak areas, $y$, were estimated by linear interpolation functions of concentration, $c$, denoted $y_a(c)$, $y_b(c)$, $s_a(c)$, and $s_b(c)$, using GC-TOF-MS data for the replicated plain samples diluted at 1:0, 1:1, and 1:100 (v/v). We quantified how differentially large peak $i$ was between a and b at concentrations $c_a$ and $c_b$ by

$$ d_i(c_a, c_b) = \frac{y_a(c_a) - y_b(c_b)}{\sqrt{s_a^2(c_a)/n_a + s_b^2(c_b)/n_b}} $$

where $n_a = n_b = 3$ because we used three replicates. As $d$ is a t-statistic, it is high for peaks that are larger in a than b, low (negative) for peaks that are larger in b than a, and close to zero for peaks with no differences. We then calculated an overall statistic for how balanced a and b were by

$$ S(c_a, c_b) = \min\left( \operatorname*{median}_{i \,\mid\, d_i(c_a, c_b) > 0} d_i(c_a, c_b),\; \left| \operatorname*{median}_{i \,\mid\, d_i(c_a, c_b) < 0} d_i(c_a, c_b) \right| \right) \tag{1} $$

We evaluated $S(c_a, c_b)$ for 20 (linearly interpolated) logarithmic concentration steps between 0.001 and 1.

Five-parameter log-logistic models (5-LL) were fitted by means of nonlinear least-squares regression²³ using the drc software package v2.0-1²⁴ with default settings. The 5-LL model is defined as

$$ y(c) = g + \frac{h - g}{\left(1 + \exp[b(\log c - \log e)]\right)^{f}} \tag{2} $$

over the gradient from c = 0% leaf (100% fruit) to c = 100% leaf (0% fruit). Loess curves were fitted with a second-degree polynomial over two-thirds of the data. For both 5-LL and Loess models, $R^2$ is defined as

$$ R^2 = 1 - \frac{\sum_{i=1}^{n_c} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n_c} y_i^2} \tag{3} $$

where $y_i$ is the mean-centered peak area and $\hat{y}_i$ is the corresponding model prediction. Peaks were called differentially abundant where the Student's t-test nominal p-value was < 0.05. Multiple testing correction was not performed because a slightly increased rate of false positives was not a concern in this study. All data analysis was performed using R v2.12.1.²⁵ Both raw and final preprocessed data from this experiment are available at http://prime.psc.riken.jp/?action=drop_index.
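To make the two model-fit quantities concrete, the following is a minimal Python sketch of the 5-LL response (eq 2) and the R² statistic (eq 3). It is an illustration only, not the drc-based R code used in the study: the function names are hypothetical, the parameter roles follow the drc convention as we understand it (asymptotes g and h, location e, slope b, asymmetry f), and the handling of c = 0 by taking the limit is an assumption.

```python
import math


def five_ll(c, b, g, h, e, f):
    """Five-parameter log-logistic response at mixture proportion c (eq 2).

    Assumes e > 0 and f > 0. For c <= 0 the limit c -> 0+ is returned,
    because log(0) is undefined.
    """
    if c <= 0.0:
        if b > 0:
            return h  # exp[b(log c - log e)] -> 0, so y -> g + (h - g)
        if b < 0:
            return g  # exp[...] -> infinity, so the fraction vanishes
        return g + (h - g) / 2.0 ** f  # b == 0: exp[...] == 1 for all c
    t = math.exp(b * (math.log(c) - math.log(e)))
    return g + (h - g) / (1.0 + t) ** f


def r_squared(y, y_hat):
    """R^2 as in eq 3: 1 - residual sum of squares / sum of squared,
    mean-centered observations. y and y_hat must be on the same scale."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0.0:
        return float("nan")  # a constant profile has no variance to explain
    return 1.0 - ss_res / ss_tot


# Evaluate a hypothetical curve on an 11-step gradient (0% to 100% leaf).
gradient = [k / 10 for k in range(11)]
curve = [five_ll(c, b=2.0, g=0.0, h=10.0, e=0.5, f=1.0) for c in gradient]
```

Fitting the parameters themselves requires a nonlinear least-squares routine, which is deliberately left out of this sketch.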

■ RESULTS

Gradient Design. Calibration curves are only informative for the concentration range they cover. In the metabolomics scenario, the matrix effect adds another dimension to this well-known condition, causing the quantification performance to be dependent on the overall composition of the considered matrix. When designing a gradient for performance evaluation, we therefore need to consider two main aspects: (i) mixing the two samples should result in a matrix that has comparable complexity with those expected in actual biological experiments and (ii) the total metabolite concentration should be similar in both samples and still yield as broad concentration ranges as possible for as many peaks as possible. Because we cannot control the composition of the biological samples, we addressed the first point by choosing two widely different samples that we have an actual interest in comparing. In a future project, we aim to study sink-source relationships in plants where metabolites are translocated from one type of tissue to another,²⁶ causing strong differences between them. In a step to prepare for these experiments, we here consider a gradient between tomato leaf (predominant source tissue) and fruit samples (predominant sink tissue) and use the reference technology GC-TOF-MS²⁷ as the analytical platform. For the second point, we optimized the detailed configuration of the gradient by performing a dilution-series experiment.


Figure 2. Optimization of gradient design by comparing fruit and leaf samples. Shown is the statistic S for pairs of tissues at concentrations 0.001 to 1 (arbitrary unit). A higher S value indicates that the sample pair is balanced and more suitable for use in an artificial gradient. The pairing of pc and leaf3 at high concentrations (close to 1) gives the best results. Abbreviations: pc, pericarp; js, jelly/seed; leaf1, leaf at the first internode of the second truss; leaf3, leaf at the third internode of the second truss.
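The statistic S plotted in Figure 2 is built from the per-peak t-like statistic d described in the Experimental Section. A minimal Python sketch of both quantities is given below; the function names are hypothetical, the interpolation over concentration is omitted, and returning 0 when all peaks fall on one side is an assumption made here for illustration rather than something stated in the text.

```python
import math
from statistics import median


def d_statistic(mean_a, mean_b, sd_a, sd_b, n_a=3, n_b=3):
    """t-like statistic for one peak: positive when the peak is larger in
    sample a, negative when it is larger in sample b.

    Raises ZeroDivisionError if both standard deviations are zero.
    """
    return (mean_a - mean_b) / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def balance_statistic(d_values):
    """Balance statistic S (eq 1): the smaller of the median positive d and
    the absolute median negative d over all peaks."""
    positive = [d for d in d_values if d > 0]
    negative = [d for d in d_values if d < 0]
    if not positive or not negative:
        # Assumption: a pair with differences in one direction only is
        # treated as completely unbalanced.
        return 0.0
    return min(median(positive), abs(median(negative)))


# Hypothetical d values for five peaks measured in a sample pair.
s_value = balance_statistic([4.0, 2.0, 6.0, -1.0, -3.0])
```

Taking the minimum of the two medians means that S is only high when both samples contribute clearly larger peaks, which matches the stated goal of a balanced pair.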

Having decided on using fruit and leaf samples, we still had different options in terms of which leaf and what part of the fruit to use. We considered leaves sampled at the first and third internode (leaf1 and leaf3), and pericarp (pc) and the jelly/seed part (js) from fruits. Large total concentration differences between the chosen samples must be avoided in order to obtain a balanced gradient with concentration differences from both samples visible over the whole gradient. To account for the possibility of discrepancies in the total metabolite concentrations between fruit and leaf, we profiled all samples using GC-TOF-MS after dilution with solvent in proportions of 1:0, 1:9, and 1:99 (v/v). To estimate how balanced a sample pair is, we defined a statistic, $S(c_a, c_b)$ (eq 1). The clearest results are expected when half the metabolites are higher in sample a and the other half are higher in sample b. The proposed statistic S is proportional to the number of differentially high peaks in the sample with the smallest number of such peaks. By evaluating S for different combinations of concentrations, $c_a$ and $c_b$, we were able to study the effect of choosing different end-point concentrations and tissues. We used linear interpolation to estimate $S(c_a, c_b)$ for a total of 20 concentration steps and visualized the results as heat maps (Figure 2). The results indicate that pc and leaf3 at high concentrations (close to 1) give the highest S values (Figure 2). This combination results in 62 of the 191 peaks (32%) being higher in pericarp, 48 peaks (25%) being higher in leaf, and the remaining 42% of peaks being invariant (Student's t-test, p < 0.05).

Performance Evaluation Using an Artificial Biological Gradient. Having identified suitable leaf and fruit samples, we prepared a gradient of 11 steps [10:0, 9:1, 8:2, ..., 2:8, 1:9, 0:10 (v/v)] (Supporting Information Table 1).
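As a point of reference for the 11-step design, the ideal peak response under purely linear mixing can be written down directly. The Python sketch below is an illustration under stated assumptions: the detector response is proportional to concentration, volumes are additive, there are no matrix effects, and the function name and example peak areas are hypothetical.

```python
def ideal_gradient_response(y_fruit, y_leaf, steps=11):
    """Expected peak area at each step of a fruit-to-leaf gradient when the
    response depends linearly on the mixture proportion.

    Returns a list of (leaf_fraction, expected_peak_area) pairs, going from
    100% fruit (leaf_fraction 0.0) to 100% leaf (leaf_fraction 1.0).
    """
    response = []
    for k in range(steps):
        leaf_fraction = k / (steps - 1)
        area = (1.0 - leaf_fraction) * y_fruit + leaf_fraction * y_leaf
        response.append((leaf_fraction, area))
    return response


# A hypothetical peak that is five times larger in fruit than in leaf.
profile = ideal_gradient_response(y_fruit=100.0, y_leaf=20.0)
```

Deviations of measured profiles from such a straight line are what the calibration models are meant to detect.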
The samples were analyzed in three analytical replicates using GC-TOF-MS, resulting in the detection of 224 peaks, of which 83 could be unambiguously identified using a reference spectral library. The total ion chromatograms (TICs) of the plain and mixed samples were visibly different but also exhibited many common features (Supporting Information Figure 1). Similar to the pilot experiment, 83 peaks (37%) were significantly higher in pericarp than leaf, and 42 peaks (19%) were higher in leaf (Student's t-test, p < 0.05). The ideal response for each metabolite over the gradient is a direct linear dependency on the mixture proportions. However, this will frequently not be the case as not all components of a biological sample can be quantified accurately at all concentrations simultaneously. For most biological applications, detecting approximate differences is sufficient, and for this reason, responses are acceptable as long as they fit a clear monotonic, although possibly asymptotic, profile. The 5-LL model, defined in eq 2, is commonly used for such curves. Patterns that are not desirable, such as up-down patterns, which would indicate strong matrix effects, cannot be approximated by this model. These, rather, are easily modeled by Loess regression, which can capture any smooth pattern. To evaluate the quantification performance, we fit both 5-LL and Loess models to each of the detected peaks and estimate the overall fit using the ordinary $R^2$ statistic (eq 3). By comparing the two models, we were able to provide a direct overview of the relative quantification performance for all detected peaks (Figure 3a). Peaks with high $R^2_{\text{5-LL}}$ and $R^2_{\text{Loess}}$ were well-measured, peaks with low $R^2_{\text{5-LL}}$ but high $R^2_{\text{Loess}}$ were probably suffering from strong matrix effects, and peaks with low $R^2$ for both models were either concentration invariant or subject to strong noise. Fructose and histidine had high $R^2_{\text{Loess}}$ and low $R^2_{\text{5-LL}}$ (Figure 3b,c). These peaks showed a complex behavior, and although there were clear differences between the plain samples, the gradient revealed that quantification performance was poor in this particular matrix when using automated data processing. The fructose level was high in tomato fruits and the TIC of the fructose peak appeared saturated (Supplementary Figure 1 in File S1).
In the case of histidine, manual inspection revealed that this was due to an unknown coeluting compound that was more abundant in leaf than fruit. Calculation of peak area using specific ion peaks (m/z 277 and 307 for fructose and m/z 154, 254, and 356 for histidine) resulted in acceptable patterns that showed that both compounds are actually most abundant in fruit (Figure 3d,e). trans-Caffeic acid showed no dependency on mixture proportions but still a high coefficient of variation (21%). This result indicates poor detection performance rather than invariant concentrations (Figure 3f). Proline, however, was also mixture-proportion-independent but had a very low coefficient of variation (0.6%), suggesting that the abundance of proline was very similar between leaf and fruit (Figure 3g). These examples aside, 77 of the 125 peaks that were different between the plain samples showed calibration curve characteristics with $R^2_{\text{5-LL}}$ > 0.75. The presented assessment includes unknown compounds (Figure 3h, unknown metabolite with monosaccharide moiety), something that is not possible using standard mixtures. Maltose was found to be clearly higher in leaf than in fruit but showed a diminishing response over the gradient (Figure 3i). This is interesting because the maltose peak was very small and clearly not saturated, indicating that the diminishing response might have been a result of matrix effects.

Figure 3. Model comparison for evaluating quantification performance. (a) The $R^2$ values quantify how well the gradient profile of each peak is approximated by a Loess model (any regular pattern) or the five-parameter log-logistic model (5-LL; typical asymptotic calibration curve). (b, c) Compounds that are obscured by matrix effects show irregular gradient profiles (origin set to zero for comparison of variation levels). In this case, these could be better approximated by manual data analysis using specific masses (d, e). Peaks that do not show any trends over the gradient have concentration differences between leaf and fruit outside the reliable detection range of the instrument (f, g). Well-detected compounds that are more abundant in either sample show monotonic profiles (h, i).

Comprehensive Estimation of Quantification Performance Facilitates Data Interpretation. We investigated how the fit to the 5-LL model varied between different types of metabolites and peaks. For the known metabolites, we downloaded eight nonredundant physicochemical properties from the ChemSpider database and examined their rank correlation with the $R^2_{\text{5-LL}}$ statistic (Figure 4). Of these, the number of H-bond donors per molecule [ρ = 0.28, p = 0.009, and false discovery rate (FDR) = 0.045] as well as the distribution coefficient between octanol and water, logD at pH 7.4 (ρ = 0.40, p = 0.017, and FDR = 0.045), were significantly correlated (Figure 4a,b). Furthermore, we observed that unknown peaks have considerably lower $R^2_{\text{5-LL}}$ values (Figure 4c). The varying reliability of the different peaks has important consequences for biological interpretation. In actual experiments, we generally do not mix samples, but the effect of varying matrices can still influence the conclusion, as seen in Figure 3c. In the leaf-fruit gradient we present here, most peaks showed the expected behavior of a monotonic reduction when being diluted with a sample in which they are smaller; however, it is noteworthy that ∼38% of the peaks did not (Figure 4d).

Figure 4. Interpreting the performance estimates. (a, b) For the annotated peaks, the number of H-bond donors and the distribution coefficient, logD, of the metabolites are correlated to their fit to the calibration curve (ρ = 0.28, p = 0.009 and ρ = 0.40, p = 0.017, respectively). The $R^2_{\text{5-LL}}$ statistics are shown Fisher-transformed (tanh⁻¹) for the purpose of visualization. (c) The annotated peaks have higher $R^2_{\text{5-LL}}$ values than the unknown peaks. (d) The number of differentially large peaks (Student's t-test, p < 0.05) in fruit and leaf as well as the number of these peaks that also fit a calibration curve pattern with $R^2_{\text{5-LL}}$ ≥ 0.75.

Method Comparison. The $R^2_{\text{5-LL}}$ values provide a direct way to examine the performance of different preprocessing methods such as normalization algorithms. We recently introduced CCMN, which performs multivariate data correction.¹⁸ Looking at the distributions of the $R^2_{\text{5-LL}}$ statistics calculated using the raw data, after CCMN, after normalization with one internal standard (a widely used approach), normalization to the median,²⁸ or normalization to all peak areas (by scaling each chromatogram so that the square sum of all peaks equals one),²⁹ we noted that CCMN clearly outperforms the other approaches and is therefore the preferred algorithm for this data set (Figure 5).

Figure 5. Comparison of normalization methods. The CCMN algorithm generates data with the highest $R^2_{\text{5-LL}}$ values and is therefore the preferred normalization algorithm for this data set.
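The three simple normalization schemes compared against CCMN can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, each function operates on the peak areas of a single chromatogram, and the exact form of median and internal-standard normalization (division by the median peak area or by the standard's peak area) is an assumption rather than a detail given in the text. CCMN itself is multivariate and is not reproduced here.

```python
import math
from statistics import median


def normalize_to_internal_standard(peak_areas, standard_index):
    """Divide every peak area by the area of one internal standard peak."""
    reference = peak_areas[standard_index]
    return [area / reference for area in peak_areas]


def normalize_to_median(peak_areas):
    """Divide every peak area by the median peak area of the chromatogram."""
    reference = median(peak_areas)
    return [area / reference for area in peak_areas]


def normalize_to_unit_square_sum(peak_areas):
    """Scale the chromatogram so that the sum of squared peak areas is one."""
    norm = math.sqrt(sum(area * area for area in peak_areas))
    return [area / norm for area in peak_areas]


# Hypothetical chromatogram with three peaks; the first is the standard.
chromatogram = [2.0, 4.0, 8.0]
by_standard = normalize_to_internal_standard(chromatogram, standard_index=0)
by_median = normalize_to_median(chromatogram)
by_square_sum = normalize_to_unit_square_sum(chromatogram)
```

Each normalized data set can then be refitted with the calibration model, and the resulting distributions of the fit statistic compared across methods.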

■ DISCUSSION

The availability of a widely applicable method for broad evaluation of metabolomics pipelines is an important advancement with numerous applications. Here, we presented a gradient between tomato leaf and fruit samples and showed how it can be used to gain insight into the information content of all detected peaks. Using the 5-LL calibration model, we identified the peaks that correlate well with the mixture proportions (e.g., Figure 3h,i) and those that were problematic (Figure 3b,c) and needed manually curated data preprocessing (Figure 3d,e). The gradient was designed to mirror the situation in an actual comparison between sink and source tissues. The higher complexity and accompanying matrix effects for the mixed samples are therefore comparable to an actual biological experiment. That said, matrix complexity is inflated when samples are mixed, implying that the performance estimates presented here are conservative. The optimization of the gradient to yield both increasing and decreasing peaks over the gradient sets our approach apart from merely diluting a single sample with solvent, in which case only matrix effects arising from the increasing amount of solvent would have been considered. This is the reason we chose the term gradient instead of dilution; each step in the mixture series results in both increases and decreases in concentration.

The use of calibration models increases the transparency of the data, thereby facilitating the investigation of why certain types of peaks and metabolites are better quantified than others. For the metabolomics application, GC-TOF-MS works well on polar, hydrophilic molecules.³⁰ Here, we demonstrate this by showing that molecules with many H-bond donors (high polarity) and a low octanol/water partitioning coefficient (hydrophilic) also tend to result in a better fit to the calibration model (Figure 4a,b).
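The relationship between a physicochemical property and the calibration fit was assessed by rank correlation, and the fit statistic was Fisher-transformed for plotting. A minimal Python sketch of both steps follows. It assumes that the rank correlation is Spearman's (the text only says "rank correlation"), and all function names and example numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_transform(r2):
    """Inverse hyperbolic tangent; only defined for values strictly
    between -1 and 1."""
    return math.atanh(r2)


# Hypothetical H-bond donor counts and calibration-fit values for five peaks.
donors = [1, 2, 2, 4, 6]
fit = [0.40, 0.55, 0.50, 0.80, 0.95]
rho = spearman_rho(donors, [fisher_transform(v) for v in fit])
```

Because the Fisher transform is monotonic, it changes the appearance of the plot but not the rank correlation.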
This type of insight is highly useful for understanding how the technology performs in a realistic scenario and might aid method improvement. The annotated metabolites behaved considerably better over the studied gradient than the unknown peaks (Figure 4c), presumably an effect of protocol optimization generally being done using only known compounds. With metabolomics data typically containing a vast proportion of unknown peaks, this result is important as it raises concerns regarding the usefulness of these peaks and indicates that stringent filtering of the data might be preferable. The analysis of artificial gradients provides a useful tool for optimizing quantification of unknown compounds and for identifying poorly detected peaks (Figure 4d). Removing such peaks increases the overall signal-to-noise ratio and thereby facilitates data interpretation. This is an important point for biological studies in general and biomarker studies in particular, as peaks that cannot be measured in a reliable fashion might be misleading even if they are clearly differentially large between two studied biological samples (e.g., Figure 3a,b).

In this study, we focused on the use of artificial gradients for a broad quantification performance assessment and therefore used biological samples that were highly different from one another. Gradients between more similar samples (e.g., different genotypes instead of tissues) will not give as comprehensive results because many metabolites will be present in similar concentrations. However, for comparison of competing methods, such as normalization algorithms (Figure 5), this is not a necessity, as gradient responses can still be evaluated under the rationale that the method that gives a closer fit to the calibration curve model is better. Additionally, if standardized gradients are developed for specific types of biological matrices, we posit that objective comparison of different metabolomics pipelines will be possible; this might facilitate protocol optimization and much-needed standardization.³¹ Using ¹³C-labeled biological materials,³² artificial gradients might furthermore be used to identify the peaks that are both of biological origin and quantifiable.
This approach might thereby provide listings of bona fide metabolites and those that are known to be problematic in a given matrix. Inverse calibration models can be used to express abundances as "estimated percentages of a given x-y gradient" and to obtain bootstrap confidence intervals. Although it was not within the scope of this study to show this, we speculate that these models can be used to integrate signals from two different analytical platforms in an automated fashion. A prospective application could be considering measurements on the same sample at two different concentrations. Using the calibration models from both data sets, consensus readings could then be defined as the average of the two models, weighted by their estimated precision, leading to an overall broader dynamic range without the need for manual peak selection.
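One way to read the proposed precision-weighted consensus is as an inverse-variance weighted mean. The Python sketch below shows that reading only; it is an assumption, since the text does not specify the weighting scheme. The function name, the use of standard errors as the precision measure, and the example numbers are all hypothetical, and the two estimates are assumed to have independent errors.

```python
import math


def consensus_reading(estimates, standard_errors):
    """Combine gradient-percentage estimates from several calibration models.

    Each estimate is weighted by its precision, taken here as
    1 / standard_error**2. Returns the weighted mean and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    value = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    return value, math.sqrt(1.0 / total_weight)


# Hypothetical readings (percent leaf) for one peak from two data sets, the
# second of which is three times less precise.
value, error = consensus_reading([40.0, 60.0], [1.0, 3.0])
```

With this weighting, a reading taken outside a model's reliable range, where its uncertainty is large, contributes little to the consensus.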

■ ASSOCIATED CONTENT


Supporting Information. Figure showing total ion chromatograms of plain fruit and plain leaf and table listing compositions of each step of the gradient. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]; [email protected].

Present Addresses

‡Bayer CropScience N.V., Technologiepark 38, 9052 Zwijnaarde (Gent), Belgium.


dx.doi.org/10.1021/ac200786y |Anal. Chem. 2011, 83, 5645–5651


■ ACKNOWLEDGMENT

We are grateful to P. Jonsson, H. Stenlund (Umeå University, Sweden), and T. Moritz (Umeå Plant Science Centre) for sharing their software for GC-MS data pretreatment. We thank M. Hayakumo, S. Hikosaka, and E. Goto (Chiba University) for providing tomato samples and Y. Okazaki (RIKEN Plant Science Center) for fruitful discussions. A part of the study was supported by the project grant “Elucidation of biological mechanisms of photoresponse and development of advanced technologies utilizing light” from the Ministry of Agriculture, Forestry and Fisheries (MAFF). The authors declare that they have no competing financial interests.


■ REFERENCES

(1) Stitt, M.; Fernie, A. R. Curr. Opin. Biotechnol. 2003, 14, 136–144.
(2) Fiehn, O. Plant Mol. Biol. 2002, 48, 155–171.
(3) Saito, K.; Matsuda, F. Annu. Rev. Plant Biol. 2010, 61, 24.1–24.27.
(4) Büscher, J. M.; Czernik, D.; Ewald, J. C.; Sauer, U.; Zamboni, N. Anal. Chem. 2009, 81, 2135–2143.
(5) Taylor, P. J. Clin. Biochem. 2005, 38, 328–334.
(6) Erney, D.; Gillespie, A.; Gilvydis, D.; Poole, C. J. Chromatogr. 1993, 638, 57–63.
(7) Mastovska, K.; Lehotay, S. J.; Anastassiades, M. Anal. Chem. 2005, 77, 8129–8137.
(8) Frenich, A. G.; Vidal, J. L. M.; Moreno, J. L. F.; Romero-Gonzalez, R. J. Chromatogr., A 2009, 1216, 4798–4808.
(9) Annesley, T. M. Clin. Chem. 2003, 49, 1041–1044.
(10) Carrari, F.; Baxter, C.; Usadel, B.; Urbanczyk-Wochniak, E.; Zanor, M.-I.; Nunes-Nesi, A.; Nikiforova, V.; Centero, D.; Ratzka, A.; Pauly, M.; Sweetlove, L. J.; Fernie, A. R. Plant Physiol. 2006, 142, 1380–1396.
(11) Cook, D.; Fowler, S.; Fiehn, O.; Thomashow, M. F. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 15243–15248.
(12) Jonsson, P.; Johansson, A. I.; Gullberg, J.; Trygg, J.; A, J.; Grung, B.; Marklund, S.; Sjöström, M.; Antti, H.; Moritz, T. Anal. Chem. 2005, 77, 5635–5642.
(13) Soga, T.; Ueno, Y.; Naraoka, H.; Ohashi, Y.; Tomita, M.; Nishioka, T. Anal. Chem. 2002, 74, 2233–2239.
(14) Böttcher, C.; Roepenack-Lahaye, E. V.; Willscher, E.; Scheel, D.; Clemens, S. Anal. Chem. 2007, 79, 1507–1513.
(15) t’Kindt, R.; Morreel, K.; Deforce, D.; Boerjan, W.; Van Bocxlaer, J. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2009, 877, 3572–3580.
(16) Gu, Q.; David, F.; Lynen, F.; Rumpel, K.; Dugardeyn, J.; Straeten, D. V. D.; Xu, G.; Sandra, P. J. Chromatogr., A 2011, 1218, 3056–3063.
(17) Kusano, M.; Fukushima, A.; Kobayashi, M.; Hayashi, N.; Jonsson, P.; Moritz, T.; Ebana, K.; Saito, K. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2007, 855, 71–79.
(18) Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K.; Kusano, M. Anal. Chem. 2009, 81, 7974–7980.
(19) Schauer, N.; Steinhauser, D.; Strelkov, S.; Schomburg, D.; Allison, G.; Moritz, T.; Lundgren, K.; Roessner-Tunali, U.; Forbes, M. G.; Willmitzer, L.; Fernie, A. R.; Kopka, J. FEBS Lett. 2005, 579, 1332–1337.
(20) Wagner, C.; Sefkow, M.; Kopka, J. Phytochemistry 2003, 62, 887–900.
(21) Kopka, J.; Schauer, N.; Krueger, S.; Birkemeyer, C.; Usadel, B.; Bergmüller, E.; Dörmann, P.; Weckwerth, W.; Gibon, Y.; Stitt, M.; Willmitzer, L.; Fernie, A. R.; Steinhauser, D. Bioinformatics 2005, 21, 1635–1638.
(22) Redestig, H.; Kusano, M.; Fukushima, A.; Matsuda, F.; Saito, K.; Arita, M. BMC Bioinformatics 2010, 11, No. 214.
(23) Finney, D. Int. Stat. Rev. 1979, 47, 1–12.
(24) Ritz, C.; Streibig, J. C. J. Stat. Software 2005, 12, 1–12.
(25) R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2004.
(26) Herbers, K.; Sonnewald, U. Curr. Opin. Plant Biol. 1998, 1, 207–216.
(27) Kopka, J. J. Biotechnol. 2006, 124, 312–322.
(28) Wang, W.; Zhou, H.; Lin, H.; Roy, S.; Shaler, T. A.; Hill, L. R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C. H. Anal. Chem. 2003, 75, 4818–4826.
(29) Crawford, L. R.; Morrison, J. D. Anal. Chem. 1968, 40, 1464–1469.
(30) Kusano, M.; Fukushima, A.; Redestig, H.; Saito, K. J. Exp. Bot. 2011, 62, 1439–1453.
(31) Fiehn, O.; et al. Metabolomics 2007, 3, 175–178.
(32) Giavalisco, P.; Köhl, K.; Hummel, J.; Seiwert, B.; Willmitzer, L. Anal. Chem. 2009, 81, 6546–6551.
