New Figures of Merit for Comprehensive Functional Genomics Data

Analytical repeatability (intrabatch precision) is a common figure of merit for the measurement error of metabolites repeatedly measured in one batch ...
4 downloads 0 Views 2MB Size
ARTICLE pubs.acs.org/ac

New Figures of Merit for Comprehensive Functional Genomics Data: The Metabolomics Case Marinus F. Van Batenburg,†,‡ Leon Coulier,§ Fred van Eeuwijk,|| Age K. Smilde,† and Johan A. Westerhuis*,† †

)

Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1089 XH Amsterdam, The Netherlands ‡ Netherlands Metabolomics Centre, Einsteinweg 55, NL-2333 CC Leiden, The Netherlands § TNO, Utrechtseweg 48, 3704 HE Zeist, The Netherlands Biometris-Applied Statistics, Wageningen University and Research Centre, 6708 PB Wageningen, The Netherlands

bS Supporting Information ABSTRACT: In the field of metabolomics, hundreds of metabolites are measured simultaneously by analytical platforms such as gas chromatography/mass spectrometry (GC/MS), liquid chromatography/mass spectrometry (LC/MS) and NMR to obtain their concentration levels in a reliable way. Analytical repeatability (intrabatch precision) is a common figure of merit for the measurement error of metabolites repeatedly measured in one batch on one platform. This measurement error, however, is not constant as its value may depend on the concentration level of the metabolite. Moreover, measurement errors may be correlated between metabolites. In this work, we introduce new figures of merit for comprehensive measurements that can detect these nonconstant correlated errors. Furthermore, for the metabolomics case we identified that these nonconstant correlated errors can result from sample instability between repeated analyses, instrumental noise generated by the analytical platform, or bias that results from data pretreatment.

A

nalytical repeatability or precision obtained in the best possible circumstances (identical analyst, same equipment, and same laboratory) is generally used as a figure of merit for the reliable measurement of an analyte. Its value has impact on the estimation of treatment effects in randomized case-control studies.1 However, in functional genomics such as proteomics and metabolomics studies, in which responses of many compounds in a sample (comprehensive) are measured for a large number of samples, measurement error of a compound may vary due to many different error sources. A single value for repeatability as a figure of merit for such data therefore does not suffice. It is well-known that many analytical technologies such as liquid chromatography/mass spectrometry (LC/MS), gas chromatography/mass spectrometry (GC/MS), NMR, which are often used in metabolomics studies, provide measurement errors that are proportional to intensity level at large intensity levels but constant at low intensity levels.2 This can be observed when samples are analyzed repeatedly in the same measurement batch and the variance of each metabolite is estimated.3 The typical figure of merit to report such measurement errors is the RSD (relative standard deviation). This value, however, has limitations: it is sometimes too optimistic and it only quantifies measurement error for one analyte. The nonconstant measurement error of a metabolite is the result of many error sources. These will be illustrated by the r 2011 American Chemical Society

example of GC/MS (which is the platform used later on in the experimental part). First, in a GC/MS experiment, the number of ions is given as response of a metabolite. Such a response has a nonconstant error (Poisson statistics). Second, electronics generate different types of noises that contribute both to the constant (thermal noise) and nonconstant part (shot and flicker noise) of the measurement error. Third, the GC column deteriorates in the time period between the analyses of a sample (becomes less clean), increasing the background and hence uncertainty in the estimation of concentration of a metabolite. Fourth, certain compounds attach to the column as residues after analysis of a sample and they affect the follow up measurement of the same sample in the same batch, i.e., they interfere with the metabolites that flow through the column (carry-over effect). Fifth, a sample may not be stable in the time period between its analyses, leading to a bias in the results. That is, concentrations of unstable compounds decrease while concentrations of stable compounds increase due to evaporation of solvents. Sixth, the injection volume of a sample changes over measurements. Finally, because of their large number, the samples cannot be measured in one measurement batch and have to be separated into several Received: September 14, 2010 Accepted: February 4, 2011 Published: March 10, 2011 3267

dx.doi.org/10.1021/ac102374c | Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry measurement batches. These batches may generate different between-batch variation in responses of metabolites, such that their measurement errors become large. The between-batch variation is due to replacement of the liner and part of the column when a new measurement batch is started, i.e., measurement batches have different starting conditions among each other (level of cleanness of a column varies over different measurement batches). Note that analyses of a sample can be performed consecutively but also randomly over the entire measurement batch. In the latter type of measurements, the time between analyses will be much longer such that sample instability and GC column aging will increase the measurement error. In metabolomics, different metabolites are analyzed simultaneously in one sample, e.g., in the subsequent example the intracellular metabolites of Escherichia coli are measured comprehensively by a GC/MS method. Many of these metabolites have an important biological function in a cell but contain a polar (protic) functional groups that are thermally unstable at the temperatures required for their separation or are not volatile at all. The compounds are derivatized by a nonpolar derivatization agent (e.g., silylation reagent) via their functional group (e.g., O-H and N-H) before GC analysis to make them less polar and thermally more stable such that their concentrations can be measured by the GC/MS method. The stability of the derivatized forms of these compounds depends on the type of functional group. On the basis of these functional groups, the following three main classes of metabolites in decreasing order of stability of their derivatized forms have been established for the derivatization reaction silylation:4 metabolites that contain hydroxylic and carboxylic functional groups (e.g., O-H), compounds having amine or phosphoric groups (e.g., N-H), and analytes containing amide, thiol, or sulfonic groups. Generally, functional groups with oxygen (O-H) have a more stable bond with silicon atoms than those with nitrogen (N-H).5 Furthermore stereometry plays also a role in the stability of a compound. Primary alcohols are, for example, more stable than secondary alcohols since steric hindrance of binding to silylation reagent is much lower for primary alcohols than for secondary alcohols.5 Since stability of the silyl derivative is inversely proportional to its RSD value,4 different metabolites have different RSD values. Besides the nonconstant measurement errors that will be different for each metabolite, there may also be correlation between the measurement errors of different metabolites. Measurement error correlation occurs when, after multiple analyses of the same sample, two metabolites deviate from their average value in a consistent manner. This error correlation can be the result of coelution of compounds that belong to the same class of analytes such as sugars (share the same chemical properties), the same derivatization (metabolites containing hydroxylic and carboxylic functional groups).4 In addition, direct sylilation (derivatization) of sugars may produce several different peaks for every individual sugar compound, even when cyclization is inhibited by introducing an oximation step prior to silylation.4 The measurement errors of these peaks are thus correlated among each other. This may also occur in deconvoluted data (Figure 1 in the Supporting Information), where the area of one overlapping (coeluted) peak is estimated too high while the area of the other peak is too low. Other examples of data pretreatments are intrabatch correction and baseline correction (Figure 1 in the Supporting Information), which produce similar responses over repeated measurements of the same sample for several metabolites. This may also have induced positive measurement error correlations among metabolite responses.

ARTICLE

Because of all these issues discussed above, the repeatability will not do as a figure of merit for a comprehensive metabolomics data set. The aim of this work is thus to define new figures of merit to quantify the quality of a comprehensive functional genomics data set. To define such figures of merit, measurement errors and error correlations are first estimated and visualized as a function of measured responses. A recent study of measurement errors in NMR showed that these errors vary with metabolites’ responses, but it was not possible to investigate the measurement error of a single metabolite as a function of response. The conclusions were made for a single biological condition, i.e., the concentration of each metabolite was not changed during the investigation.3 For GC/MS data, response-dependent variances of a single metabolite have been characterized by the RockeLorenzato model.2,6 However, that model was used for only a few compounds and it cannot describe metabolite-to-metabolite error correlations when applied to many compounds.2 A full characterization of measurement error variance and measurement error correlation has been accomplished for a limited number of metabolites. On the basis of this characterization, a multivariate detection limit of a metabolite has been established to account for the presence of other interferring metabolites.7 However in their investigation, the measurement error correlation was assumed constant throughout the whole concentration range. In the present study, deconvoluted GC/MS data from a repeated profiling of E. coli fermentation performed under different experimental conditions has been employed to characterize the measurement errors and error correlation of a large number of metabolites as a function of their responses. For each metabolite, its response-dependent variance is modeled by the RockeLorenzato model, which includes a constant nonsample related part as well as a response-dependent part that measures the contribution of instrumental noise at high response levels. Then the Rocke-Lorenzato model is extended to include level dependent measurement error correlations among metabolites. Examples of correlated measurement errors will be provided, and their origin will be discussed. Finally we will introduce new figures of merit to quantify the nonconstant measurement errors and error correlations in any comprehensive functional genomics data set.

’ THEORY In functional genomics studies, the biological system is perturbed in different ways to study its response. The concentrations of most metabolites (the focus of metabolomic analysis) will change during the study. Because of the concentration dependent measurement error of most analytical methods used in metabolomics, the error structure of the data set will be different within the concentration range of a single metabolite (heteroscedastic), different between metabolites, and the error structure might show correlated measurement errors. To estimate this complex error structure, analytical repeats are necessary at different levels for all metabolites. This can be achieved by repeating the analysis of samples that cover a large range in most metabolites. In the following subsections, modeling and estimating measurement errors and measurement error correlations are explained in detail. Figures of merit for quantifying nonconstant measurement errors and error correlations are also described. The data flows from metabolites to these figures of merit and to the widely used RSDs are shown in Figure 1. The different steps of the data flow to the new figures of merit (blocks in Figure 1) 3268

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry

ARTICLE

Figure 1. Data flow from metabolomics data to component-to-component RSD values, indicated by “Traditional”, and to our new figures of merit, marked by “New approach”.

will be explained in this section and in the Supporting Information. Let us consider metabolomics data in which J metabolites (j, 1..., J) in S samples (s, 1..., S) are analyzed. Each sample is measured A (a, 1..., A) times. Hence a response of metabolite j, in measurement a of sample s is given by Xjsa. Modeling and Estimating Measurement Variance. Figure 3 in the Supporting Information shows the measured intensity of metabolite j after repeated analysis of sample s. These levels are sampled independently from a population which is assumed to be normally distributed N(μjs, σjs2), with parameter σjs2 as the measurement variance and parameter μjs as the population mean. Hence, the measurement error εjsa = Xjsa - μjs is a random variable, i.e., εjsa ∼ N(0,σjs2). The measurement variance σjs2 and and mean mean μjs are estimated by the sample variance of the repeated levels (see the Supporting Information). The measurement variance is estimated for each sample separately. In metabolomics studies, the number of repeated measurements of a sample is limited, preventing a robust analysis of measurement variance at all levels for each metabolite. Therefore, the following estimation procedure has been developed to obtain reliable estimation of σjs2 as function of the intensity. The ) are first sorted according to pairs of estimated values ( (and associated the value of the mean . The sorted levels ) are subsequently grouped in consecutive bins of W levels.10 A bin size of 10 (W = 10 levels) is used as a compromise for stable estimates and a sufficient number of bins. The sample and in the bins q (q = 1 ... Q) form the means of improved estimations of σjs2 and mean μjs . Modeling Measurement Variance by Rocke-Lorenzato Model. For a metabolite j, the measurement variance is modeled by a three-parameter model, given by σ js 2 ¼ σ Add 2 2

2

μjs e RMult

if 2

σjs ¼ σAdd þ σMult ðμjs - RMult Þ 2

if

μjs > RMult

ð1Þ

At low intensity levels, the measurement variance is a constant, the additive model parameter σAdd2, due to background noise that is present in the absence of a metabolite. Its size can be

estimated by analyzing blank samples, i.e., samples with matrix but no metabolites. Above the threshold intensity level, given by the model parameter RMult, multiplicative noise starts to contribute to the precision such that the measurement variance increases with intensity level. The associated proportionality constant is the model parameter σMult2. This model parameter is metabolite dependent, as it was observed earlier for the relative error in calibrated GC/MS data of propionitrile6 that its value is 4 times larger than that of toluene.2 The Rocke model describes both contributions.2 Estimating Parameters of the Rocke-Lorenzato Model. To quantify the model parameters for each metabolite simultaneously, the estimated measurement variance and mean for the binned data have been fitted to the Rocke-Lorenzato model in two steps repeatedly for different values of RMult. In this way, the distance between the data and the model for the whole response range is minimized for parameters σAdd2, σMult2, and RMult. The first step consists of fitting the variance of data at low intensity levels to a constant level, the so-called additive model parameter, denoted by σAdd2. The second step is thus fitting the data to a multiplicative ) that represents the contribution of an analyterm ( tical-specific error source that emerges beyond the threshold level RMult. Details about the fitting procedure are explained in the Supporting Information. Modeling Error Correlation. Van der Voet et al.7 already realized that measurement errors, denoted by εjsa, might correlate for pairs of compounds but considered this correlation constant over the whole concentration range. Here we assume the measurement error correlation Fjj0 s between different metabolites j and j0 to be dependent on their responses μjs and μj0 s. For low responses of metabolite j and j0 in sample s, we find a different error correlation than for the high responses of metabolites j and j0 . Estimating Error Correlation. The error correlation is estimated over different 2D bins that cover the measured 2D region of intensity levels for metabolites j and j0 in order to account for its intensity dependence. Here, we consider the measurement error correlation to be constant within a 2D bin. 3269

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry The grid of 2D bins is defined by balancing between two extreme situations. For a small number of 2D bins, the correlation within a 2D bin cannot be considered constant, while with a large number of 2D-bins, the amount of data points per 2D bin is too small to allow a precise estimation of the correlation level for each bin. (see the Supporting Information for details). The measurement errors of all repeats of all samples in a 2D bin are combined for the two metabolites to estimate the measurement error correlation matrix. As this matrix is excessively sensitive to outliers,8 it is estimated in a robust way using the classical Spearman rank-order coefficient. Details can be found in the Supporting Information. The number of samples decreases considerably when going from 1D to 2D bins. Hence, the size of the 2D-bin has to be much larger for estimating the error correlation matrix than the 1D-bin that was used for estimating the measurement variances. Figures of Merit. New figures of merit are needed for the metabolomics data set to quantify its analytical repeatability. Two figures are defined. (a) The measurement error distribution: The nonconstant measurement error distribution, i.e., distribution of the model parameter σMult over all metabolites is summarized by the median and 90% quantile. (b) The measurement error correlation: the figure of merit, called p-value, is the probability that the observed error correlations are the result of randomly generated data sets with uncorrelated measurement errors. It quantifies the significance of the observed error correlations. The figures of merit for nonconstant measurement errors quantifies the spread in measurement errors for all metabolites in a data set. When the 90% level is several times the median level of the σMult distribution, 10% of all metabolites have a very large nonconstant measurement error. The figure of merit for error correlations is given by a p-value associated with a test statistic. That statistic, called the Frobenius norm F, is such that it summarizes the estimated error correlation coefficients present in a metabolomics data set of J metabolites and allows each observed correlation coefficient to be tested on the same footing. Details about the test statistic used and its distribution for uncorrelated measurement errors can be found in the sections “Frobenius norm” and “Computing p-value for error correlations” in the Supporting Information.

’ EXPERIMENTAL SECTION Measurement Design. Batch fermentation experiments have been performed under different conditions using a factorial design to maximize the variation in the production of the amino acid phenylalanine.9 The layout of the study and measurement design and details about the study design are given in the Supporting Information. In this study, 12 samples at consecutive time points (t, red blocks in Figure 7 in the Supporting Information) have been extracted from each fermentation (f, yellow blocks in Figure 7 in the Supporting Information) in logarithmic growth (8 time points, t = 1...8) and stationary phase (4 time points, t = 1...8). The sampling at the different growth phases are shown together with accumulation of biomass over time in Figure 1 of ref 9. Each of these samples is analyzed in duplicate in the same measurement batch of GC/MS (a, blue blocks in Figure 7 in the Supporting Information) as running samples. The analyzed samples represent analytical repeats, i.e., measurement repeats of a fermentation at a time point, i.e., sample. They are used to quantify analytical repeatability of metabolites in a sample. The data consist of intensity values that

ARTICLE

represent the noncalibrated concentration levels of putative intracellular metabolites. The biomass density and phenylalanine concentrations have been measured separately. Approximately 200 samples are analyzed in duplicate. Since these repeats cannot be analyzed in one measurement batch, 8 measurement batches have been used. Each measurement batch contains 48 running samples. The running samples include all analytical repeats of two fermentation processes that are performed under two different sets of fermentation conditions. Data Flow. GC/MS data on analytical duplicates, also called measurement repeats, have first been preprocessed by an inhouse deconvolution method to obtain information about the concentration of 407 (putative) metabolites or analytes, also called peaks. This deconvolution method accounts partly for the measurement error structure of the raw GC/MS data by excluding signals at the background level. The data of the peaks have been normalized to the used injection volume through an injection internal standard. These corrected data have further been normalized to the biomass in the running sample via a labeled internal standard. The amount of this internal standard has been independently calibrated to the biomass.9 On the basis of a mix of batch internal standards that were included as part of running samples at fixed concentrations, different metabolites have been corrected for changes in intensity from running sample to running sample within the same batch. Data Matrix. The final data matrix obtained as indicated in Figure 7 in the Supporting Information contains 407 peaks (J = 407) and 196 samples (S = 196). All samples are distributed over b = 1...B (B = 8 measurement batches), and each sample is measured twice (A = 2) in the same measurement batch. The running order of the samples in each measurement batch is represented by d = 1...D (D = 54 running samples). In a single measurement batch, the time-series of two fermentation processes, performed under two different sets of fermentation conditions denoted by p1 and p2, have been measured in a random order as 48 consecutive running samples.

’ RESULTS AND DISCUSSION Figure of Merit for Nonconstant Measurement Errors. Before estimation of level dependent measurement error variances and covariances, an explorative analysis of the data is performed to detect outliers and strange phenomena. From the 407 peaks, 318 had almost only zero values and therefore were not considered for further analysis. From the remaining peaks, measurement errors were estimated from all samples in a robust manner (Modeling Measurement Variance by Rocke-Lorenzato Model and Estimating Error Correlation sections). This means that outliers were not included in the estimation. The remaining peaks have sufficiently large intensities for at least one design point to allow the estimation of measurement errors. and mean For each peak, the measurement error variance intensity level are estimated for each sample s (Modeling and Estimating Measurement Variance section). The samples are grouped in bins of 10 samples based on their mean intensity level. is obtained for each The average measurement error variance bin. A total of 10 samples within a bin proved sufficient for stable parameter estimation. Subsequently, the 3-parameter Rocke-Lorenzato model is fitted to the binned data for each peak using the two-step approach (Modeling Measurement Variance by Rocke-Lorenzato 3270

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry

Figure 2. Analyte-specific nonconstant measurement errors. The plot shows the standard deviation as a function of intensity levels for different peaks. The fitted values are included as lines. The blue, red, black, and green color represent the results for peaks Var015, Var365, Var118, and Var213, respectively.

Model). The fit is based on a robust M-estimator8 to prevent bias due to outliers in the estimation of the model parameters. Of the 89 peaks, 85 peaks have been fitted successfully. The other 4 peaks did not have a sufficient number of bins to allow a reliable estimation of at least one model parameter. To compare with the widely used figure of merit RSD, the standard deviation, i.e., the square root of the estimated measurement variance, has been plotted as function of mean for four different peaks in Figure 2. These four peaks have entirely different slopes given by σMult. These observations indicate that the dependence of measurement error on responses varies with metabolite. Each metabolite has, for example, a specific polar group and a number of carbon atoms that connect to the carbon atom that carries the polar group. Both properties characterize the thermal stability of the silyl derivatives of the targeted metabolite and their associated RSD values.4 A histogram of all 85 estimated σMult in Figure 3 reveals that half of the peaks have a small σMult (σMult e 0.1) and 10% of the peaks (8-9 peaks) have a relatively large value of σMult (σMult > 0.35). Peaks with σMult > 0.35 have a steeper slope than peak Var015 that is plotted in Figure 2. The presented four peaks have a negligible additive term σAdd (Figure 3). For all 85 peaks, most of them tend to distribute around the same small value (σAdd e 0.01) but few have a large value. These values behave according to a distribution of background levels that is the same for all metabolites. Hence, the parameters σAdd and RMult of some peaks can have large

ARTICLE

Figure 3. Nonconstant measurement errors for the metabolomics GC/ MS data set. Upper and lower plot show the distributions of intercept σAdd and slope σMult over different peaks (metabolites), respectively. The estimated intercepts and slopes of peaks Var015, Var365, Var118, and Var213 have been included.

values such that the estimation of the widely used figure of merit, relative standard deviation,

via a linear model with intercept zero (Figure 4), will underestimate the true rate of dependence of error on response, given by σMult (Figure 4). In addition, peaks have most of their measured responses in the range between 0 and RMult. Hence, the figures of merit RSDs for a metabolomics data set give a too optimistic quantification of nonconstant measurement errors. In view of this shortcoming, we propose the set of quantities, median and 90% level of the σMult distribution, as a figure of merit that quantifies the contribution of nonconstant measurement errors to the metabolomics data set. For the GC/MS metabolomics data set, we observed a median σMult level 0.1 and 90% level of 0.35. Thus, 10% of all peaks have a σMult which is 3.5 the median value. That is 8 to 9 of the 85 peaks have an extremely large nonconstant measurement error. Figure of Merit for Error Correlation among Peaks. To quantify the error correlations for the whole GC/MS metabolomics data set, the error correlations among 85 peaks (J = 85), i.e., error correlations for 3750 pairs of peaks, have been estimated. Error 3271

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry

Figure 4. Nonconstant measurement errors of a metabolite. The blue and red curve represent the fits to the data by a linear model with intercept 0 (its slope is RSD) and the Rocke-Lorenzato model, respectively.

correlations may vary over measured responses, such that the assumption of level-independent correlation is not valid anymore. For this reason, pairs of peaks have been analyzed in a grid of four 2D bins (q = 1...4). To allow estimation of error correlation, a 2D bin should have at least 3 samples (for details see Theory). The bincorrelation plot in Figure 5 reveals that error correlation between peaks Var015 and Var071 is strong at low responses and zero elsewhere. This shows that for the metabolomics data set, error correlation varies over measured responses. Although there are sufficient bins to quantify the error variance for the two metabolites of each pair, their co-occurrence in samples could be limited. Hence there are metabolites which have a low number of samples per bin in common such that their intererror correlation cannot be estimated. Three peaks do not even have a sufficient number of overlapping samples to allow the estimation of error correlations in each bin. Therefore, error correlations have been estimated for 11 987 bins (3046 pairs). To quantify the error correlations for this large metabolomics data set, the Frobenius norm is introduced. This number, denoted by Fobs, summarizes the error correlations (eq 7 in the Theory section of the Supporting Information) of the data set. To test the presence of error correlation, its value is compared to values Fo, derived from N randomly generated data sets with uncorrelated measurement errors. Figure 6 in the Supporting Information reveals that the observed Fobs is larger than Fo. Since N = 1000, the probability that the observed Frobenius norm Fobs belongs to the distribution of uncorrelated measurement errors, given by the p-value, is thus smaller than 1/N = 0.001. Its value is below 0.05 (significance level of test, Supporting Information). Hence there are correlations among peaks in the data set.

ARTICLE

The chance of having no error correlations in the data set, denoted by the p-value, can be used as a figure of merit for error correlations in the metabolomics data set. It turns out that the largest contributions to the observed Frobenius norm Fobs come from 164 bins (153 pairs) with a significant nonzero Spearman correlation coefficient (FWE < 0.05, Supporting Information). A total of 50 peaks are involved, and 15 of these occur in multiple different pairs (g10 pairs). A substantial number of these pairs have peaks with a similar trend over running samples in the same measurement batch (Figure 6). When the response of one peak increases between measurements of the same sample in the same measurement batch, the response of the companion peak will increase as well (Figure 6). Hence, the measurement errors are positively correlated between the peaks. Coelution of metabolites and erroneous data pretreatment may cause similar responses among peaks over measurements of the same sample in the same measurement batch and hence error correlations between peaks emerge. These error correlations will increase the Frobenius norm and subsequently will decrease the p-value. Hence the lower the p-value, the higher the chance of having error correlation in the data set. The p-value is thus a figure of merit that can serve as part of a diagnostic tool to check the performance of a data pretreatment or to quantify the amount of coelution in the data collected with a metabolomics platform. In the field of untargeted metabolomics, it can be part of an optimization procedure to select from many available preprocessing methods the one that produces a reliable metabolomics data set, that is, a data set with a minimal amount of error correlations generated by the employed preprocessing method.

’ CONCLUSIONS We defined figures of merit to quantify the structure of measurement errors of comprehensive analytical methods, such as the ones employed in functional genomics. These new figures of merit describe the key characteristics of heteroscedastic measurement error variance and concentration dependent correlations between measurement errors. They find their origin in a model for heteroscedastic measurement error variance and a significance test for measurement error correlation. The figures of merit for heteroscedastic measurement error variance are based on a single model for describing the measurement error variance of each metabolite as a function of its response. Subsequently, the median and 90% level of the distribution of the parameter describing the heteroscedasticity across all metabolites are used as summary statistics. The figure of merit for measurement error correlations is a p-value related to testing the significance of the calculated measurement error correlations. We used metabolomics data as an example to illustrate our new figures of merit. For the metabolomics GC/MS data, the heteroscedastic measurement error standard deviation has a median of 0.1 and 90% level of 0.35. Hence 10% of all peaks have a σMult which is 3.5 that of the median level. So 10% of the peaks have very large nonconstant measurement errors. The p-value for the metabolomics GC/MS data set is smaller than 0.001. This is clearly below the significance level of 0.05 and indicates the presence of error correlations. We have also shown that the widely used RSDs as figures of merit for metabolomics data have serious drawbacks: it may give too optimistic results. Furthermore, strong error correlations come from a small set of metabolites whose responses have a similar (opposite) monotonic 3272

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry

ARTICLE

Figure 5. Correlation bin plot for peaks Var015 and Var071. The pairs of intensity levels of these variables averaged over analytical repeats are plotted as circles. The estimated correlation level of each bin is given by a color between blue (Fjj0 s = -1) and red (Fjj0 s = 1).

to optimize data preprocessing methods ultimately resulting in high-quality comprehensive data.

’ ASSOCIATED CONTENT

bS

Supporting Information. Supplementary figures and details of methods and design (source code of all employed R scripts will be available at the research group Webpage). This material is available free of charge via the Internet at http://pubs. acs.org.

’ AUTHOR INFORMATION Corresponding Author

*E-mail: [email protected].

Figure 6. Peak intensity measured at position (running samples) in measurement batch reveals a trend. The upper plot shows the measured intensities for peaks Var015 (red circles) and Var237 (blue circles). The lower plot shows the data of running samples (repeated measurements) selected for one sample. The arrows indicate the direction and size of the measurement errors.

trend across the running samples of the measurement batch. It illustrates, for example, that a data pretreatment such as intrabatch correction may in theory remove all trends due to coelution of metabolites and effects of having multiple silyl derivatives per metabolite but in practice fails to do so for several metabolites. In the era of comprehensive analytical methods, such as proteomics and metabolomics, it is important to have good figures of merit. Such figures of merit can be used for diagnosing the quality of the measurement. Moreover, they can also be used

’ ACKNOWLEDGMENT This project was financed by The Netherlands Metabolomics Centre (NMC), which is part of The Netherlands Genomics Initiative (NGI)/Netherlands Organization for Scientific Research. The authors wish to thank Bas Muilwijk, Karin Overkamp, and Carina Rubingh for providing us GC/MS data and explanations. We thank Dr. Andrew Gibson for proofreading the manuscript. ’ REFERENCES (1) Prentice, R. L. J. Natl. Cancer. Inst. 1996, 88, 1738–1747. (2) Rocke, D. M.; Lorenzato, S. D. Technometrics 1995, 37, 176–184. (3) Karakach, T. K.; Wentzell, P. D.; Walter, J. A. Anal. Chim. Acta 2009, 636, 163–174. (4) Koek, M. M.; Muilwijk, B.; van der Werf, M. J.; Hankemeier, T. Anal. Chem. 2006, 78, 1272–1281. (5) Handbook of Derivatives for Chromatography, 2nd ed.; Blau, K., Halket, J. M., Eds.; John Wiley & Sons, Inc.: Chichester, U.K., 1993. (6) Rocke, D. M.; Durbin, B.; Wilson, M.; Kahn, H. D. Ecotoxicol. Environ. Saf. 2003, 56, 78–92. 3273

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274

Analytical Chemistry

ARTICLE

(7) Van der Voet, H.; de Boer, W. J.; de Ruig, W. G.; van Rhijn, J. A. J. Chemom. 1998, 12, 279–294. (8) Huber, P. J. Robust Statistics, 1st ed.; Wiley series in probability and mathematical statistics; John Wiley & Sons, Inc.: New York, 1981; pp 43 (3.1), 204-205 (8.3). (9) Rubingh, C. M.; Bijlsma, S.; Jellema, R. H.; Overkamp, K. M.; van der Werf, M. J.; Smilde, A. K. J. Proteome Res. 2009, 8, 4319–4327. (10) Jansen, J. J.; Hoefsloot, H. C. J; Boelens, H. F. M.; Van der Greef, J.; Smilde, A. K. Bioinformatics 2004, 20, 2438–2446.

3274

dx.doi.org/10.1021/ac102374c |Anal. Chem. 2011, 83, 3267–3274