
Article pubs.acs.org/ac

Influences of Normalization Method on Biomarker Discovery in Gas Chromatography−Mass Spectrometry-Based Untargeted Metabolomics: What Should Be Considered?

Jiaqing Chen,†,‡,§,¶ Pei Zhang,†,‡,§,¶ Mengying Lv,∥ Huimin Guo,†,‡,§ Yin Huang,†,‡,§ Zunjian Zhang,*,†,‡,§ and Fengguo Xu*,†,‡,§

†Key Laboratory of Drug Quality Control and Pharmacovigilance (Ministry of Education), ‡Jiangsu Key Laboratory of Drug Screening, and §State Key Laboratory of Natural Medicine, China Pharmaceutical University, Nanjing 210009, China
∥School of Pharmacy, Shihezi University, Shihezi 832002, China

Supporting Information

ABSTRACT: Data reduction techniques in gas chromatography−mass spectrometry-based untargeted metabolomics have made the subsequent data analysis workflow more lucid. However, the normalization process still perplexes researchers, and its effects are often ignored. To reveal the influences of the normalization method, five representative normalization methods (mass spectrometry total useful signal, median, probabilistic quotient normalization, remove unwanted variation-random, and systematic ratio normalization) were compared in three real data sets of different types. First, data reduction techniques were used to refine the original data. Then, quality control samples and relative log abundance plots were utilized to evaluate the unwanted variations and the efficiency of the normalization process. Furthermore, the potential biomarkers screened out by the Mann−Whitney U test, receiver operating characteristic curve analysis, random forest, and the feature selection algorithm Boruta in the differently normalized data sets were compared. The results indicated that choosing a normalization method was difficult because the commonly accepted rules were easy to fulfill, yet different normalization methods had unforeseen influences on both the kind and number of potential biomarkers. Lastly, an integrated strategy for normalization method selection was recommended.

Untargeted metabolomics has been widely applied in studies of animal models, cell models, clinical diseases, and plants to screen out potential biomarkers.1−4 Recently, the development of data reduction techniques has brought untargeted metabolomics into a new era, and the objects of statistical analysis can be translated from ions to putative or identified metabolites.5 So, if the ranks of semiquantitative results are almost accurate between samples, statistical methods such as nonparametric tests, receiver operating characteristic (ROC) curve analysis, and random forest (RF) are capable of screening out significantly changed metabolites. However, the original intensities of ions are affected by unwanted variations arising from biological and experimental aspects.6−8 Unwanted variations in data sets from gas chromatography−mass spectrometry (GC-MS) are obvious because of complex pretreatments (particularly when derivatization is applied) and the relatively long time of instrumental analysis.9−11 Pre- and postacquisition normalization are therefore always applied before and after instrumental analysis, respectively, to make the quantitative results comparable between samples.8 Preacquisition normalization is vital in quantitative metabolomics. In untargeted metabolomics, the main purpose of preacquisition normalization is to avoid huge concentration variation, which would result in nonlinear intensities in mass spectrometry detection.12 However, unwanted biological variations may still exist because it may be unclear whether samples scaled to the same amount are really comparable (e.g., water content in blood, cell size, and different plant parts may mask the true concentrations of metabolites). On the other hand, postacquisition normalization methods are designed to remove both biological and experimental variations. However, some methods focus only on the experimental aspect, such as operating error and variations between batches. Methods based on internal/external standard(s) or pooled quality control (QC) samples are unable to remove biological variations.6,7 Other methods, which are based on putative unchanging factors such as statistical factors or models, are more promising for removing both experimental and biological

© 2017 American Chemical Society

Received: December 29, 2016 Accepted: April 12, 2017 Published: April 12, 2017

DOI: 10.1021/acs.analchem.6b05152 Anal. Chem. 2017, 89, 5342−5348


dilution and dehydration, equal volumes of samples were analyzed directly. Six QC samples were mixed and analyzed between samples. Urine Data Set. This data set was from a clinical study of respiratory-syncytial-virus-induced pneumonia (unpublished). Pediatric patients were classified into two groups according to Traditional Chinese Medicine syndromes, that is, the syndrome of wind and heat obstructing the pulmonary (W, n = 21) and the syndrome of phlegm and heat obstructing the pulmonary (P, n = 61). Additionally, 23 normal children (Ctr) were included, and 13 QC samples were mixed and analyzed between samples. Depletion of urea by urease10 and two-step derivatization were conducted on equal volume of samples with IS-HA added before untargeted detection. Data Reduction. Original data files were extracted and aligned by Profiling Solution (version 1.1, Shimadzu, Japan). Extracted data were exported to an ion-sample matrix and handled according to “blank-sample-ion filtering”, “80% rule in test samples”, missing-value filling and ion grouping by hierarchical clustering. Then, 167 putative volatile oils (metabolites) in plant data set, 86 and 114 putative metabolites in serum and urine data sets were picked out. Finally, by nonnegative matrix factorization (NMF) reduction of the ions’ intensities from the same metabolite,25 the dimensions of data sets were dramatically reduced (Method S-1, Supporting Information). Normalization Methods. Methods (detailed in Supporting Information Method S-2) which can efficiently remove both biological and experimental variations were tested, and therefore, internal-standard-based calibration methods were not included. In addition, calibration methods by QC samples in large-scale experiments were also beyond the scope of the current work. Mass Spectrometry Total Useful Signal (MSTUS) Normalization. 
The statistical-factor method based on the total signals or the useful signals is most commonly used.17−19 Actually, the process is to calculate the percentage of each metabolite in the whole detected metabolites. Thus, the assumption is that MSTUS will keep constant between samples if there are no unwanted variations. So, the key factor for this method is the linear relation between metabolites’ concentrations and the intensities in mass spectrometry. However, this presumption is always violated by significantly changed metabolites with large intensities. Here, MSTUS factors were calculated after ion grouping. Median Normalization. Normalization by the median is similar to MSTUS, but the influence of large-intensity metabolites is reduced because the assumption is that those middle-intensity metabolites would remain constant between samples.20,26 However, with the electron impact ionization technique in GC-MS, the data sets contain groups of ions originating from the same metabolites. This phenomenon is always more obvious for large-intensity metabolites. Then, the median value will bias toward those metabolites. We therefore calculated the median factors after data reduction. Probabilistic Quotient Normalization (PQN). PQN was designed to overcome the dilution factor between samples for NMR metabolomics.21 Here, the dilution factor is extended if the unwanted variations have equal effects on all metabolites in a sample (e.g., weight/volume difference, water content difference, and operating error). The assumption is that most metabolites have similar changes compared to the reference spectrum. The choice of reference spectrum is very crucial for this method, and QC samples are recommended because they are “average

variations. Additionally, compositional data analysis, which works with ratios between metabolites, is also worth studying. Though it is widely known that the normalization method should be compatible with the experimental design, research purpose, and data mining methods,13 the choice of an appropriate method is still confusing. First, the prerequisites of a normalization method are difficult to confirm in some cases (e.g., the self-averaging property). Furthermore, the evaluation of efficiency after normalization remains a gray area. Most studies compare different methods by RSD, ANOVA, correlation analysis, and principal component analysis (PCA) of QC samples, or by statistical verification of positive/negative control metabolites.6,7,14,15 However, these requirements are usually easy to fulfill. Finally, few studies have investigated the influence of normalization on biomarkers, and the good performance of a classification model should not be taken as an indicator of an appropriate normalization method.16 This paper aimed to reveal the influences of the normalization method on biomarker discovery in GC-MS-based untargeted metabolomics and to introduce a useful strategy to ensure the accuracy of normalization. Five representative normalization methods that can theoretically remove both experimental and biological variations were selected, including three statistical-factor methods (mass spectrometry total useful signal,17−19 median,20 and probabilistic quotient normalization21), one statistical-model method (remove unwanted variation-random6), and one compositional data analysis method (systematic ratio normalization22). These methods were investigated in three real data sets (i.e., one serum data set from a rat model, one volatile-oil data set from plants, and one urine data set from humans). First, we took advantage of data reduction techniques to refine the original data.
Then the commonly used evaluation methods (i.e., QC samples and relative log abundance (RLA) plots) were utilized to explore the causes of unwanted variations under different situations and evaluate the efficiencies of normalization process. Furthermore, the potential biomarkers, screened out by the Mann−Whitney U test, ROC analysis, RF, and feature selection algorithm Boruta, were compared in different normalized data. Additionally, three statistical factors were further investigated by histograms and scatter plots to speculate the causes of inconsistent biomarkers. Finally, an integrated strategy for normalization method selection was recommended.



EXPERIMENTAL SECTION

Data Sets. Three real data sets of different types were tested in this study. The untargeted detections were performed on a GCMS-QP2010 Ultra (Shimadzu Inc., Kyoto, Japan).

Serum Data Set. This data set was from our previous study of cisplatin-induced nephrotoxicity in rats.23 Samples collected on the fifth day after cisplatin dosing were utilized here, and this data set comprised four groups: normal saline (NS, n = 13), low-dose (LC, n = 12), middle-dose (MC, n = 13), and high-dose (HC, n = 19) cisplatin. Additionally, 34 QC samples analyzed during the analysis process were included. Two steps of derivatization were conducted on equal volumes of samples, with the internal standard heptadecanoic acid (IS-HA) added before untargeted detection.

Plant Data Set. This data set comprised untargeted data of volatile oils in the roots and stems of the traditional Chinese medicine Ephedra sinica.24 In total, 18 root samples (G) and 19 stem samples (MH) with equal weights were immersed in boiling water, and volatile oils were extracted by n-hexane. Following


Figure 1. Normalization results of serum data set. (A) Within-group RLA plots of 34 QC samples before normalization. All samples are sorted by batch sequence (discriminated by different colors) and injection time point. (B) Boxplots of RSD values of metabolites or ratios (SRN method) in QC samples. (C) Heat map of sum scores of 36 identified metabolites. The 90th percentiles of scores in the three comparisons are 99.1, 102.1, and 102, respectively. (D) ROC plots of L-isoleucine (Var 10), 4-hydroxy-L-proline (Var 18), and their ratio (Var10/Var18) between NS and LC samples under different normalization methods. (E) Scatter plots between median and PQN factors.

samples”.21 However, if the numbers of samples or the changes of metabolites in different groups are unbalanced, or there are too many groups with varying kinds of metabolites, QC samples may not fulfill the requirement. Here, the mean values of QC samples were regarded as the reference spectrum.

Remove Unwanted Variation-Random (RUV-Random) Method. The RUV-random method is built on a linear mixed model. First, “quality control metabolites”, which are associated only with unwanted variations, are selected. Then, through singular value decomposition of these metabolites, k factors are extracted to estimate the unwanted variation factors. Finally, the unwanted variations can be subtracted from the linear mixed model.6 Here, this method was performed with the R package “MetNorm”.6,27

Systematic Ratio Normalization (SRN). As a compositional data analysis method,16,22 SRN takes the ratios between pairs of metabolites as the analytical objects, and the assumption is that the information contained in the ratios is not influenced by unwanted variations. Consequently, the SRN method dramatically increases the number of variables from n to (n² − n)/2. The benefit of SRN is that the ratios are not directly related to the quantity of samples.

Evaluation of Normalization. Metabolites’ RSD values among QC samples,10,28,29 PCA, and within-group and across-group RLA plots6,7,30 were adopted to evaluate the normalization outcome. PCA models were constructed after Box−Cox transformation and autoscaling. For a within-group RLA plot, the boxplot of each sample would have a median value close to zero and a range similar to those of the other samples within the group if the normalization method had efficiently reduced the unwanted variations. For across-group RLA boxplots, the differences between groups can be observed.
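The two QC-based criteria described above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names (`rsd`, `rla_within_group`) and the toy QC matrix are hypothetical, and RLA is assumed here to be the log2 intensity minus the metabolite's median log2 intensity within the group.

```python
import math
import statistics

def rsd(values):
    """Relative standard deviation (sample SD / mean) of one metabolite across QC samples."""
    return statistics.stdev(values) / statistics.mean(values)

def rla_within_group(matrix):
    """Within-group RLA values for a samples x metabolites matrix from ONE group.

    Each value is the log2 intensity minus the metabolite's median log2
    intensity across the samples of that group (an assumed definition).
    """
    logs = [[math.log2(x) for x in row] for row in matrix]
    n_met = len(logs[0])
    med = [statistics.median(row[j] for row in logs) for j in range(n_met)]
    return [[row[j] - med[j] for j in range(n_met)] for row in logs]

# Hypothetical QC data: 3 injections x 3 metabolites. The third injection is
# uniformly twice as intense, mimicking a sample-wide drift.
qc = [
    [100.0, 200.0, 400.0],
    [100.0, 200.0, 400.0],
    [200.0, 400.0, 800.0],
]

rla = rla_within_group(qc)
# Median RLA per sample: values far from zero flag unwanted variation.
sample_medians = [statistics.median(row) for row in rla]

# RSD per metabolite across QC samples, with the RSD > 0.3 cutoff used in the text.
qc_rsd = [rsd([row[j] for row in qc]) for j in range(3)]
unstable = [j for j, v in enumerate(qc_rsd) if v > 0.3]
```

In this toy case the drifting injection shows a median RLA of about 1 (a twofold shift on the log2 scale) while the others sit at zero, which is the pattern a within-group RLA boxplot makes visible before normalization.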
Additionally, statistical factors of MSTUS, median, PQN, and IS-HA were presented by histogram and scatter plots to depict the differences and illustrate why the normalization results were different or similar between methods.
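To make the difference between the three statistical factors concrete, the following Python sketch computes MSTUS, median, and PQN factors for two made-up samples. The helper names and the data are hypothetical, the reference profile stands in for the mean of QC samples, and the preliminary total-signal scaling that usually precedes PQN is omitted for brevity.

```python
import statistics

def mstus_factors(matrix):
    """Total useful signal per sample: the sum over all retained metabolites."""
    return [sum(row) for row in matrix]

def median_factors(matrix):
    """Median metabolite intensity per sample."""
    return [statistics.median(row) for row in matrix]

def pqn_factors(matrix, reference):
    """Median of the per-metabolite quotients sample / reference."""
    return [statistics.median(x / r for x, r in zip(row, reference)) for row in matrix]

def normalize(matrix, factors):
    """Divide every intensity in a sample by that sample's factor."""
    return [[x / f for x in row] for row, f in zip(matrix, factors)]

# Hypothetical data: sample 2 is a twofold-concentrated copy of sample 1,
# except that its last (large-intensity) metabolite is truly increased.
reference = [10.0, 20.0, 30.0, 40.0]  # stand-in for the mean QC profile
samples = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 800.0],
]

mstus = mstus_factors(samples)        # [100.0, 920.0]
med = median_factors(samples)         # [25.0, 50.0]
pqn = pqn_factors(samples, reference) # [1.0, 2.0]
pqn_normalized = normalize(samples, pqn)
```

Here the median and PQN factors both recover the twofold concentration difference, whereas the MSTUS factor of the second sample is inflated to 9.2 times that of the first by the single changed large-intensity metabolite, which is the violation of the MSTUS assumption described in the text.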

Data Analysis. To simplify the presentation and interpretation of metabolites in the serum data set, the following data analysis processes were performed on 36 identified metabolites. Furthermore, two putative metabolites with large RSD values in the urine data set were removed.

Univariate Statistics. The area under the curve (AUC) of ROC analysis and the Mann−Whitney U test with FDR correction were used to evaluate each metabolite’s difference between two groups.

Random Forest. RF (R package randomForest31,32) was first performed to evaluate the discrimination between groups. Through 1000 rounds of stratified sampling, the error rates of test sets were summarized (Method S-3, Supporting Information). Then, RF was conducted to screen out potential biomarkers between two groups. MeanDecreaseGini values were utilized to indicate the importance of metabolites.

All Relevant Metabolite Selection by Boruta. The R package Boruta was applied to screen out classification-related metabolites by a novel wrapper algorithm based on RF.33,34 Metabolites with significantly higher importance than the best “random contrast variables (shadow features)” were picked out. In addition, the median of the “importance values” for each metabolite was recorded to sort the metabolites by importance. Here, the parameters “maxRuns” and “pValue” were set at 1000 and 0.01, respectively.

Comprehensive Evaluation of Metabolites. For the serum and urine data sets, the results of AUC (in ROC analysis), p values (in the Mann−Whitney U test), MeanDecreaseGini (in RF analysis), and “median importance values” (in Boruta) were integrated to indicate the importance of metabolites. Metabolites were first scored from least to most important on the basis of their ranks in each index and then evaluated by the sum scores. A larger score indicated that the metabolite was more important. Theoretically, the largest sum scores for metabolites in the serum and urine data sets were 36 × 4 and 112 × 4, respectively.
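As an illustration of the univariate step, the sketch below computes the AUC through its relation to the Mann−Whitney U statistic (AUC = U / (n1 × n2)) and adjusts a set of p values. The source only states "FDR correction"; the Benjamini−Hochberg procedure used here is an assumption, and the function names and numbers are hypothetical.

```python
def auc_mann_whitney(group_a, group_b):
    """AUC = U / (n_a * n_b): fraction of (a, b) pairs with b > a, ties counted as 0.5."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical intensities of one metabolite in two groups.
control = [1.0, 2.0, 3.0]
treated = [2.5, 4.0, 5.0]
auc = auc_mann_whitney(control, treated)

# Hypothetical raw p values for four metabolites.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Because the AUC depends only on the ordering of values between the two groups, it is unaffected by any monotone transformation of a metabolite but can change whenever normalization reorders samples, which is why the choice of normalization method can alter which metabolites pass the screen.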
Additionally, a simulated biomarker-selection process that compared top-ranked metabolites was also performed.
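The sum-score scheme described above can be sketched as follows. This is one plausible reading rather than the authors' code: each index assigns scores from 1 (least important) to n (most important), smaller p values count as more important, ties are broken by input order, and all names and numbers are hypothetical.

```python
def rank_scores(values, higher_is_better=True):
    """Score metabolites 1 (least important) .. n (most important) for one index."""
    n = len(values)
    # Sort so that the least important metabolite comes first.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for score, i in enumerate(order, start=1):
        scores[i] = score
    return scores

# Hypothetical results for three metabolites under one normalization method.
auc = [0.90, 0.60, 0.75]       # ROC analysis
p_value = [0.001, 0.2, 0.03]   # Mann-Whitney U test (smaller is more important)
gini = [5.0, 1.0, 2.5]         # MeanDecreaseGini from RF
boruta = [3.0, 2.0, 12.0]      # median importance from Boruta

per_index = [
    rank_scores(auc),
    rank_scores(p_value, higher_is_better=False),
    rank_scores(gini),
    rank_scores(boruta),
]
sum_scores = [sum(col) for col in zip(*per_index)]
max_possible = len(auc) * len(per_index)  # n x 4, as stated in the text
```

With n metabolites and four indices the largest possible sum score is n × 4, which matches the 36 × 4 and 112 × 4 ceilings quoted for the serum and urine data sets.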


Figure 2. Normalization results of plant data set. (A) Across-group RLA plots of test samples before (“None”) and after normalization. (B) Heat map of AUC values of 167 putative metabolites. (C) Histogram of AUC values of ratios in SRN method. (D) Examples of significantly changed ratios between G and MH.



RESULTS AND DISCUSSION

Serum Data Set. Unwanted Variations. QC samples fluctuated with injection time (Figure 1A), and the main sources of unwanted experimental variations were presumed to be the evaporation of solvent together with the continuing derivatization reaction. The normalization processes successfully alleviated the variations within QC and test samples (Figures 1B and S-1). Additionally, only 8.1% of ratios in the SRN normalized data set were unstable (RSD > 0.3). Small differences between QC samples were just a basic requirement. Scatter plots of metabolites and statistical factors were taken as examples to show other challenges (Figure S-2). First, not all metabolites in QC samples had a high correlation with statistical factors before normalization. Those poorly correlated metabolites might be influenced by the multipeak/multiorigination phenomenon35,36 or by mistakes in the earlier processing steps. Additionally, QC samples represented only one concentration level for each metabolite, and thus, it was unclear whether other concentrations could also be calibrated properly.

Discrimination between Groups. Results of repeated RF analysis showed that the differences between cisplatin-treated and NS samples were dose-dependent (Figure S-3). NS and MC/HC samples were well-classified, whereas NS and LC samples were not. MSTUS, median, PQN, and RUV-random had similar results, while SRN improved the classification results between NS and MC/HC.

Influences of Normalization Method on Potential Biomarkers. The potential biomarkers were inconsistent among different normalization methods. The sum scores of metabolites varied between normalization methods (Figure 1C), especially for the high-score metabolites, which were more likely to be biomarkers. Moreover, the simulated biomarker-selection process also indicated that those highly scored metabolites changed positions under different methods (Table S-1). In comparisons between NS and HC samples, the RUV-random method was distinct from the other methods. Additionally, the classification-relevant metabolites determined by Boruta (Figures S-4−S-6) and summarized by Venn plots (Figure S-7) showed that different normalization methods influenced some metabolites and thus affected the potential biomarkers.

For the SRN method, 14, 49, and 71 ratios were screened out. Interestingly, some ratios enlarged the difference between groups, such as that between L-isoleucine and 4-hydroxy-L-proline (Figure 1D).

Speculating the Causes. MSTUS and PQN factors remained constant between groups, whereas median factors showed a rising trend from NS, LC, and MC to HC samples (Figure S-8A). Interestingly, intensities of IS-HA in HC samples were also significantly larger than those in other groups. Because the quality-control metabolites in the RUV-random method were selected by IS-HA, IS-HA reflected the statistical model of RUV-random to some degree. In addition, most HC samples in the median-PQN plot and MSTUS-PQN plot deviated from other samples, while IS-HA correlated poorly with other factors (Figures 1E and S-8B). All these inspections indicated that the differences between statistical factors and the influences on potential biomarkers originated mainly from HC samples.

Choice of Normalization Method. In this data set, it was difficult to confirm the assumptions of the MSTUS and median methods. First, across-group RLA plots indicated that most HC samples contained varying concentrations of metabolites (Figure S-9). The influences of large-intensity metabolites on MSTUS and median were unclear. Furthermore, it was found in another study that the volumes of urine in cisplatin-treated samples were also dose-dependent; that is, the order of urine volumes from largest to smallest was HC, MC, LC, and NS (unpublished, Method S-4). The larger values of IS-HA and urine volumes in HC samples indicated that the water content in blood might be influenced and that serum samples in the HC group tended to be concentrated. Moreover, if the aim of similar studies was to discover real biomarkers, even targeted metabolomics should pay more attention to preacquisition normalization. PQN and RUV-random methods, whose prerequisites were relatively better satisfied, were recommended here. If both methods were applied, the intersection set of potential biomarkers was highly confident, and the union set was more comprehensive. Different from other methods, more potential ratios were found in the SRN normalized data set, and the classification results between some groups were improved (Figure S-3). However, there was still a shortcoming; that is, some ratios might not be directly related in the metabolic pathway (such as L-isoleucine and 4-hydroxy-L-proline). Therefore, it was hard to determine their meanings in mechanism-aimed studies.
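The property that makes SRN ratios attractive, namely their independence from the overall quantity of sample, can be shown with a short Python sketch. It uses plain pairwise ratios x_i / x_j as an illustrative assumption and does not reproduce the exact SRN procedure of the cited method; the function name and intensities are hypothetical.

```python
from itertools import combinations

def pairwise_ratios(sample):
    """Map each metabolite pair (i, j) with i < j to the ratio x_i / x_j."""
    return {
        (i, j): sample[i] / sample[j]
        for i, j in combinations(range(len(sample)), 2)
    }

# Hypothetical intensities of four metabolites in one sample.
sample = [10.0, 20.0, 40.0, 5.0]
# The same sample at threefold higher overall concentration.
concentrated = [3.0 * x for x in sample]

ratios = pairwise_ratios(sample)
ratios_concentrated = pairwise_ratios(concentrated)

n = len(sample)
n_ratios = (n * n - n) // 2  # (n^2 - n) / 2 ratios for n metabolites

# Every ratio is unchanged by the sample-wide concentration factor.
max_change = max(abs(ratios[k] - ratios_concentrated[k]) for k in ratios)
```

The same sketch also shows the cost noted in the text: four metabolites already yield six ratios, and the count grows quadratically, so many ratios pair metabolites with no direct pathway relationship.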


Figure 3. Normalization results of urine data set. (A) Within-group RLA plots of samples before (“None”) and after normalization. (B) Venn plots of classification relevant metabolites selected by Boruta. (C) Examples of significantly changed metabolites normalized by different methods. For the metabolite in each data set, the semiquantitative results in the two groups were extracted, Box−Cox transformed, and min−max scaled. Then the boxplot in each group was depicted. The ends of the whiskers represent the minimum and maximum values in the group.

Plant Data Set. Study Purpose. In this data set, the intended purposes of the different normalization methods were distinct. MSTUS and median methods aimed to compare the composition of components (i.e., the relative concentration of a substance among all volatile oils), while the RUV-random method was suitable for comparing the absolute concentrations between equally weighted samples. For the PQN method, the assumption seemed to be incompatible with any purpose: because the concentrations of volatile oils in G and MH samples were obviously different (Figure 2A), there was no appropriate reference spectrum.

Unwanted Variations and Data Mining Strategy. Without the derivatization process, the relatively small differences between QC samples were easy to handle (Figure S-10A and B), and only two ratios in the SRN method were unstable (RSD > 0.3). However, the variations between groups were more obvious than those within groups. After transforming intensity values from absolute to relative concentrations, MSTUS and median methods reduced the difference between groups, whereas the RUV-random method still retained it (Figure 2A). The difference between groups affected the subsequent data mining methods. Because about half of the metabolites’ and ratios’ AUC values were close to 1.0 in ROC analysis (Figure 2B,C), the application of various data mining methods was unnecessary.

Influences of Normalization Methods on Potential Biomarkers. The potential biomarkers in the differently normalized data sets varied (Figure 2B and Figure S-10C). As shown in Figure S-10D, some metabolites were significantly changed between groups only in absolute concentrations or only in relative concentrations, but some were changed in both. Histograms of statistical factors also clearly showed that G samples contained less volatile oil and that PQN factors of G and MH samples were much closer (Figure S-10E). On the other hand, MSTUS and median factors were linearly correlated with similar distributions (Figure S-10F), which explained their similar results.

Meaning of SRN Method. Because the ratios between metabolites were almost independent of the sampling process, the significantly changed ratios here were promising for discriminating G and MH samples (Figure 2D). Regardless of the original sample weights, these ratios would still be good classifiers.

Urine Data Set. Unwanted Variations. Within-group RLA plots showed that the variations among same-group samples were obvious before normalization because of the uncontrollable volumes of urine (Figure 3A). Additionally, the urea depletion process made the variations more obvious (shown in QC samples). All normalization methods successfully alleviated the problem (Figure S-11). In the SRN method, 15.3% of ratios were deleted (RSD > 0.3). Moreover, the assumption of the PQN method


Table 1. Summary of the Biomarker Candidates Screened out by Four Normalization Methods in Urine Data Set

group pairs  ranks^a  commons^b  MSTUS^c  median^d  PQN^e     RUV-random^f
Ctr vs W     top 5    1          0 (25)   0 (35)    1 (35)^g  2 (65)
             top 20   12         2 (102)  1 (98)    2 (99)    7 (87)
             top 30   16         2 (102)  2 (110)   4 (112)   8 (109)
Ctr vs P     top 10   4          1 (31)   1 (22)    0 (29)    4 (72)
             top 20   11         1 (97)   0 (98)    0 (92)    7 (110)
             top 25   13         1 (111)  0 (99)    2 (92)    10 (110)
W vs P       top 10   4          1 (80)   1 (47)    1 (47)    3 (79)
             top 15   4          2 (80)   1 (90)    2 (69)    6 (106)
             top 20   4          3 (80)   1 (109)   1 (91)    9 (110)

^a Top 5, 10, 15, etc. metabolites in ranks were assumed to be potential biomarkers, which were determined according to the results of Boruta.
^b The number of common metabolites among the four methods.
^c,d,e,f The first number is the number of metabolites screened out uniquely in this data set, and the number in parentheses is the maximum rank (in this data set) of the metabolites screened out in the other normalized data sets.
^g Taking this entry as an example, among all metabolites ranked in the top 5 between Ctr and W samples in the different normalized data sets, one metabolite was found only in the PQN normalized data set, and another metabolite screened out in other data sets ranked only 35th in the PQN normalized data set.
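The entries of Table 1 can be derived from per-method rankings as in the following Python sketch, which follows the footnote definitions. The function name, the method labels, and the five-metabolite rankings are hypothetical, and every metabolite is assumed to be ranked under every method.

```python
def compare_top_k(rankings, k):
    """Summarize top-k overlap between methods.

    `rankings` maps a method name to its metabolites ordered from most to
    least important. Returns the number of metabolites common to all top-k
    lists, plus, per method, a pair (unique, max_rank): the number of
    metabolites found only in this method's top k, and the worst rank this
    method gives to any metabolite in another method's top k.
    """
    tops = {name: set(ranked[:k]) for name, ranked in rankings.items()}
    common = set.intersection(*tops.values())
    summary = {}
    for name, ranked in rankings.items():
        others = set().union(*(t for other, t in tops.items() if other != name))
        unique = tops[name] - others
        max_rank = max(ranked.index(m) + 1 for m in others)
        summary[name] = (len(unique), max_rank)
    return len(common), summary

# Hypothetical rankings of five metabolites under three methods.
rankings = {
    "A": ["m1", "m2", "m3", "m4", "m5"],
    "B": ["m1", "m3", "m2", "m5", "m4"],
    "C": ["m1", "m5", "m2", "m3", "m4"],
}
n_common, summary = compare_top_k(rankings, k=2)
```

In this toy case one metabolite is shared by all three top-2 lists, and method A reports (1, 5): one metabolite is unique to its top 2, while a metabolite selected by another method ranks only fifth under A.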

carefully, especially with respect to the possible sources of unwanted variations and the purpose of removing them. If the examinations indicate that the quality control metabolites are hard to determine, the application of RUV-random is limited.

The PQN method was initially designed for urine metabolomics based on NMR. In this data set, test samples in the PQN and median methods contained fewer variations than those in the MSTUS and RUV-random methods (Figure 3A). In addition, variations in QC samples indicated that the PQN method gave a better result (Figure S-11A). The median method was close to the PQN method with respect to QC samples, test samples, and statistical factors (Figure S-14), but the potential biomarkers still differed slightly (Figure 3B). For the SRN method, the stability of ratios should be noted because ratios with large RSD values among QC samples were much more common than in the serum and plant data sets.

was not violated because no significant differences between groups were found (Figure S-11D). The stability of SRN seemed to be related to the data sets. With urea depletion and derivatization processes, the urine data set contained more unstable ratios than the other data sets.

Discrimination between Groups. MSTUS, median, PQN, RUV-random, and SRN methods had similar results in repeated RF analysis (Figure S-12). The difference between Ctr and W was very distinct, while the differences between Ctr versus P and W versus P were not so obvious. In order to balance the training-set samples between P and the other groups, 3/4 of the samples in the P group were assigned to validate the RF models, which may slightly influence the error rates. Nevertheless, the RF models were stable enough to screen potential biomarkers.

Influences of Normalization Methods on Potential Biomarkers. Here, the purposes of the different normalization methods (except SRN) were the same, but both the kind and number of biomarkers in the differently normalized data sets were inconsistent. The sum scores of metabolites varied between normalization methods (Figure S-13A). In addition, the simulated biomarker-selection process (Table 1), the relevant metabolites determined by Boruta (Figure 3B), and some representative metabolites (Figure 3C) all indicated a satisfactory conclusion. In the SRN normalized data set, 144, 116, and 82 ratios were screened out by Boruta. Similar to the results in the plant data set, such ratios were promising classifiers for clinical urine samples (Figure S-13B).

Choice of Normalization Method. The MSTUS method was not recommended because its assumption was hard to fulfill in this data set. Without any preacquisition normalization, the high variability of metabolites between samples would introduce missing values into the data set. On the other hand, intensities of some abundant metabolites would be saturated. In some studies, creatinine or urine volume has been utilized to estimate the dilution effects and adjust the samples before instrumental analysis.12 Because the MSTUS method is highly dependent on the linear relationship between concentration and response, it would be much better if the variability between samples could be alleviated beforehand. The RUV-random method also seemed to be unsuitable for this data set because the selection of “quality control metabolites” based on IS-HA was uncertain. Actually, because of the changing volumes of urine samples, there was no better alternative. For this method, the original data sets should be examined



CONCLUSIONS

Commonly applied evaluation methods were not sufficient or accurate enough to indicate the appropriate normalization method for some data sets. However, different normalization methods, each with its own limitations, did influence the kind and number of potential biomarkers. If inappropriate methods were applied, the consequences were unpredictable. We therefore recommended the following two-part strategy for the determination of the normalization method. Part I was aimed at getting a comprehensive understanding of the compatibility between the data and the normalization methods. First, the study purpose should be determined: theoretically unfit methods should be excluded (e.g., the normalization methods with differing purposes in the plant data set). Second, inspections of QC samples, that is, within-group RLA plots, missing-value inspection, and so on, should be performed before normalization to exclude outliers and find out the main sources of unwanted experimental variations. Third, inspections of test samples, that is, across-group RLA plots, the distribution of MSTUS factors between groups, and experience with the experiments, were useful for estimating metabolic differences between groups. Any method with an incompatible theory should be filtered out (e.g., the PQN method for the plant data set). Fourth, tests on normalized data, that is, the basic requirements of similarity between QC samples and other known restrictions (e.g., positive/negative control metabolites), should be satisfied. In addition, the discrimination of samples and the stability of multivariate models should also be considered.

DOI: 10.1021/acs.analchem.6b05152 Anal. Chem. 2017, 89, 5342−5348


(7) De Livera, A. M.; Dias, D. A.; De Souza, D.; Rupasinghe, T.; Pyke, J.; Tull, D.; Roessner, U.; McConville, M.; Speed, T. P. Anal. Chem. 2012, 84, 10768−10776.
(8) Wu, Y. M.; Li, L. J. Chromatogr. A 2016, 1430, 80−95.
(9) Dunn, W. B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J. D.; Halsall, A.; Haselden, J. N.; Nicholls, A. W.; Wilson, I. D.; Kell, D. B.; Goodacre, R. Nat. Protoc. 2011, 6, 1060−1083.
(10) Chan, E. C. Y.; Pasikanti, K. K.; Nicholson, J. K. Nat. Protoc. 2011, 6, 1483−1499.
(11) Want, E. J.; Wilson, I. D.; Gika, H.; Theodoridis, G.; Plumb, R. S.; Shockcor, J.; Holmes, E.; Nicholson, J. K. Nat. Protoc. 2010, 5, 1005−1018.
(12) Chen, Y. H.; Shen, G. Q.; Zhang, R. P.; He, J. M.; Zhang, Y.; Xu, J.; Yang, W.; Chen, X. G.; Song, Y. M.; Abliz, Z. Anal. Chem. 2013, 85, 7659−7665.
(13) Saccenti, E.; Hoefsloot, H. C. J.; Smilde, A. K.; Westerhuis, J. A.; Hendriks, M. M. W. B. Metabolomics 2014, 10, 361−374.
(14) Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K.; Kusano, M. Anal. Chem. 2009, 81, 7974−7980.
(15) Jauhiainen, A.; Madhu, B.; Narita, M.; Narita, M.; Griffiths, J.; Tavare, S. Bioinformatics 2014, 30, 2155−2161.
(16) Filzmoser, P.; Walczak, B. J. Chromatogr. A 2014, 1362, 194−205.
(17) Godzien, J.; Ciborowski, M.; Angulo, S.; Ruperez, F. J.; Paz Martinez, M.; Senorans, F. J.; Cifuentes, A.; Ibanez, E.; Barbas, C. J. Proteome Res. 2011, 10, 837−844.
(18) Warrack, B. M.; Hnatyshyn, S.; Ott, K. H.; Reily, M. D.; Sanders, M.; Zhang, H. Y.; Drexler, D. M. J. Chromatogr. B: Anal. Technol. Biomed. Life Sci. 2009, 877, 547−552.
(19) Peralbo-Molina, A.; Calderon-Santiago, M.; Priego-Capote, F.; Jurado-Gamez, B.; Luque de Castro, M. D. Anal. Chim. Acta 2015, 887, 118−126.
(20) Wang, W. X.; Zhou, H. H.; Lin, H.; Roy, S.; Shaler, T. A.; Hill, L. R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C. H. Anal. Chem. 2003, 75, 4818−4826.
(21) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Anal. Chem. 2006, 78, 4281−4290.
(22) Lehallier, B.; Ratel, J.; Hanafi, M.; Engel, E. Anal. Chim. Acta 2012, 733, 16−22.
(23) Zhang, P.; Chen, J.; Wang, Y.; Huang, Y.; Tian, Y.; Zhang, Z.; Xu, F. Chem. Res. Toxicol. 2016, 29, 776−783.
(24) Lv, M. Y.; Sun, J. B.; Wang, M.; Fan, H. Y.; Zhang, Z. J.; Xu, F. G. Chin. J. Nat. Med. 2016, 14, 133−140.
(25) Fernandez-Albert, F.; Llorach, R.; Andres-Lacueva, C.; Perera-Lluna, A. Anal. Chem. 2014, 86, 2320−2325.
(26) Scholz, M.; Gatzek, S.; Sterling, A.; Fiehn, O.; Selbig, J. Bioinformatics 2004, 20, 2447−2454.
(27) De Livera, A. M. Package MetNorm for R, 2015. Available at the following: https://cran.r-project.org/web/packages/MetNorm/MetNorm.pdf.
(28) van der Kloet, F. M.; Bobeldijk, I.; Verheij, E. R.; Jellema, R. H. J. Proteome Res. 2009, 8, 5132−5141.
(29) Dunn, W. B.; Broadhurst, D. I.; Atherton, H. J.; Goodacre, R.; Griffin, J. L. Chem. Soc. Rev. 2011, 40, 387−426.
(30) Chawade, A.; Alexandersson, E.; Levander, F. J. Proteome Res. 2014, 13, 3114−3120.
(31) Liaw, A.; Wiener, M. R News 2002, 2, 18−22.
(32) Breiman, L. Mach. Learn. 2001, 45, 5−32.
(33) Kursa, M. B.; Rudnicki, W. R. J. Stat. Softw. 2010, 36, 1−13.
(34) Kursa, M. B.; Jankowski, A.; Rudnicki, W. R. Fundam. Inform. 2010, 101, 271−286.
(35) Xu, F. G.; Zou, L.; Ong, C. N. J. Proteome Res. 2009, 8, 5657−5665.
(36) Xu, F. G.; Zou, L.; Ong, C. N. TrAC, Trends Anal. Chem. 2010, 29, 269−280.

Part II handles the biomarkers. If only one normalization method is applicable, the biomarkers are confirmed directly (e.g., RUV-random was the only appropriate method for absolute concentration comparison in the plant data set). When several methods are applicable, the intersection of the potential biomarkers screened out in the several normalized data sets is regarded as the set of highly confident biomarkers, whereas the union set provides a wider list of metabolites worth studying.
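The intersection and union described in Part II map directly onto set operations. The sketch below is illustrative only: the function name `consensus_biomarkers`, the method labels, and the metabolite names are hypothetical and are not results from this study.

```python
def consensus_biomarkers(selected):
    """Combine biomarker lists obtained under different normalizations.

    selected : dict mapping a normalization method name to an iterable
               of metabolite names flagged as potential biomarkers.
    Returns (intersection, union) as two sets.
    """
    sets = [set(names) for names in selected.values()]
    if not sets:
        return set(), set()
    return set.intersection(*sets), set.union(*sets)


# Hypothetical selections, for illustration only.
selected = {
    "MSTUS": ["alanine", "citrate", "glycine"],
    "median": ["alanine", "citrate", "urea"],
    "PQN": ["alanine", "citrate"],
}
high_confidence, candidates = consensus_biomarkers(selected)
```

Here `high_confidence` holds the metabolites flagged under every applicable method, and `candidates` holds every metabolite flagged under at least one of them.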



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.6b05152.

Additional methods, tables, and figures (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. Fax: +86 025 83271021. Tel.: +86 025 83271021.
*E-mail: [email protected]. Fax: +86 025 83271454. Tel.: +86 025 83271454.

ORCID

Jiaqing Chen: 0000-0001-9598-9701
Yin Huang: 0000-0001-9678-2630
Fengguo Xu: 0000-0001-9999-0128

Author Contributions

¶(J.C. and P.Z.) These authors contributed equally. All authors have given approval to the final version of the manuscript.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors thank Dr. Alysha M. De Livera for illustrating the application of the RUV-random method and developing the R package MetNorm. We also thank Miron B. Kursa and Witold R. Rudnicki for developing Boruta, and the numerous other algorithm developers working in the R language and MATLAB. This study was financially supported by the NSFC (Nos. 81573385, 81573626, and 81430082), the Program for Jiangsu Province Innovative Research Team, the Program for New Century Excellent Talents in University (No. NCET-13-1036), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).



REFERENCES

(1) Zhao, Y. N.; Zhao, C. X.; Lu, X.; Zhou, H. N.; Li, Y. L.; Zhou, J.; Chang, Y. W.; Zhang, J. J.; Jin, L. F.; Lin, F. C.; Xu, G. W. J. Proteome Res. 2013, 12, 5072−5083.
(2) Hu, L. P.; Browne, E. R.; Liu, T.; Angel, T. E.; Ho, P. C.; Chan, E. C. Y. J. Proteome Res. 2012, 11, 5903−5913.
(3) Klawitter, J.; Klawitter, J.; Schmitz, V.; Brunner, N.; Crunk, A.; Corby, K.; Bendrick-Peart, J.; Leibfritz, D.; Edelstein, C. L.; Thurman, J. M.; Christians, U. J. Proteome Res. 2012, 11, 5135−5144.
(4) Zhao, L. J.; Zhao, A. H.; Chen, T. L.; Chen, W. L.; Liu, J. J.; Wei, R. M.; Su, J.; Tang, X. L.; Liu, K. Y.; Zhang, R.; Xie, G. X.; Panee, J.; Qiu, M. F.; Jia, W. J. Proteome Res. 2016, 15, 2327−2336.
(5) Chen, J.; Xu, F. Bioanalysis 2017, 9 (3), 235−238.
(6) De Livera, A. M.; Sysi-Aho, M.; Jacob, L.; Gagnon-Bartsch, J. A.; Castillo, S.; Simpson, J. A.; Speed, T. P. Anal. Chem. 2015, 87, 3606−3615.
