Article Cite This: Anal. Chem. 2018, 90, 11124−11130

pubs.acs.org/ac

MetaboGroupS: A Group Entropy-Based Web Platform for Evaluating Normalization Methods in Blood Metabolomics Data from Maintenance Hemodialysis Patients

Shisheng Wang,†,§ Xiaolei Chen,‡,§ Dan Du,† Wen Zheng,† Liqiang Hu,† Hao Yang,† Jingqiu Cheng,† and Meng Gong*,†

† West China-Washington Mitochondria and Metabolism Research Center and Key Lab of Transplant Engineering and Immunology, MOH, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
‡ Department of Nephrology, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China

*S Supporting Information

ABSTRACT: Because of inevitable and complicated signal variations in LC-MSn-based nontargeted metabolomics, normalization of metabolite data is a highly recommended procedure for improving the accuracy of metabolic profiling and the discovery of potential biomarkers. Although various normalization methods have been developed and applied to these data sets, it is still difficult to assess their performance. Moreover, such methods are opaque and difficult to choose among, especially for users without bioinformatics training. In this study, we present a powerful and user-friendly web platform, named MetaboGroupS, that compares and evaluates seven popular normalization methods and automatically recommends an optimal one to end users based on the group entropies of every sample data point. To examine and apply this tool, we analyzed a complex clinical human data set from maintenance hemodialysis patients with erythropoietin resistance. Metabolite peaks (11 027) were extracted from the experimental data and then imported into this platform; the entire analysis process was completed sequentially within 5 min. To further test the performance and universality of MetaboGroupS, we analyzed two more published data sets, including a nuclear magnetic resonance (NMR) data set, on this platform. The results indicated that the method with a lower intragroup entropy and a higher intergroup entropy would be preferable. In addition, MetaboGroupS can be operated conveniently and does not require any profound computational expertise or background, which makes it accessible to scientists in many fields. MetaboGroupS is freely available at https://omicstools.shinyapps.io/MetaboGroupSapp/.

Received: July 8, 2018
Accepted: August 17, 2018
Published: August 17, 2018
DOI: 10.1021/acs.analchem.8b03065
© 2018 American Chemical Society

With the tremendous development of instruments and the gradual expansion of related databases, mass spectrometry (MS) coupled with either liquid chromatography (LC-MS) or gas chromatography (GC/MS) is increasingly becoming a prevalent and powerful approach for the identification and quantification of various small-molecule metabolites involved in complex biological and disease processes in organisms,1−3 for example, renal anemia, which is usually present in maintenance hemodialysis patients suffering from chronic kidney disease (CKD).4 Moreover, hundreds or even thousands of metabolites from biosamples can be detected in one assay because of high sensitivity and versatile selection capabilities.5,6 In addition, different forms of unwanted variation caused by biological or experimental processes may give rise to significant systematic biases in the raw metabolomics data, invalidating downstream statistical inference.7 As a result, deciphering and visualizing these large-scale data sets is a formidable and arduous task that usually requires a sophisticated computational approach,8 which is not conducive to long-range application of metabolomics in biomarker identification, pathological studies, or drug discovery.

Generally, these unwanted variations are reflected by signal drift and batch effects, which are also continually encountered in long-term metabolic profiling. To correct for signal drift and eliminate batch effects, repeated analysis of quality control (QC) samples, which are typically prepared by pooling aliquots of each sample, is widely utilized over the entire time period of large-scale studies.9,10 Additionally, some statistical approaches are also implemented to capture biases of arbitrary complexity and improve the overall differential profiles across data sets. Normalization is now frequently regarded as a necessary part of data analysis.11,12 Two main types of normalization methods have been developed so far: (1) sample-center normalization, such as median normalization8 and variance stabilizing normalization (VSN),13 which aims at correcting different sample-to-sample concentrations; (2) metabolite-center normalization, of which QC sample-based support vector regression (QC-SVR)14 and robust locally estimated scatterplot smoothing (LOESS) signal correction (QC-RLSC)15 are fairly representative, which is applied primarily to correct batch-to-batch analytical variations. Most of these methods are integrated in published pipelines, such as NOREVA,16 BatchQC,17 and MetaboAnalyst.18 However, most of these pipelines are either too obscure for users to operate or inadequate for suggesting an applicable normalization method.

Furthermore, entropy (common symbol: S), a fundamental quantity in thermodynamic systems, has expanded its application to information theory, statistics, and machine learning.19,20 The classical formula is

$$ S = k_{\mathrm{B}} \log \Omega \tag{1} $$

where $k_{\mathrm{B}}$ is the Boltzmann constant, equal to $1.38065 \times 10^{-23}$ J/K, and $\Omega$ is the number of possible microscopic configurations.21 This formula reveals the relationship between entropy and the number of ways in which the atoms or molecules of a thermodynamic system can be arranged.22 Thus, the entropies of different sample groups can serve as natural metrics and standard evaluations describing the information content in each data set after treatment with diverse normalization methods. In this work, we designed a powerful and comprehensive software tool, MetaboGroupS, which automatically calculates group entropies based on the principal component analysis (PCA) score matrix of every data set after normalization and comparatively evaluates the fitness of different normalization methods from multiple perspectives, in order to provide the most appropriate normalization method for subsequent analysis. To date, MetaboGroupS implements seven current methods, including median normalization,23 standard normalization,24 VSN,13 Remove Unwanted Variation-Random normalization (RUV-random),25 QC-SVR,14 EigenMS,26 and QC-RLSC,15 and can integrate additional methods in the future. Additionally, there are no complicated operations in MetaboGroupS; the only requirement is access to the Internet. Furthermore, one experimental data set obtained from maintenance hemodialysis patients with erythropoietin resistance and two published data sets31,32 were used to demonstrate the originality and availability of this software. With this, we aim to enable scientists not engaged in bioinformatics to conveniently incorporate relevant metabolomics analysis into their research programs, especially for clinical data analysis.

EXPERIMENTAL DETAILS

Software Implementation. All functions in MetaboGroupS were written in R (https://www.r-project.org/),27 and the graphical user interface (GUI) was developed in Shiny.28 This platform was deployed on the free shinyapps.io server, which is supported by the RStudio team. Alternatively, we prepared a spare Web site, http://www.omicsolution.org/wukong/MetaboGroupS, which users can also visit to analyze their data freely with no login requirement. The GUI contains three main parts (Figure S1): each module name, the parameters setting panel, and the results presentation panel. Seven frequently used normalization methods, as mentioned above, and a group entropy algorithm were embedded into this software. Detailed operations of MetaboGroupS are shown in the Supporting Notes.

Sample Collection and Preparation. A total of 61 maintenance hemodialysis patients in West China Hospital (Sichuan, China) were involved in this study. All of the following comorbidities, which are relevant to secondary anemia, had been excluded: bleeding, acute infections, malignant tumors, hematological diseases, iron deficiency, and malnutrition. The therapeutic effect of erythropoietin (EPO) on these patients was evaluated using the Erythropoietin Resistance Index (ERI), calculated as the weekly weight-adjusted dose of erythropoiesis-stimulating agents (ESA) divided by the hemoglobin (Hb) level (g/L). According to the latest clinical records of these patients, the ERI values ranged from 0 to 36.28. The patients were divided into three groups based on ERI values: Group A (n = 22) with ERI < 10, Group B (n = 22) with ERI 10−20, and Group C (n = 17) with ERI > 20. Fasting blood samples were collected with anticoagulant heparin sodium salt and stored at 4 °C for no more than 2 h. Plasma samples were separated by centrifuging whole blood at 1500g for 10 min at 4 °C and then stored at −80 °C. Each 0.2 mL plasma sample was added to 0.8 mL of a mixture of chloroform and methanol (2:1), vortexed for 1 min, and then centrifuged at 13 000g for 10 min at 4 °C. The upper phase of each sample was collected for subsequent LC-MS analysis.

Liquid Chromatography/Mass Spectrometry Analysis. Biological sample preparation and data acquisition and processing followed the protocol described in Nikolic et al.29 LC-MS/MS data were acquired in positive ion mode and negative ion mode separately using a Xevo G2-XS Q-TOF mass spectrometer (Waters) controlled by Masslynx software (Waters, Version 4.1). Chromatographic separation was carried out using an HSS T3 column (2.1 × 100 mm, 1.8 μm, Waters) on an ACQUITY UPLC I-Class system (Waters). The detailed chromatographic and mass spectrometer parameters are described in the Supporting Information (UPLC-QTOF-MS Parameters). The pooled QC samples, combined from small aliquots (10 μL) of each sample, were used throughout the experiment as a process control to monitor LC and MS performance across sample runs, as recommended by Sangster et al.30 After data acquisition, peak intensities, mass to charge (m/z), and retention time (RT) were extracted from the raw data using Progenesis QI (Waters, version 2.3.6275.47962).

Published Data Sets Collected for Verification. To test the performance and utility of MetaboGroupS, two more published data sets were collected: (1) MTBLS79,31 in which 48 metabolites were eventually extracted across 172 cardiac tissue samples, including 38 QC samples; (2) Wine data,32 which included the 1H nuclear magnetic resonance (NMR) spectra of wines of different origin and color (red, white, and rosé) and were preprocessed with the speaq package to obtain the feature matrix.33

Figure 1. Straightforward computation framework of MetaboGroupS.

Data Preprocessing and Normalization. The data uploaded at the entrance portal of MetaboGroupS, containing a sample-by-feature matrix, can be in xlsx, xls, csv, or txt format, all of which are easily output with Progenesis QI software or other similar tools.34 Subsequently, we set missing values (peak intensities of 0) to not available (NA) and removed those features in which the NA ratio was above 0.5 (50%). After that, imputation was implemented with

the k-Nearest Neighbor (KNN) algorithm,35 and the coefficient of variation (CV) for each group of samples was computed from the log2-transformed data; four modes of transformation (“Log2”, “Log”, “Log10”, “none”) can be chosen in MetaboGroupS according to the user's own data. After preprocessing, the data were normalized using seven methods (median normalization,23 standard normalization,24 VSN,13 RUV-random,25 QC-SVR,14 EigenMS,26 QC-RLSC15). Their syntax and implementation in the R language are described in Table S2.

Group Entropy Computation. Entropy and mutual information estimation have been given close attention in information theory.36 On the basis of those principles, herein we define the group entropies as

$$ H_{ge} = -\sum_{k=1}^{g} \theta_k \log(\theta_k) \tag{2} $$

where $g$ is the replicate number of each group sample and $\theta_k$ is the bin frequency of the PCA score distance matrix of the k-th group, which can be deduced with a James-Stein-type shrinkage estimator, as shown below.37

$$ \hat{H}_{ge}^{\mathrm{Shrink}} = -\sum_{k=1}^{g} \hat{\theta}_k^{\mathrm{Shrink}} \log\left(\hat{\theta}_k^{\mathrm{Shrink}}\right) \tag{3} $$

Therefore, it is necessary to first infer the PCA score matrix from the respective normalized data using the prcomp function and then to calculate the Euclidean distance matrix according to its definition.38 A straightforward computation scheme is presented in Figure 1.

RESULTS AND DISCUSSION

Pretreatment of Entire Data Sets. All preliminary peak information (intensities, m/z, RT) was extracted for the four groups of samples with Progenesis QI software. The data contained a total of 11 027 peaks (Table S3). However, the raw intensity data cannot be processed directly because of missing values (NAs) and skewness,39 which should be inspected in advance. Missing values, from which no valid information can be derived, are mostly useless and frequently break the computational procedure; thus, features with a high rate of NAs should be deleted. In this work, we converted all 0s to NAs and then counted the NAs in each sample (Figure S2A) as well as for each feature (Figure S2B). The maximum rate of missing values in samples was approximately 49%, which indicates that the quantification of those clinical samples was acceptable. Subsequently, features with missing value rates above 0.5 and CV above 0.3 (Figure S3) were removed using these arbitrarily chosen thresholds, and the remaining data, containing 5308 features (Table S4), were imputed with the KNN algorithm.35 Because skewness usually makes data deviate from a normal distribution and thereby affects the accuracy of subsequent computation,40 logarithmic transformation is usually considered to remove or reduce this influence in biomedical and psychosocial data analysis.41 Two boxplots illustrating the difference between the original and log2-transformed data are shown in Figure S4. The skewness range across samples in each group (Table 1) revealed that the distribution of raw intensities was right-skewed, while the log2-transformed data were much less skewed. For the two published data sets, the same manipulation and control standards were applied (data not shown). Data quality generally meets the fundamental requirements after these two basic processes, and users can then proceed with the following analysis.

Table 1. Skewness Range of Original and Log2-Transformed Intensities

group    original          log2
QC       [47.68, 58.07]    [0.62, 0.66]
A        [40.59, 62.36]    [0.42, 0.69]
B        [38.48, 67.40]    [0.38, 0.67]
C        [29.28, 60.96]    [0.41, 0.62]

Normalization Performance. The seven common normalization methods mentioned above were applied to the log2-transformed data. The normalized data should have a center value (median or mean) close to a constant and low variation around this center value within or across the groups of samples. Relative log abundance (RLA) plots25 (Figure 2) were used to detect unwanted variations under the different conditions and to preliminarily evaluate the performance of these methods. Therefore, we would expect to inspect the overall distribution of data within each group. As illustrated in Figure 2a, the RLA plot without normalization displays within-group variation that fluctuated because of uncontrollable experimental operation. The situation improved somewhat after normalization, in which RUV-random and QC-RLSC appeared to perform better, as RUV-random attempts to remove variation of no interest based on a linear mixed effects model and QC-RLSC minimizes the impact of experimental or biological variation with a locally estimated scatterplot smoothing function. However, it was still difficult to provide an intuitive assessment from the RLA plots, which were insufficiently precise to evaluate the suitability of different normalization methods.

Figure 2. Normalization results of experimental data sets. Within-group and across-group RLA plots of maintenance hemodialysis patients blood samples (a) before (“No Normalization”) and after seven different normalization methods: (b) Median Normalization, (c) Standard Normalization, (d) VSN, (e) RUV-random, (f) SVR, (g) EigenMS, and (h) QC-RLSC.

The score plots of the first two or three principal components in PCA are also frequently used for interpretable visualization of the efficiency of normalization methods, as they summarize all data sets and show the data groupings before and after normalization.42 Figure 3 (also Figures S5 and S6) displays the 2-dimensional PCA score plots for our experimental data and the two published data sets using different normalization methods. As Figure 3 illustrates, most of these approaches for handling complicated clinical sample data are ineffective and indistinguishable from no normalization, except for EigenMS and QC-RLSC. Nevertheless, the QC samples were clustered more tightly after QC-RLSC normalization and, moreover, the experimental group samples were spread apart from each other, although some samples overlapped, which may reflect the complexity of clinical samples. Correspondingly, the results for MTBLS79 (Figure S5) and the NMR data (Figure S6) showed a similarly complex situation in the PCA score plots, which reminds analysts to choose a normalization method deliberately and carefully.

Figure 3. Comparison of various normalization methods based on PCA. (a) No Normalization, (b) Median Normalization, (c) Standard Normalization, (d) VSN, (e) RUV-random, (f) SVR, (g) EigenMS, and (h) QC-RLSC. The different samples were color-visualized with 95% CI according to their group information.

Method Selection Based on Group Entropy. For more accurate and convenient selection among normalization methods (rather than intuitive discrimination or subjective impression), we calculated the entropy based on the PCA score distance matrix for each group sample with a James-Stein-type shrinkage estimator in MetaboGroupS. The group entropy takes the sample distribution and variation within and across groups into consideration simultaneously. The model is better with lower entropy within each group and higher entropy across groups. Consequently, the normalization methods that remove unwanted analytical variation and retain the essential variation of interest may outperform the other methods. However, this is not absolute and depends on the practical complexity of the experimental data sets. Table 2 summarizes the computed entropies for each group. The entropies from the median, standard, VSN, and RUV-random methods were similar to those with no normalization, especially in QC samples, which reflects inadequate correction for unwanted signal variations. In contrast, the entropy of QC samples after QC-RLSC normalization was the minimum (2.063) and, moreover, the score centers of the first two principal components from the QC-RLSC-normalized data were much further apart for each group sample than with the other methods (Table S5).
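The entropy calculation described in this section can be illustrated with a short Python sketch. It is not the R code behind MetaboGroupS: the function names (`pca_scores`, `shrinkage_entropy`, `group_entropy`) are hypothetical, the use of g equal-width bins over the within-group pairwise distances is an assumption (the binning is not specified here), and the shrinkage step follows the uniform-target James-Stein-type estimator of ref 37 as commonly stated.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center features, as R's prcomp does by default
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def shrinkage_entropy(counts):
    """James-Stein-type shrinkage entropy (natural log), uniform target."""
    counts = np.asarray(counts, dtype=float)
    n, p = counts.sum(), counts.size
    theta_ml = counts / n                        # maximum-likelihood bin frequencies
    target = np.full(p, 1.0 / p)                 # uniform shrinkage target
    denom = (n - 1) * np.sum((target - theta_ml) ** 2)
    lam = 1.0 if denom == 0 else (1.0 - np.sum(theta_ml ** 2)) / denom
    lam = min(1.0, max(0.0, lam))                # shrinkage intensity clipped to [0, 1]
    theta = lam * target + (1.0 - lam) * theta_ml
    theta = theta[theta > 0]                     # 0 * log(0) is treated as 0
    return float(-np.sum(theta * np.log(theta)))

def group_entropy(scores):
    """Entropy of the binned pairwise Euclidean distances within one group."""
    g = scores.shape[0]
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pairwise = dist[np.triu_indices(g, k=1)]     # each unordered pair once
    counts, _ = np.histogram(pairwise, bins=g)   # assumption: g equal-width bins
    return shrinkage_entropy(counts)
```

Under these assumptions, the PCA is run once on the whole normalized matrix and `group_entropy` is then applied to the score rows of each group (for example, `group_entropy(scores[labels == "QC"])`), giving one value per group and per normalization method.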

Table 2. Entropy Results of Each Group of Samples Deduced from Data Matrix Processed with Seven Normalization Methods

              entropies
methods       group A   group B   group C   QC
none          2.632     3.028     2.673     2.303
median        2.440     2.639     2.557     2.303
standard      2.858     2.958     2.719     2.303
VSN           2.516     2.839     2.715     2.303
RUV-random    3.090     3.082     2.833     2.297
SVR           2.582     3.028     2.749     2.303
EigenMS       3.091     2.978     2.833     2.257
QC-RLSC       3.091     3.091     2.833     2.063
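A quick sanity check helps in reading Table 2: an entropy computed over g bins with natural logarithms cannot exceed log g. The sketch below assumes, as the definition in eq 2 suggests, that the number of bins equals the number of samples in a group (22, 22, and 17 for groups A, B, and C). The QC count of 10 is not stated in this section and is only inferred here from the value 2.303.

```python
import math

# Sizes of groups A, B, and C come from the text; the QC count is an assumption.
group_sizes = {"A": 22, "B": 22, "C": 17, "QC (assumed)": 10}

# Maximum possible entropy (natural log) when every bin is equally populated.
upper_bounds = {name: round(math.log(n), 3) for name, n in group_sizes.items()}
print(upper_bounds)
# {'A': 3.091, 'B': 3.091, 'C': 2.833, 'QC (assumed)': 2.303}
```

These bounds coincide with the largest entries in Table 2 (3.091, 2.833, and 2.303). One possible reading is that an entry at the bound corresponds to bin frequencies that are uniform (or fully shrunk toward uniform), so that only values below the bound, such as 2.063 for QC after QC-RLSC, indicate tighter clustering.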

In addition, the CV of the entropy in QC samples with respect to the other groups can provide a comprehensive recommendation, as illustrated in Figures 4 and S7. The minimal CV, obtained with the QC-RLSC method for our experimental data set, is marked with a diamond symbol, which prompted us to choose the QC-RLSC normalization method for this clinical data set in subsequent analysis. For the two public data sets, the recommended normalization methods were also presented to users (Figure S7); the optimal methods were not always the same, which further indicates that there is no single gold-standard method for all kinds of data sets and that users should choose the normalization method on the basis of their own data. Notably, the whole computation for the largest of the three test data sets was completed in approximately 5 min (Table S6), which shows that this tool is not time-consuming.
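The exact definition of the CV plotted in Figure 4 is not given in this section, so the following Python sketch does not reproduce it. It only illustrates the selection principle stated above (low entropy in QC samples, high entropy in the biological groups) with a hypothetical score, the ratio of the QC entropy to the mean entropy of groups A, B, and C, computed from the values in Table 2.

```python
# Entropies from Table 2: (group A, group B, group C, QC) for each method.
table2 = {
    "none":       (2.632, 3.028, 2.673, 2.303),
    "median":     (2.440, 2.639, 2.557, 2.303),
    "standard":   (2.858, 2.958, 2.719, 2.303),
    "VSN":        (2.516, 2.839, 2.715, 2.303),
    "RUV-random": (3.090, 3.082, 2.833, 2.297),
    "SVR":        (2.582, 3.028, 2.749, 2.303),
    "EigenMS":    (3.091, 2.978, 2.833, 2.257),
    "QC-RLSC":    (3.091, 3.091, 2.833, 2.063),
}

def qc_to_group_ratio(row):
    """Hypothetical score: QC entropy over the mean entropy of groups A, B, C."""
    *groups, qc = row
    return qc / (sum(groups) / len(groups))

ratios = {method: qc_to_group_ratio(row) for method, row in table2.items()}
best = min(ratios, key=ratios.get)   # smaller ratio = tighter QC, more spread groups
print(best)  # QC-RLSC
```

With this illustrative score, QC-RLSC (ratio about 0.69) ranks first and EigenMS (about 0.76) second, which agrees with the method recommended for this data set, although the platform's own statistic may be defined differently.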




Table S4: The remaining features after removing those with missing value rates above 0.5 and coefficient of variation above 0.3 (XLSX)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Shisheng Wang: 0000-0001-5812-3941
Hao Yang: 0000-0003-2214-9474
Meng Gong: 0000-0002-1203-5820

Author Contributions

§S.W. and X.C. contributed equally. All authors have given approval to the final version of the manuscript.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the Science and Technology Department of Sichuan Province (No. 2017HH0036 and No. 2018HH0028) and the National Natural Science Foundation of China (Grant No. 81102366). In particular, we thank Dr. Chengpin Shen for sponsoring and configuring the spare network server. The logo for MetaboGroupS is copyrighted, and permission was obtained for the use of the logo in this publication.

Figure 4. Coefficient of variation of entropy in QC samples with respect to the other groups for the no normalization (original) and seven normalization methods. The minimum is shown with a diamond symbol.





CONCLUSIONS

In this work, we developed a free, user-friendly, and powerful web platform, MetaboGroupS, to automatically select an appropriate normalization method on the basis of group entropies for LC-MS/MS-based metabolomics data as well as NMR data.43 The group entropy computation can reveal the differences among, and the effectiveness of, various normalization methods on complicated sample data, especially clinical data. Additionally, other omics data, such as proteomics and genomics data, can be analyzed in the same way with this software because of the similarity of their data structures and complexity. Overall, MetaboGroupS is easy to use and time-saving for scientists or clinicians who are not omics specialists, and it is worth promoting for miscellaneous applications such as drug discovery and biomarker identification.



REFERENCES

(1) Zhao, X.; Zeng, Z.; Chen, A.; Lu, X.; Zhao, C.; Hu, C.; Zhou, L.; Liu, X.; Wang, X.; Hou, X.; Ye, Y.; Xu, G. Anal. Chem. 2018, 90, 7635.
(2) Weckwerth, W. Annu. Rev. Plant Biol. 2003, 54, 669−689.
(3) Gu, H.; Carroll, P. A.; Du, J.; Zhu, J.; Neto, F. C.; Eisenman, R. N.; Raftery, D. Angew. Chem., Int. Ed. 2016, 55, 15646−15650.
(4) Mikolas, E.; Kun, S.; Laczy, B.; Molnar, G. A.; Selley, E.; Koszegi, T.; Wittmann, I. Kidney Blood Pressure Res. 2013, 38, 217−225.
(5) Fessenden, M. Nature 2016, 540, 153−155.
(6) Dunn, W. B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J. D.; Halsall, A.; Haselden, J. N.; Nicholls, A. W.; Wilson, I. D.; Kell, D. B.; Goodacre, R. Nat. Protoc. 2011, 6, 1060−1083.
(7) Chen, J.; Zhang, P.; Lv, M.; Guo, H.; Huang, Y.; Zhang, Z.; Xu, F. Anal. Chem. 2017, 89, 5342−5348.
(8) Cambiaghi, A.; Ferrario, M.; Masseroli, M. Briefings Bioinf. 2017, 18, 498−510.
(9) Wehrens, R.; Hageman, J. A.; van Eeuwijk, F.; Kooke, R.; Flood, P. J.; Wijnker, E.; Keurentjes, J. J.; Lommen, A.; van Eekelen, H. D.; Hall, R. D.; Mumm, R.; de Vos, R. C. Metabolomics 2016, 12, 88.
(10) Sanchez-Illana, A.; Pineiro-Ramos, J. D.; Sanjuan-Herraez, J. D.; Vento, M.; Quintas, G.; Kuligowski, J. Anal. Chim. Acta 2018, 1019, 38−48.
(11) Gagnebin, Y.; Tonoli, D.; Lescuyer, P.; Ponte, B.; de Seigneux, S.; Martin, P. Y.; Schappler, J.; Boccard, J.; Rudaz, S. Anal. Chim. Acta 2017, 955, 27−35.
(12) De Livera, A. M.; Dias, D. A.; De Souza, D.; Rupasinghe, T.; Pyke, J.; Tull, D.; Roessner, U.; McConville, M.; Speed, T. P. Anal. Chem. 2012, 84, 10768−10776.
(13) Veselkov, K. A.; Vingara, L. K.; Masson, P.; Robinette, S. L.; Want, E.; Li, J. V.; Barton, R. H.; Boursier-Neyret, C.; Walther, B.; Ebbels, T. M.; Pelczer, I.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Anal. Chem. 2011, 83, 5864−5872.
(14) Shen, X. T.; Gong, X. Y.; Cai, Y. P.; Guo, Y.; Tu, J.; Li, H.; Zhang, T.; Wang, J. L.; Xue, F. Z.; Zhu, Z. J. Metabolomics 2016, 12, 89.
(15) Dudzik, D.; Barbas-Bernardos, C.; Garcia, A.; Barbas, C. J. Pharm. Biomed. Anal. 2018, 147, 149−173.

ASSOCIATED CONTENT

*S Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.8b03065.

UPLC-QTOF-MS parameters; graphical user interface of MetaboGroupS; distribution of missing values in the experimental data; distribution of coefficient of variation counted from the features in the metabolomics data; boxplots; comparison of various normalization methods; coefficient of variation of entropy in QC samples; the introduction and implementation of data normalization methods; comparison of the first two principal components score center from various normalization methods; time consumption of the entire experimental data analysis on two servers; additional notes (PDF)
Table S1: Sample information (XLSX)
Table S3: Peaks extraction from Progenesis QI software (XLSX)

(16) Li, B.; Tang, J.; Yang, Q.; Li, S.; Cui, X.; Li, Y.; Chen, Y.; Xue, W.; Li, X.; Zhu, F. Nucleic Acids Res. 2017, 45, W162−W170.
(17) Manimaran, S.; Selby, H. M.; Okrah, K.; Ruberman, C.; Leek, J. T.; Quackenbush, J.; Haibe-Kains, B.; Bravo, H. C.; Johnson, W. E. Bioinformatics 2016, 32, 3836−3838.
(18) Xia, J.; Wishart, D. S. Nat. Protoc. 2011, 6, 743−760.
(19) Gilson, M.; Kouvaris, N. E.; Deco, G.; Zamora-Lopez, G. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 2018, 97, 052301.
(20) Nemenman, I.; Bialek, W.; de Ruyter van Steveninck, R. Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 2004, 69, 056111.
(21) Saha, A.; Lahiri, S.; Jayannavar, A. M. Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 2009, 80, 011117.
(22) Truong, G. W.; Anstie, J. D.; May, E. F.; Stace, T. M.; Luiten, A. N. Nat. Commun. 2015, 6, 8345.
(23) Delongchamp, R. R.; Velasco, C.; Razzaghi, M.; Harris, A.; Casciano, D. DNA Cell Biol. 2004, 23, 653−659.
(24) Boysen, A. K.; Heal, K. R.; Carlson, L. T.; Ingalls, A. E. Anal. Chem. 2018, 90, 1363−1369.
(25) De Livera, A. M.; Sysi-Aho, M.; Jacob, L.; Gagnon-Bartsch, J. A.; Castillo, S.; Simpson, J. A.; Speed, T. P. Anal. Chem. 2015, 87, 3606−3615.
(26) Karpievitch, Y. V.; Nikolic, S. B.; Wilson, R.; Sharman, J. E.; Edwards, L. M. PLoS One 2014, 9, e116221.
(27) Ihaka, R.; Gentleman, R. J. Comput. Graph. Stat. 1996, 5, 299−314.
(28) Chang, W.; Cheng, J.; Allaire, J.; Xie, Y.; McPherson, J. R package version 0.11; 2015 (https://cran.r-project.org/package=shiny).
(29) Nikolic, S. B.; Wilson, R.; Hare, J. L.; Adams, M. J.; Edwards, L. M.; Sharman, J. E. Metabolomics 2014, 10, 105−113.
(30) Sangster, T.; Major, H.; Plumb, R.; Wilson, A. J.; Wilson, I. D. Analyst 2006, 131, 1075−1078.
(31) Kirwan, J. A.; Weber, R. J.; Broadhurst, D. I.; Viant, M. R. Sci. Data 2014, 1, 140012.
(32) Larsen, F. H.; van den Berg, F.; Engelsen, S. B. J. Chemom. 2006, 20, 198−208.
(33) Vu, T. N.; Valkenborg, D.; Smets, K.; Verwaest, K. A.; Dommisse, R.; Lemiere, F.; Verschoren, A.; Goethals, B.; Laukens, K. BMC Bioinf. 2011, 12, 405.
(34) Lu, H.; Liang, Y.; Dunn, W. B.; Shen, H.; Kell, D. B. TrAC, Trends Anal. Chem. 2008, 27, 215−227.
(35) Zhang, S. J. Syst. Software 2012, 85, 2541−2552.
(36) Paninski, L. Neural Comput. 2003, 15, 1191−1253.
(37) Hausser, J.; Strimmer, K. J. Mach. Learn. Res. 2009, 10, 1469−1484.
(38) Deza, M. M.; Deza, E. In Encyclopedia of Distances; Springer: New York, 2009; pp 1−583.
(39) Little, R. J. J. Am. Stat. Assoc. 1988, 83, 1198−1202.
(40) Cheung, D. W.; Lee, S. D.; Xiao, Y. IEEE Trans. Knowl. Data Eng. 2002, 14, 498−514.
(41) Altman, D. G.; Bland, J. M. BMJ 1996, 313, 1200.
(42) Wen, B.; Mei, Z.; Zeng, C.; Liu, S. BMC Bioinf. 2017, 18, 183.
(43) Pan, Z.; Gu, H.; Talaty, N.; Chen, H.; Shanaiah, N.; Hainline, B. E.; Cooks, R. G.; Raftery, D. Anal. Bioanal. Chem. 2007, 387, 539−549.