Cluster Analysis Statistical Spectroscopy Using Nuclear Magnetic

Animal selection and treatment followed the standard COMET protocol, as detailed ..... Nicholson , J. K., Connely , J., Lindon , J. C. and Holmes , E...
0 downloads 0 Views 4MB Size
Anal. Chem. 2009, 81, 6581–6589

Cluster Analysis Statistical Spectroscopy Using Nuclear Magnetic Resonance Generated Metabolic Data Sets from Perturbed Biological Systems Steven L. Robinette, Kirill A. Veselkov, Eszter Bohus,† Muireann Coen, Hector C. Keun, Timothy M. D. Ebbels, Olaf Beckonert, Elaine C. Holmes, John C. Lindon, and Jeremy K. Nicholson* Biomolecular Medicine, Sir Alexander Fleming Building, Division of Surgery, Oncology, Reproductive Biology, and Anesthetics, Faculty of Medicine, Imperial College London, SW7 2AZ, United Kingdom We present a new approach for analysis, information recovery, and display of biological 1H nuclear magnetic resonance (NMR) spectral data, cluster analysis statistical spectroscopy (CLASSY), which profiles qualitative and quantitative changes in biofluid metabolic composition by utilizing a novel local-global correlation clustering scheme to identify structurally related spectral peaks and arrange metabolites by similarity of temporal dynamic variation. Underlying spectral data sets are presented in a novel graphical format to represent high-dimensionality biochemical information conveying both statistical metabolite relationships and their responses to experimental perturbation simultaneously in a high-throughput and intuitive manner. The method is exemplified using multiple 600 MHz 1H NMR spectra of rat (n ) 40) urine samples collected over 160 h following the development of experimental pancreatitis induced by L-arginine (ARG) and a wider range of model toxins including acetaminophen, galactosamine, and 2-bromoethanamine. The CLASSY approach deconvolutes complex biofluid mixture spectra into quantitative fold-change metabolic trajectories and clusters metabolites by commonalities of coexpression patterns. We demonstrate that the developing pathological processes cause coordinated changes in the levels of many compounds which share similar pathway connectivities. Variability in individual responses to toxin exposure is also readily detected and visualized allowing the assessment of interanimal variability. As an untargeted, unsupervised approach, CLASSY provides significant advantages in biological information recovery in terms of increased throughput, interpretability, and robustness and has wide potential metabonomic/metabolomic applications in clinical, toxicological, and nutritional studies of biofluids as well as in studies of cellular biochemistry, microbial fermentation monitoring, and functional genomics. The efficient processing, analysis, and visualization of massive arrays of “omics” data poses a continuous challenge in systems * To whom correspondence should be addressed. E-mail: j.nicholson@ imperial.ac.uk. † Current address: Department of Pharmaceutical Chemistry, Semmelweis University, Ho˝gyes Endre u. 9. Budapest 1092, Hungary. 10.1021/ac901240j CCC: $40.75  2009 American Chemical Society Published on Web 07/22/2009

biology. Even metabolic data sets generated by nuclear magnetic resonance (NMR) or mass spectrometry (MS) methods can contain hundreds or thousands of signals from a diverse variety of biochemical structures and classes which pose many data analysis problems in their own right. We present here a novel approach based on statistical correlation clustering for the investigation of structural and pathway analysis of NMR generated metabolic data that efficiently recovers latent biochemical information and metabolite relationships relevant to understanding complex disease processes. 1 H NMR spectra of biofluids contain information on the metabolic and physiological status of an organism and on how this is altered by a stimulus.1 Qualitative and quantitative analytical profiling of individual metabolite concentration changes in response to biological and chemical stressors is now widely used in toxicological and pharmacological studies to investigate the chemical and biological mechanisms of metabolic perturbation and recovery.2-6 Two distinct approaches are often used to extract useful biochemical information from spectra. Pattern recognition using multivariate statistical tools to identify important spectral components is one approach, and targeted, quantitative profiling to assign molecular identities and concentrations is the other.7 While many highly sophisticated multivariate statistical tools have been successfully applied to NMR-based metabonomic data analysis,8,9 these combine influences of many metabolic perturbations into latent variables which can complicate interpretation of metabolic trajectories in time course experiments, in data sets where the separation between sample groups is unclear, and can mask important interanimal (1) Nicholson, J. K.; Wilson, I. D. Prog. Nucl. Magn. Reson. Spectrosc. 1989, 21, 449–501. (2) Nicholson, J. K.; Connely, J.; Lindon, J. C.; Holmes, E. Nat. Rev. Drug Discovery 2002, 1, 153–161. (3) Kell, D. B. Curr. Opin. Microbiol. 2004, 7, 296–307. (4) van der Greef, J.; McBurney, R. N. Nat. Rev. Drug Discovery 2005, 4, 961– 967. (5) Fiehn, O. TrAC, Trends Anal. Chem. 2008, 27, 261–269. (6) Kwon, Y. K.; Lu, W.; Melamud, E.; Khanam, N.; Bognar, A.; Rabinowitz, J. D. Nat. Chem. Biol. 2008, 4, 602–608. (7) Wishart, D. S. Briefings Bioinf. 2007, 8, 279–293. (8) Bollard, M. E.; Stanley, E. G.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. NMR Biomed. 2005, 18, 143–162. (9) Bylesjo, M.; Rantalainen, M.; Cloarec, O.; Nicholson, J. K.; Holmes, E.; Trygg, J. J. Chemom. 2006, 20, 341–351.

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

6581

response variation.10 Quantitative profiling11-13 by manual interpretation of individual highly complex spectra is often a prerequisite to analysis and clustering of concentration changes and intermetabolite correlations, but spectral interpretation can be time-consuming and potentially nontrivial. Furthermore, quantification by peak integration or line shape fitting or related datareduction methods such as binning can introduce artifacts in the set of assigned signals.14 For hypothesis-generating metabonomic studies15 to be truly explorative and maximally informative, high-throughput methods of investigating all concentration changes and their correlations at once that preserve metabolically relevant variables without any manual intervention would be highly advantageous. Statistical correlation analysis between peak intensities across a set of samples such as exemplified by statistical total correlation spectroscopy (STOCSY16-19) has been shown to provide both high, positive intramolecular connectivities as well as a range of intermolecular correlations,20 forming a basis for chemical structure elucidation and pathway analysis of metabolic regulation. The new method, which we term CLASSY for cluster analysis statistical spectroscopy, enables rapid identification and visualization of such correlation effects directly from spectra by a novel local-global clustering method which resolves the smaller set of structural correlations from the larger global structure of pathway correlations. By providing an approach to discovering structural relationships which uses the locality of high positive statistical correlations, CLASSY represents a significant evolution of previous methods in statistical spectroscopy including generalized 2D correlation spectroscopy,21-23 STOCSY and covariance NMR24-26 deconvolution27 and, to our knowledge, constitutes the first application of clustering to identify structural and metabolic relationships between peaks across multiple NMR spectra. (10) Kourti, T. J. Chemom. 2003, 17, 93–109. (11) Lewis, I. A.; Schommer, S. C.; Hodis, B.; Robb, K. A.; Tonelli, M.; Westler, W. M.; Sussman, M. R.; Markley, J. L. Anal. Chem. 2007, 79, 9385–9390. (12) Fernie, A. R.; Trethewey, R. N.; Krotzky, A. J.; Willmitzer, L. Nat. Rev. Mol. Cell. Biol. 2004, 5, 763–769. (13) Coquin, L.; Feala, J. D.; McCulloch, A. D.; Paternostro, G. Mol. Syst. Biol. 2008, 4, 233. (14) Crockford, D. J.; Keun, H. C.; Smith, L. M.; Holmes, E.; Nicholson, J. K. Anal. Chem. 2005, 77, 4556–4562. (15) Nicholson, J. K.; Holmes, E.; Lindon, J. C.; Wilson, I. D. Nat. Biotechnol. 2004, 22, 1268–1274. (16) Cloarec, O.; Dumas, M. E.; Craig, A.; Barton, R. H.; Trygg, J.; Hudson, J.; Blancher, C.; Gauguier, D.; Lindon, J. C.; Holmes, E.; Nicholson, J. K. Anal. Chem. 2005, 77, 1282–1289. (17) Holmes, E.; Cloarec, O.; Nicholson, J. K. J. Proteome Res. 2006, 5, 1313– 1320. (18) Coen, M.; Hong, Y. S.; Cloarec, O.; Rhode, C. M.; Reily, M. D.; Robertson, D. G.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Anal. Chem. 2007, 79, 8956–8966. (19) Keun, H. C.; Athersuch, T. J.; Beckonert, O.; Wang, Y.; Saric, J.; Shockcor, J. P.; Lindon, J. C.; Wilson, I. D.; Holmes, E.; Nicholson, J. K. Anal. Chem. 2008, 80, 1073–1079. (20) Couto Alves, A.; Rantalainen, M.; Holmes, E.; Nicholson, J. K.; Ebbels, T. M. Anal. Chem. 2009, 81, 2075–2084. (21) Noda, I. J. Am. Chem. Soc. 1989, 111, 8116–8118. (22) Noda, I.; Dowrey, A. E.; Marcott, C.; Story, G. M. Appl. Spectrosc. 2000, 54, 236A–248A. (23) Wang, G.; Geng, L. Anal. Chem. 2005, 77, 20–29. (24) Trbovic, N.; Smirnov, S.; Zhang, F.; Bru ¨ schweiler, R. J. Magn. Reson. 2004, 171, 277–283. (25) Bru ¨ schweiler, R.; Zhang, F. J. Chem. Phys. 2004, 120, 5253–5260. (26) Bru ¨ schweiler, R. J. Chem. Phys. 2004, 121, 409–414. (27) Zhang, F.; Bru ¨ schweiler, R. Angew. Chem., Int. Ed. 2007, 46, 2639–2642.

6582

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

The local clustering algorithm presented here is similar to the DemixC27 algorithm for robust deconvolution of covariance total correlation spectroscopy (TOCSY) spectra; however, rather than using the spectral matrix F(ω1,ω2), the local clustering algorithm takes as an input a binary matrix of peak connectivities. This allows local clusters to be unambiguously identified as selfconsistent sets of correlation-connected peaks and does not rely on an estimation of minimum similarity to determine cluster membership. To demonstrate its use as a tool for highthroughput metabonomic analysis that does not require a priori manual interpretation of spectra, CLASSY was applied to high information density data sets collected on an important experimental model of exocrine pancreatitis caused by L-arginine exposure studied as part of the Consortium for Metabonomic Toxicology, COMET.28,29 The CLASSY method significantly enhances chemical shift assignment and analysis of coregulation, providing an improved biochemical interpretation of metabolic perturbations and revealing complexities of variation in response to chemical stress. CLASSY specifically provides a new approach for identifying biomarkers and investigating metabolic relationships in complex in vivo system failure situations, disease progression, and other temporal changes in biological function that can be monitored noninvasively via metabolic profiling methods. EXPERIMENTAL SECTION Collection of Experimental Data. 1D 1H NMR metabonomic data sets were collected from the urine samples of SpragueDawley rats exposed to the model pancreatic toxin L-arginine (ARG). Animal selection and treatment followed the standard COMET protocol, as detailed elsewhere.29 At the 0 h time point, animals in each group were administered an intraperonital injection of vehicle (0.9% saline) with toxin dosages selected to induce overt toxicity without significant mortality (HD, 4000 mg/kg L-arginine hydrochloride, n ) 20), to induce minimal clinical pathology (LD, 1000 mg/kg L-arginine hydrochloride, n ) 10), or saline vehicle alone (C, n ) 10). In all experimental groups, half of the animals were euthanized at 48 h and the remaining animals at 168 h after dosing. Urine samples were collected at -16, 0, 8, 24, 48, 72, 96, 120, 144, and 168 h postdosing, and 1D 1H NMR spectra were acquired and processed for each collected sample using standard metabonomic procedures.30,31 Spectral Preprocessing and Peak Identification. NMR data (each spectral data point was used) were imported into MATLAB (R2008a, Mathworks, Inc., 2008) following automatic phasing, baseline correction and referencing.30 The regions corresponding to water/HDO (δ 4.7-4.9), trimethylsilypropionate (TSP) (δ -0.1 (28) Lindon, J. C.; Nicholson, J. K.; Holmes, E.; Antti, H.; Bollard, M. E.; Keun, H.; Beckonert, O.; Ebbels, T. M.; Reily, M. D.; Robertson, D.; Stevens, G. J.; Luke, P.; Breau, A. P.; Cantor, G. H.; Bible, R. H.; Niederhauser, U.; Senn, H.; Schlotterbeck, G.; Sidelmann, U. G.; Laursen, S. M.; Tymiak, A.; Car, B. D.; Lehman-McKeeman, L.; Colet, J. M.; Loukaci, A.; Thomas, C. Toxicol. Appl. Pharmacol. 2003, 187, 137–146. (29) Lindon, J. C.; Keun, H. C.; Ebbels, T. M.; Pearce, J. M.; Holmes, E.; Nicholson, J. K. Pharmacogenomics 2005, 6, 691–699. (30) Bohus, E.; Coen, M.; Keun, H. C.; Ebbels, T. M. D.; Beckonert, O.; Lindon, J. C.; Holmes, E.; Noszal, B.; Nicholson, J. K. J. Proteome Res. 2008, 7, 4435–4445. (31) Beckonert, O.; Keun, H. C.; Ebbels, T. M. D.; Bundy, J.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Nat. Protoc. 2007, 2, 2692–2703.

to 0.1), and urea (δ 5.6-6.2) were removed from all spectra. Spectra were then aligned by a recursive segment-wise peak alignment algorithm as previously described32 and normalized using the probabilistic quotient method.33 Peaks are detected at zero crossings of a smoothed spectral derivative calculated on the basis of a Savitzky-Golay third order polynomial filter34 with window size 0.005 ppm, using the mean spectrum calculated for each spectral point. Identifying Local Structural Correlation Clusters. As the various NMR peaks from a given molecule necessarily exhibit high positive statistical correlations which are maximal near the peak apexes and theoretically and practically approach unity for nonoverlapped peaks,20 the locality of such peak correlations is used to identify intramolecular connectivities. To identify local clusters of peaks correlated with each other, a correlation matrix Cij is assembled by evaluating the Pearson correlation coefficient between pairs of peaks i and j as identified by peak picking of the mean spectrum as detailed above. Here, the indices i and j denote peak order running from low to high chemical shift. The correlation matrix Cij is then recursively thresholded at increasing correlation coefficient values, in steps of r × 10-3, to a binary matrix Bij, representing the connectivity (bij ) 1) or nonconnectivity (bij ) 0) of peaks i and j at a given correlation threshold. To identify unambiguous local clusters of correlation-connected peaks within matrix B, an overlap matrix Oij is evaluated as the inner product of transposed row vector biT and column vector bj Oij ) biT · bj

to identify global relationships between all local clusters. Transforming correlation coefficients into correlation distances by the following objective function generates pairwise distances between each local cluster.

Dij )

1 - Cij 2

While other distance metrics, such as Euclidean distance, topological overlap, or the square of correlations also generate pairwise distance values, this formulation preserves the effect of the sign by defining correlation coefficients of 1 and -1 as correlation distances of 0 and 1, respectively, resulting in grouping of positive correlations. Global hierarchical clustering is achieved by an average linkage algorithm that interprets pairwise distances to assemble a hierarchical tree resulting in reordered data tables. While other clustering algorithms such as biclustering35 and optimized clustering36 also generate global clusters, hierarchical clustering is a simple yet flexible means of grouping local clusters which is familiar and widely used in many fields of biology. Visualization of Metabolic Perturbation Profiles. (Log) Fold-change measurements are routinely used in microarray analysis to demonstrate differential expression of large sets of gene transcripts.37 Here, fold changes of clustered peak intensities are used to identify and intuitively represent the molecular bases of experimental metabolic perturbation. Spectral information is represented by assembling a log fold-change matrix defined as

(1) Rmj ) log

and thus Oij represents the size of the intersection of the set of peaks correlation-connected with peak i with the set of peak j. Unambiguous clusters of peaks each correlation-connected to all local peaks and to no outside peaks can be identified as rows in Oij which have n occurrences of value n, where n is the number of peaks connected to both the ith and jth peaks. To identify these local clusters, rows of Oij are summed to form a vector with elements Pj. Pj )

∑O

ij

(2)

i

Elements of Pj with value n2 indicate peak j is a member of a cluster with n nodes. All members of this cluster are identified as the n peaks i which are correlation-connected to peak j. As peak clusters are discovered, the algorithm removes the component peaks to reduce the complexity of the remaining correlation matrix and simplify identification of subsequent peak clusters. This process continues until the correlation threshold reaches unity and all edges of connectivity between peaks decompose. Global Hierarchical Clustering. Once unambiguous local clusters are discovered, a hierarchical clustering algorithm is used (32) Veselkov, K. A.; Lindon, J. C.; Ebbels, T. M. D.; Crockford, D.; Volynkin, V. V.; Holmes, E.; Davies, D. B.; Nicholson, J. K. Anal. Chem. 2009, 81, 56–66. (33) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Anal. Chem. 2006, 78, 4281–4290. (34) Wong, J. W. H.; Cagney, G.; Cartwright, H. M. Bioinformatics 2005, 21, 2088–2090.

(3)

Smj Soj

(4)

where Smj is the peak intensity in spectrum m at the position of peak j in the median spectrum So. As the matrix R shares a second dimension with the correlation matrix C, the elements in the structurally clustered correlation matrix can be traced to columns of R to assign relative concentration changes to both an individual sample and to yield chemical identity. Visualization of correlation and fold-change heatmaps are coinformative and represent the chemical bases of experimental metabolic perturbation. RESULTS Local Clustering of Individual Metabolites. Though peaks arising from the same molecular cluster are dispersed on the chemical shift axis as illustrated for assigned metabolites in Figure 1A, local clustering as described in the Experimental Section groups these structurally related peaks and preserves their individual chemical shifts. The most basic element of the clustered correlation matrix is the grouping of local peak clusters which are shown as connected nodes in Figure 1B and represented in the resulting data correlation matrix as boxed on-diagonal block elements (Figure 1C), the size of which is determined by the complexity of the structure’s NMR spectrum. Separation of (35) Cheng, Y.; Church, G. M. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2000, 8, 93–103. (36) DiMaggio, P. A., Jr.; McAllister, S. R.; Floudas, C. A.; Feng, X. J.; Rabinowitz, J. D.; Rabitz, H. A. BMC Bioinf. 2008, 9, 458. (37) Allison, D. B.; Cui, X.; Page, B. P.; Sabripour, M. Nat. Rev. Genet. 2006, 7, 55–65.

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

6583

Figure 1. Clustering of peaks identified in control spectra demonstrates the information flow of CLASSY NMR analysis. Figure 1A shows spectral assignments for 10 commonly observed high-concentration metabolites present in control rat urine. While some structures, such as succinate, give rise to a single NMR peak, others such as hippurate, show more complex spectra with multiple peaks and peak splittings (fine structure) dispersed across the chemical shift axis. Each peak apex is identified and represented as a node by the local clustering algorithm. Figure 1B shows the local correlation clusters corresponding to each molecular structure as identified by CLASSY. Self-consistent sets of nodes corresponding to the number of well-resolved peak apexes for each structure form local clusters when all “in-cluster” correlations are higher than all “out-cluster” correlations. These local clusters are then related to each other by global hierarchical clustering (C). Local clusters are differentiated from global clusters by enclosure within diagonal blocks, resulting in a correlation block matrix. Local clusters are assigned to a chemical structure by relating the chemical shifts and fine structure of each cluster to the compound’s NMR spectrum. Abbreviations: NMNA, N-methyl nicotinate; NMND, N-methyl nicotinamide; HIP, hippurate; 4-CG, 4-cresol glucuronide; 3-HPPA, 3-(3-hydroxyphenyl)propionic acid; PAG, phenylacetylglycine; 2-OG, 2-oxoglutarate; Cit, citrate.

intramolecular and intermolecular correlations is achieved by arranging intramolecular local clusters on-diagonal and correlations between local clusters off-diagonal, resulting in a correlation block matrix. For partially overlapped peaks or peaks in crowded regions such as the aliphatic (δ ) 0.8-1.5), sugar (δ ) 3.5-4.5), and aromatic regions (δ ) 7-8), it is not uncommon to split a set of structurally related peaks into a small number of local clusters which can then be related by global hierarchical clustering. This normally does not hinder interpretation, as local clusters relate multiple chemical shifts, providing a metabolite signature. Though metabolites can be significantly coregulated, especially as a result of experimental stress, individual metabolites are generally able to be distinguished based on small differences in expression patterns, though the number and time resolution of collected samples limit the ability to resolve highly correlated metabolites, such as L-arginine and ornithine, as seen in Figure 2. Sets of correlated peaks provide an input for database matching algorithms which automate assignment of chemical shifts to molecular structures, usage of which can further streamline the 6584

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

process of metabonomic analysis.38,39 Identification of metabolite signature clusters aids mixture profiling by grouping structurally related sets of chemical shifts, regardless of molecular identity. Because spectral peak picking is inherently untargeted, correlation profiling is also useful for identifying peaks from previously unassigned endogenous metabolites and from xenobiotic metabolites as well as the standard set of endogenous compounds. Clustering of Metabolites with Similar Physiological Roles by Toxicity Processes. In multianimal time course toxicity or experimental pathology studies, statistical correlation coefficients reflect the degree to which metabolite concentrations covary through time and across individuals. While local clustering identifies highly related intrastructure peaks, larger groups of clustered metabolites tend to be connected by metabolic pathways or, more often, to function in similar metabolic roles. This is particularly noticeable in cases such as ARG toxicity, where administration of large doses causes elevation of the pathway(38) Robinette, S. L.; Zhang, F.; Bruschweiler-Li, L.; Bru ¨schweiler, R. Anal. Chem. 2008, 80, 3606–3611. (39) Xia, J.; Bjorndahl, T. C.; Tang, P.; Wishart, D. S. BMC Bioinf. 2008, 9, 507.

Figure 2. The mean 600 MHz 1H NMR spectrum (A) and CLASSY clustered correlations (B) from ARG high-dose rat urine samples. While local clustering isolates structurally connected peaks for a large number of compounds present in the 1H NMR spectra, local clustering is complicated by spectral overlap/crowding and the time resolution of sample collection. Peaks in crowded regions, such as glucose, can be split into several local clusters, while molecules which appear in significant quantities in a single time point like ARG and ornithine are difficult to distinguish by statistical correlations. Global clustering identifies three sets of correlated metabolites: those immediately affected by ARG, microbial cometabolites, and endogenous compounds like TCA cycle metabolites. Abbreviations: NMNA, N-methyl nicotinate; HIP, hippurate; PAG, phenylacetylglycine; 4-CG, 4-cresol glucuronide; 4-CS, 4-cresol sulfate; Glc, glucose; ARG, L-arginine; Orn, ornithine; 2AA, 2-amino adipate; 2-OG, 2-oxoglutarate; Cit, citrate; Ala, alanine; Lac, lactate; Val, valine.

connected metabolites ornithine and branched chain amino acids, inducing significant correlation resulting in the formation of pathway related clusters. To represent the underlying metabolic perturbation profiles resultant from development of L-arginine toxicity, peak fold-change intensities relative to the median (on the horizontal axis) are plotted by spectrum (on the vertical axis) as a color-coded heatmap (Figure 3). Visualization of the fold-change matrices allows intuitive analysis of metabolites perturbed by toxic stress. Figure 3 shows the metabolic perturbation profiles of the ARG HD group following the experimental time course. These metabolic profiles demonstrate the development of toxicity resulting in changes to urinary profiles encompassing multiple metabolites on distinct time scales reflecting immediate L-arginine metabolism and subsequent pathology. Because of the fact that nearly all ARG pathway related changes are visible at only the 8 h time point (Figure 3), correlation coefficients group ARG and its first direct metabolite,

ornithine, into a single local cluster. Though not pathway connected to the toxic compound, toxic stress affects multiple sets of physiologically related metabolites similarly. The clustering of gut microbial cometabolites phenylacetylglycine, 4-cresol sulfate, and 4-cresol glucuronide indicate substantial interaction between the gut microbiome and the host toxic processes as seen in Figure 3. A commonly observed effect of toxicity is the decrease in levels and subsequent clustering of TCA intermediates related to metabolic acidosis and a drop in urinary pH, as is demonstrated with three additional COMET experiments shown in Figure 4. Interestingly, the microbial cometabolite hippurate clusters with endogenous decreasing metabolites (see Figure 4). Exceptions to clustering by physiological relationships may indicate important alternate roles for these metabolites, such as a dietary modulation of hippurate excretion. Global patterns of intermolecular statistical correlations also vary in response to toxic stress, and while assigning biological correlations Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

6585

Figure 3. Fold-change profiles of metabolites in the ARG data set relative to predose levels through the experimental time-course represent the major metabolic effects and variable time scales of ARG response and developing pathology. Each pixel represents the fold-change of a single peak apex (columns) in a single spectrum (rows). These peaks are assigned to chemical structures (top) and global clustering groups sets of coregulated structures (bottom) which can be interpreted in terms of response relationships. Abbreviations: NMNA, N-methyl nicotinate; HIP, hippurate; PAG, phenylacetylglycine; 4-CG, 4-cresol glucuronide; 4-CS, 4-cresol sulfate; Glc, glucose; ARG, L-arginine; Orn, ornithine; 2-OG, 2-oxoglutarate; Cit, citrate; Ala, alanine; Lac, lactate; Val, valine; 2AA, 2-amino adipate; Fum, fumarate; Crn, creatine.

to a proximate physiological cause must take experimental design into account, changes to correlation profiles generally contain information about a global metabolic response to stress and these patterns might hint at changes in basic metabolic regulation and organization. Significant changes in correlations, such as sign changes, are more readily interpreted and suggest major differences in regulation. A particularly dramatic example is the change in glucose-arginine correlations from negative in the LD group to positive in the HD group resulting from a shift in glucose uptake to glucose excretion in response to ARG challenge (Figure 5). Characteristic LD glucose uptake is mirrored by nonresponders in the HD group, indicating that nonresponders retain enough pancreatic function to control glucose metabolism. Identification of Variable Responder Groups to L-Arginine Toxicity. Recognizing and interpreting the metabolic differences between responder groups to drug or toxin dosing is critical to pharmaco-metabonomics40 and the translation of biofluid profiling to personalized health care. However, when the metabolic effects of a specific chemical stressor are studied, this important individual variation in metabolite levels can be hidden when modeled as confounding biological background noise. Visualization of all fold-change perturbation profiles presents simultaneously conserved effects of toxicity and individual differences in response. While many of the metabolic effects of ARG dosing, including massive excretion of ARG and ornithine, increases in lactate, alanine, and valine, and decreases in endogenous TCA intermediates are visibly conserved between animals, obvious variation is present in the extent and timing of ARG and ornithine excretion as well as changes in levels of gut microbial cometabolites. Animals resistant to ARG induced pancreatitis show idiosyncratic patterns of metabolic perturbation in comparison to re(40) Clayton, A. T.; Lindon, J. C.; Cloarec, O.; Antti, H.; Charuel, C.; Hanton, G.; Provost, J. P.; Le Net, J. L.; Baker, D.; Walley, R. J.; Everett, J. R.; Nicholson, J. K. Nature 2006, 440, 1073–1077. (41) Rakonczay, Z., Jr.; Hegyi, P.; Dosa, S.; Ivanyi, B.; Jarmay, K.; Biczo, G.; Hracsko, Z.; Varga, I. S.; Karg, E.; Kasraki, J.; Varro, A.; Lonovics, J.; Boros, I.; Gukovsky, I.; Gukovskaya, A. S.; Pandol, S. J.; Takacs, T. Crit. Care Med. 2008, 36, 2117–2127.

6586

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

sponders visible in their fold-change urinary profiles. These animals show a prolonged excretion of ARG in comparison to normal responders visible as the first two profiles at the 24 h time point, shown in Figure 5, while their remaining metabolites’ foldchange profiles return to predose excretion levels more quickly than normal responders. A potentially important observation of nonresponder urinary profiles is the reduced excretion of ornithine and glucose at the 8 h time point relative to responder levels. These levels approximate those observed in the low-dose group, and the decrease in glucose levels relative to control at 8 h in low-dose animals and weak responders is in direct contrast to increases in high-dose responders, suggesting differential regulation of glucose metabolism resulting from time delay in ARG metabolism (Figure 5). Recent studies of ornithine toxicity have suggested that ornithine may at least in part be responsible for ARG induced pancreatitis,41 a conclusion that is supported here by the uniformly lower levels of ornithine in weak responders and that suggests ornithine as a sensitive and specific marker of the extent of ARG induced pancreatic injury. Analysis of low- and high-dose groups generally indicates conservation in the identities of perturbed metabolites and a dosedependent relationship with the extent of the changes. However, appearance of new significantly perturbed metabolites in the highdose group of animals may be important to differentiating markers of acute pancreatic damage from general metabolic processes of toxin metabolism and recovery. Stable and reproducible results across a set of experimental toxins with a range of metabolic effects underpin the utility of visualizing fold-change profiles to analyzing and interpreting the results of metabonomic experiments. Fold-changes of 4-cresol sulfate, 4-cresol glucuronide, and phenylacetylglycine on an animal by animal basis also demonstrate variation in the extent and timing of perturbation to the symbiotic microbiome. Highly variable responses to dosing is evident in the fold-changes of these surrogate markers of toxicity and manifest as banding of coloration representing differential metabolic perturbation

Figure 4. Fold-change profiles generated by CLASSY of metabolites in rat urinary profiles analyzed as part of three additional COMET 1 experiments: acetaminophen (A), galactosamine (B), and 2-bromoethanamine (C) treatment in the rat model. Each image shows both the metabolism of the administered compounds as well as endogenous consequences of toxin metabolism and of developing pathology. As before, each pixel represents the fold-change of a single peak apex (columns) in a single spectrum (rows) relative to the median peak intensity in predose spectra. These peaks are assigned to chemical structures (top) and global clustering groups sets of coregulated structures (bottom) which are interpretable as metabolic themes of toxicity experiments including toxin metabolism, markers of pathology, and commonly observed changes in energy and dietary metabolites. Abbreviations: APAP, acetaminophen; APAPG, acetaminophen glucuronide; APAP-MA, acetaminophen mercapturic acid; APAPS, acetaminophen sulfate; BEA, 2-bromoethanamine; Cit, citrate; FC, fold change; Fum, fumarate; GalN, Galactosamine; Glc, glucose; GTA, glutaric acid; HIP, hippurate; Ile, isoleucine; Lac, lactate; Leu, leucine; Lys, lysine; NMNA, N-methyl nicotinate; NMND, N-methyl nicotinamide; 2-OG, 2-oxoglutarate; RC, and related compounds; Tau, taurine; TA, trans-aconitate; UC, urocanic acid; Val, valine. Analytical Chemistry, Vol. 81, No. 16, August 15, 2009 6587

Figure 5. High-dosed animals that do not develop pancreatitis (blue) show arginine-glucose-ornithine excretion patterns different from responding animals (red) and similar to low-dose animals. Nonresponders to ARG show lower levels of glucose and ornithine in their foldchange (top) and spectral (bottom) profiles. Abbreviations: Glc, glucose; Arg, L-arginine; Orn, ornithine; LD, low-dose group (n ) 10); HD, high-dose group animals with urinary profiles at 8 and 24 h (n ) 16).

Figure 6. High-dose animals also show variability in microbial cometabolites indicating differences in the extent and timing of perturbations to the symbiotic microbiome as a secondary consequence of pathology. Here, three well-resolved peaks of 4-cresol glucuronide are used as markers of microbial cometabolism for all high-dose group animals with complete time course urinary samples (n ) 9). Subgroups of animals with similar microbiotic responses are visible in spectral profiles (left), but clustering animals by 4-cresol glucuronide fold-change excretion trajectories quantifies these relationships (right). While all animals’ urinary profiles show increases in 4-cresol glucuronide levels at 48 h postdosing, weak responders (blue) show smaller increases at 48 h and strong responders (red) and an outlying prolonged strong responder show elevated levels. Subgroups of strong responders are also visible, including slow responders (dark red) which show increased 4-cresol glucuronide through 144 h and fast (yellow) responders that show early 4-cresol glucuronide excretion at 24 h and subsequent reduction by 72 h.

within a single time point (Figure 3). By reorganizing individual metabolite trajectories by animal across each time-point, variability in the time course excretion of response markers becomes evident (Figure 6). Thus, Figure 6 shows how 4-cresol glucuronide excretion trajectories can be used as a marker by which to cluster weak and strong responders. Here, weak responders (blue) show decreased levels of 4-cresol glucuronide peaks, while strong responders (red) and an outlying prolonged strong responder show elevated levels. Subgroups of strong responders are also visible, including slow (dark red) and fast (yellow) responders. 4-Cresol excretion shows subtle variation within the high-dose group, suggesting differential re6588

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

sponses of the host and microbiome to ARG toxicity which are not readily observed at the macroscopic level. DISCUSSION AND CONCLUSIONS Data analysis, visualization, and interpretation continue to present challenges in metabolic profiling and other “omics” disciplines. Here, we have outlined a novel metabolic profiling scheme using CLASSY which increases the throughput, interpretability, and robustness of 1H NMR analysis in comparison to manual targeted profiling and pattern recognition by dimensionality reduction. CLASSY increases throughput by identify-

ing correlated peaks and profiling metabolite concentration changes directly from sets of NMR spectra in an unsupervised manner. Though peak assignment is still required, the CLASSY approach allows users to focus on metabolites significantly modulated by experimental stressors. The novel heatmap presentation of metabolite trajectories increases the interpretability of NMR data sets by grouping structurally related peaks and clustering metabolites by coregulation. Because CLASSY considers all observable spectral peaks, it should be robust to significant changes in metabolite identity and concentration and applicable to a wide range of biochemical studies. Though spectral overlap will continue to pose a problem for all NMRbased metabolic profiling methods using both 1D and 2D spectra and because the strategy presented here considers peak apexes rather than the entire line shape, spectral overlap in which individual peaks are discernible poses a less significant problem than it would for methods which rely on binning or curve resolution. Even in cases of extreme overlap where individual peaks are not identifiable, nonoverlapping peaks still constitute a metabolite fingerprint for which correlations and fold-changes can be observed. The application of CLASSY to 1H NMR spectra of biofluids collected following ARG induced pancreatic damage demonstrates the utility of this method for visualizing and interpreting the results of metabonomic studies covering diverse aspects of experimental pathology with consequent wide variation in biochemical processes and biomarker identities. While an experimental pathology study was chosen for this demonstration, the profiling techniques presented here should be generally applicable to any set of 1D NMR spectra in which signal covariation contains relevant chemical and biological information. The local clustering algorithm for selectively identifying sets of structurally related NMR spectral peaks enables the deconvolution of complex 1D 1H NMR spectral profiles into a set of related chemical shifts which can be subsequently assigned using database matching algorithms. This allows on one hand to assign chemical structures (“in-clusters”) and on the other hand to identify links between groups of metabolites (“out-clusters”) that are pathway-related and either syn- or anticorrelated. This local clustering method could also be applied to extract sets of correlated chemical shifts in covariance TOCSY spectra by peak-picking and using crosspeak position to generate a binary connection matrix equivalent to the correlation connection matrix presented in this paper. The basic assumption of CLASSY is that a given data set contains highly similar local clusters in which all “in-cluster” distances are lower than “out-cluster” distances. As this is often the case with statistical correlations between structurally related 1H NMR peaks, STOCSY-based profiling methods should significantly increase the utility of 1D 1H NMR for studies of small molecules in complex biological mixtures. While STOCSY is particularly useful for interpreting the 1H NMR spectra of biofluids as these spectra are often dominated by simple molecules which do not show extensive spin couplings or

nuclear Overhauser effects as in 2D NMR spectra, the method presented here is a nontargeted, unsupervised approach which is equally capable of identifying novel metabolite signatures as well as the standard set of endogenous metabolites and should be equally applicable to nonmammalian, nonbiofluid metabolic profiling such as metabolic fingerprinting of single cell organisms, fungi, or invertebrates. The representation of NMR spectral data by statistical correlation and fold-change heatmaps should also facilitate the application of many mathematical and computational methods developed for other “omics” data sets such as from transcription arrays to metabonomics and spectroscopy.37,42 We also suggest that matrices of peak intensities should be amenable to coprocessing with array data to provide information about the relationship between gene expression and downstream metabolism. Just as clustering and visualization of microarray data sets substantially increased the interpretability of gene expression data by biologists, increased interpretability of NMR derived data should make metabonomic studies more accessible to a wider audience. While statistical correlations in time course studies of toxicity using multiple animals reflect mainly the influence of variation through time in response to metabolic perturbation, there is a growing recognition that the topology of intermolecular correlations contains information at least as important as measures of concentration changes alone.43-45 Metabonomic studies in which variation is controlled and not significantly modulated by external stress should provide statistical correlations that are both informational and potentially relatable to the architecture of metabolic networks. We expect that NMR based studies of metabolism will significantly benefit from CLASSY pattern analysis in terms of speed, efficiency, interpretability, and completeness, hence enhancing the application of this platform to in vivo systems biology and a wide range of metabolic profiling applications including identification of small molecules with new biological activities, investigations of host-pathogen chemical interactions, analysis of the metabolic consequences of mutations, and potentially clinical monitoring of biofluids and prediction of idiosyncratic drug response in humans. ACKNOWLEDGMENT Steven L. Robinette acknowledges financial support from the Barry M. Goldwater Foundation and the Howard Hughes Medical Institute. We acknowledge the support of the Imperial College MRC-HPA Centre for Environment and Health for this and related projects. Received for review June 8, 2009. Accepted July 14, 2009. AC901240J (42) Eisen, M. B.; Spellman, P. T.; Brown, P. O.; Botstein, D. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14863–14868. (43) Steuer, R.; Kurths, J.; Fiehn, O.; Weckwerth, W. Bioinformatics 2003, 19, 1019–1026. (44) Steuer, R. Briefings Bioinf. 2006, 7, 151–158. (45) Camacho, D.; de la Fuente, A.; Mendes, P. Metabolomics 2005, 1, 53–63.

Analytical Chemistry, Vol. 81, No. 16, August 15, 2009

6589