Anal. Chem. 2006, 78, 3551-3561
Metabolite Projection Analysis for Fast Identification of Metabolites in Metabonomics. Application in an Amiodarone Study
Frank Dieterle,† Alfred Ross,† Götz Schlotterbeck, and Hans Senn*
Pharmaceuticals Division, F. Hoffmann-La Roche Ltd., PRBD-E, Building 065/512, 4070-Basel, Switzerland
The assignment of significantly changed NMR signals, identified with the help of multivariate models, to individual metabolites in biofluids is a manual and tedious task requiring knowledge of chemometrics and NMR spectroscopy. Metabolite projection analysis, introduced in this work, automatically links multivariate models with metabolites, skipping the level of manual NMR signal identification. The method depends on the projection of sets of metabolite NMR spectra from a database into PCA or PLS models of NMR spectra of biofluid samples. Metabolites that are significantly changed can be identified graphically in metabolite projection plots or numerically as projected virtual concentrations. The method is demonstrated together with a newly introduced algorithm for refined nonequidistant binning using a metabonomics study with amiodarone as the administered drug. Amiodarone can induce phospholipidosis in the lung and liver, which is accompanied by toxicity in these organs. It is shown how metabolite projection analysis allows easy and fast tentative assignment of all structures of metabolites whose concentrations in the urine samples changed significantly upon dosage. These metabolites had also been identified previously by manually interpreting the multivariate models and spectra. Among these metabolites, phenylacetylglycine was identified as being significantly increased. This metabolite has recently been proposed as a urinary biomarker for phospholipidosis.

* Corresponding author. E-mail: [email protected]. Tel: +41 (0)61 6882028. Fax: +41 (0)61 6887408.
† These authors contributed equally to this work.
(1) Fiehn, O. Plant Mol. Biol. 2002, 48, 155-171.
(2) Trethewey, R. N.; Krotzky, A. J.; Willmitzer, L. Curr. Opin. Plant Biol. 1999, 2, 83-85.
(3) Nicholson, J. K.; Lindon, J. C.; Holmes, E. Xenobiotica 1999, 29, 1181-1189.
10.1021/ac0518351 CCC: $33.50 Published on Web 04/21/2006
© 2006 American Chemical Society

Recently, metabonomics, also referred to as metabolomics1 or metabolic profiling,2 has gained increasing interest as a fast and reproducible method directly reflecting biological events and end points. The method involves the determination of changes in the concentration levels of small endogenous metabolites in biosamples due to physiological stimuli or genetic modification.3 Metabonomics is an interdisciplinary research discipline, as
experts with different scientific backgrounds such as toxicologists, physicians, spectroscopists, chemometricians, and biochemists are needed. The high number of different scientific experts involved and the lack of dedicated software for the assignment of metabolites are often a bottleneck for metabonomics in practice. In this publication, metabolite projection analysis (MPA) is introduced. This method provides a fast and automatic way to identify significantly changed metabolites in a set of NMR spectra taken on body fluids of treated and control animals. The power of the method is that it uses well-established and readily available techniques such as PCA or PLS.4,5 For MPA, these techniques are applied in a way hitherto not described in the metabonomics literature. Established software packages such as AMIX,6 SIMCA-P,7 or The Unscrambler8 can be used. The application of MPA is straightforward. No in-depth knowledge of data analysis, chemometrics, or NMR spectroscopy is needed, rendering a first-level analysis of spectra of a metabonomic study feasible for any trained scientist.

The method is demonstrated using a metabonomic NMR study that deals with drug-induced phospholipidosis (PL). PL describes histological changes due to accumulation of phospholipids in tissues and is characterized by the presence of multilamellar bodies within the lysosomes of alveolar macrophages.9,10 Drug-induced PL is reversible but may cause associated organ toxicity. Compounds inducing PL are often cationic amphiphilic drugs (CADs). Typical structures of CADs contain a hydrophobic ring structure and an aliphatic amine group, which is positively charged at physiological pH. Due to their amphiphilic nature, such structures often occur in pharmaceutical drugs, as such compounds can easily cross biological membranes. Drug-induced PL is therefore an issue in pharmaceutical drug development. In this study, the CAD investigated was amiodarone, whose structure is shown in Figure 1.
(4) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37-52.
(5) Martens, H.; Naes, T. Multivariate calibration, 1st ed.; Wiley: New York, 1989.
(6) Neidig, K. P. Amix-Viewer & Amix Software Manual; Bruker BioSpin GmbH: Rheinstetten, Germany, 2005.
(7) Umetrics AB. User's Guide to SIMCA-P, SIMCA-P+, Version 11; Umetrics AB: Umeå, Sweden, 2005.
(8) CAMO ASA. The Unscrambler User Manual, Version 9.2; CAMO ASA: Oslo, Norway, 2005.
(9) Reasor, M. J.; Ogle, C. L.; Walker, E. R.; Kacew, S. Am. Rev. Respir. Dis. 1988, 137, 510-518.
(10) Kodavanti, U. P.; Mehendale, H. M. Pharmacol. Rev. 1990, 42, 327-354.
Analytical Chemistry, Vol. 78, No. 11, June 1, 2006 3551
Figure 1. Structure of amiodarone (left) and phenylacetylglycine (right).

Amiodarone is a drug for the treatment of
life-threatening tachycardia. Unfortunately, amiodarone is a potent inhibitor of the lysosomal phospholipase A1, leading to an accumulation of phospholipids in lung and liver tissue. The most pronounced associated toxic side effects are mild to severe liver toxicity and pulmonary fibrosis. Currently, PL is detected by light or electron microscopy of cells in tissues or lymphocytes. This diagnosis is invasive, nonquantitative, time-consuming, and expensive. Recently, a biomarker in urine for PL has been proposed. The concentration level of phenylacetylglycine (PAG), which is an endogenous metabolite of rodents, is significantly increased in urine if PL occurs, independent of the organ affected.11,12 In addition, a decrease of concentrations of citrate cycle intermediates in urine is often observed, which is linked with associated organ toxicity. In this work, it is shown how MPA allows a quick identification of metabolites that significantly change their relative concentration upon amiodarone dosing. No user interaction is necessary for the assignment of NMR signals of urine spectra to these metabolites. All metabolites previously validated by manual interpretation of multivariate models, urine spectra, and metabolite spectra are revealed by MPA in a fully automated way.

EXPERIMENTAL SECTION

Metabonomic Study Data Set. The animal study and the measurements of samples were performed in a way similar to the standard protocol defined in the Consortium of Metabonomics (COMET).13 The animal study was performed according to the following protocol: Male 6-7-week-old Sprague-Dawley rats (Charles River Deutschland GmbH, Sulzfeld, Germany) were housed in metabolism cages. All animals had free access to water and food (Purina chow 5002 standard diet). After a 10-day acclimatization period, animals were randomly assigned to the five-member control and dosed groups. At time point zero, 1000 mg/kg amiodarone hydrochloride was administered orally to animals of the dose group.
Urine was collected between 24 and 16 h predose and then continuously, with fractions taken at 0, 8, 24, 48, 72, 96, and 120 h, resulting in 80 samples in total. Samples were collected into 1 mL of 1% sodium azide solution on ice and centrifuged for 15 min before storage at -70 °C until measurement.
(11) Nicholls, A. W.; Nicholson, J. K.; Haselden, J. N.; Waterfield, C. J. Biomarkers 2000, 5, 410-423.
(12) Espina, J. R.; Shockcor, J. P.; Herron, W. J.; Car, B. D.; Contel, N. R.; Ciaccio, P. J.; Lindon, J. C.; Holmes, E.; Nicholson, J. K. Magn. Reson. Chem. 2001, 39, 559-565.
(13) Lindon, J. C.; Nicholson, J. K.; Holmes, E.; Antti, H.; Bollard, M. E.; Keun, H.; Beckonert, O.; Ebbels, T. M.; Reily, M. D.; Robertson, D.; Stevens, G. J.; Luke, P.; Breau, A. P.; Cantor, G. H.; Bible, R. H.; Niederhauser, U.; Senn, H.; Schlotterbeck, G.; Sidelmann, U. G.; Laursen, S. M.; Tymiak, A.; Car, B. D.; Lehman-McKeeman, L.; Cole, J. M.; Loukaci, A.; Thomas, C. Toxicol. Appl. Pharmacol. 2003, 187, 137-146.
Sample Preparation and NMR Spectroscopy. Before measurement, 75 µL of 0.8 M sodium phosphate buffer (pH 7.4) containing 3 mM TSP, 20% D2O, and 9 mM sodium azide was added to 450-µL urine samples. To account for chemical shift differences unrelated to pH, 75 µL of an 80 mM perdeuterated EDTA-d12 solution was added to complex free inorganic cations. The samples were then centrifuged for 15 min. Spectra were measured in disposable 5-mm-diameter NMR tubes at 600 MHz using a cryogenic TXI probe head (Bruker Biospin, Karlsruhe, Germany). The spectral acquisition was done by use of a 1D version of the NOESY pulse sequence with an acquisition time of 1.36 s and a 50-Hz field strength irradiation of the water signal during a 1-s relaxation delay and a 100-ms mixing time. The acquisition was stopped if the signal-to-noise ratio of the citrate resonances exceeded 6000:1 or if maximally 256 scans were summed. A total of 64k data points were collected with a spectral width of 20.036 ppm. Exponential line broadening of 1 Hz was applied prior to Fourier transformation. Spectra were phased, baseline-corrected, and referenced to TSP automatically using a routine programmed in MATLAB (NMRProc 0.3, Dr. Tim Ebbels, Imperial College, London, U.K.). The results were manually checked, and corrected if needed. NMR spectra were data reduced by two different binning methods. The first method, which is widely used in the literature, is based on an equidistant binning of the spectra (0.18-9.98 ppm) with a bin width of 0.04 ppm. The spectral region 4.50-6.02 ppm was excluded to remove variability due to suppression of water resonances and cross-saturation effects. The regions around the citrate resonances (2.50-2.58 and 2.62-2.70 ppm) were combined to account for shifting citrate signals. This binning method thus reduces each spectrum to 205 variables in total. The second binning method developed in-house uses nonequidistant bins. 
In contrast to equidistant binning, this method takes the width and variability of peak shifts into account. For the nonequidistant binning, an average spectrum of all spectra of the study is calculated (0.1-10 ppm). The borders of the bins are defined by the five-point minima of the average spectrum, whereby a five-point minimum is characterized as exactly one sign change of the first derivative from negative to positive between the second and fourth point. Bins of urea and water were excluded (5.35-6.06 and 4.68-4.92 ppm). Nonequidistant binning results in 636 variables with a correspondingly higher spectral resolution.

To account for different dilution of urine, the binned spectra were normalized by an integral normalization (total integral of 100) followed by a probabilistic quotient normalization with the median spectrum of the complete study used as reference.14 Data were centered, but not scaled, prior to model building. In our experience, MPA works best with models of nonscaled data, as the cloud of nonchanged metabolites is tightest in this case.

For the 129 metabolites listed in Table 1, 1H NMR spectra were recorded in aqueous solution containing 20% D2O, 0.1 M sodium phosphate buffer, pH 7.4, 1 mM TSP, and 3 mM sodium azide. The spectra were reduced by the two binning methods described above and normalized to the total integral of 100. All preprocessing steps were performed in Matlab15 using procedures written in-house. MPA was done with SIMCA.7
(14) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Anal. Chem. In press.
(15) The MathWorks Inc. Matlab; The MathWorks Inc.: Natick, MA, 2005.
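The five-point-minimum rule for placing bin borders can be illustrated with a short Python sketch. This is not the in-house Matlab procedure. It assumes one reading of the rule (two consecutive falling steps followed by two consecutive rising steps around the center point), and the function names `five_point_minima` and `bin_spectrum` are hypothetical.

```python
import numpy as np

def five_point_minima(avg):
    """Return indices i where avg falls over two steps into i and rises over two steps out of it.

    Assumed reading of the five-point rule: within the window avg[i-2..i+2] the
    first derivative changes sign exactly once, from negative to positive, at i.
    """
    avg = np.asarray(avg, dtype=float)
    d = np.diff(avg)                      # d[k] = avg[k+1] - avg[k]
    i = np.arange(2, len(avg) - 2)        # centers that have two neighbors on each side
    mask = (d[i - 2] < 0) & (d[i - 1] < 0) & (d[i] > 0) & (d[i + 1] > 0)
    return i[mask]

def bin_spectrum(spectrum, borders):
    """Integrate (sum) a spectrum between consecutive bin borders."""
    spectrum = np.asarray(spectrum, dtype=float)
    edges = np.concatenate(([0], borders, [len(spectrum)]))
    # reduceat sums spectrum[edges[k]:edges[k+1]] for every start index edges[k]
    return np.add.reduceat(spectrum, edges[:-1])

# Toy "average spectrum" with two valleys (indices 4 and 12)
avg = np.array([5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5], dtype=float)
borders = five_point_minima(avg)          # array([ 4, 12])
binned = bin_spectrum(avg, borders)       # three nonequidistant bins
```

Requiring two monotonic steps on each side means that a single-point dip, such as one caused by noise on a peak shoulder, is not treated as a border. In practice the borders would be computed once from the study average and then applied unchanged to every urine and metabolite spectrum.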
Table 1. List of 129 Metabolites for Which 1H NMR Spectra Were Recorded and Deposited in the In-House NMR Metabolite Database^a

acetamidophenyl β-D-glucuronide; adrenaline; acetylacetonate; adenosine; adipic acid; adonitol; acetate; alanine; allantoin; aminoadipic-2 acid; hippurate-amino-4-sodium salt; 3-hydroxyanthranilic acid; arabitol-D; betaine; butyric acid-2-hydroxy-3-methyl; carnitine; chenodeoxycholic acid; chlorogenic acid; cholic acid; choline; sn-glycero-3-phosphocholine; citrate; creatine; creatinine; cystine; dehydrocholic acid; deoxycholic acid; dimethylamine; dimethylglycine-N,N; erythritol; ethanolamine; acetamide; ethanolamine-O-phosphoryl; folic acid; formate; fructose; fucose; fumarate; galactopyranose-D; galactopyranoside-methyl-β-D; galactosamine; glucono-L-1,5-lactone; glucosamin-N-acetyl; glucose; glucose 6-phosphate β-D sodium salt; glucuronic acid-D; glutamic acid-L; glutamine-L; glutaric acid; glyceric acid hemicalcium salt, hydrate; glycerol; glycine; glycine-N-acetyl; glycocyamine; ethanol; glucopyranose-3-O-methyl; gulonic acid-γ-lactone; hippuric acid; histidine-L; hydroxyisobutyric-2 acid; hydroxybutyric-3 acid; hydroxyisovaleric acid-Rl; hydroxyphenylacetic-4 acid; hydroxyphenylacetic-2 acid; hydroxyphenylacetic-3 acid; hydroxyphenyllacetic acid-4; hypoxanthine; indoxyl sulfate; inositol; isoleucine; kynurenine-DL; lactate; lecithin-l-R; leucine; lithocholic acid; lysine-5-hydroxy; malate-disodium; malic acid; maltotriose; mannitol; methylamine; methylmalonic acid; NAD; NADP; neuraminic acid-N-acetyl; nicotinamide; nicotinamide-(N-methyl); hydroxyproline; nicotinic acid sodium salt; nicotinuric acid; norleucine; oxoglutarate-(2); phenylacetylglycine; phenylalanine; picolinic acid; piperidine; proline; propionic acid-3-(3-hydroxyphenyl); pyridoxine; pyruvic acid sodium salt; ribose-D; saccharic acid potassium salt; isocitric acid; sarcosine; sebacic acid; serin; shikimic acid; sorbitol-(D); spermidine; spermine; suberic acid; succinic acid disodium salt; tartaric acid-(D); taurine; threonine-L; thymine; TMAO; trimethylamine; tryptamine; tryptophan-D; tyramine; tyrosine-L; uridine; urocanic acid; valeric acid-2-hydroxy-3-methyl; valine-L; xanthurenic acid; xylitol; xylose-DL

^a This spectral information was used for the metabolite projection analyses described in this work.
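The two-step normalization named in the Experimental Section (integral normalization to a total of 100, followed by probabilistic quotient normalization against the median spectrum) can be sketched as follows. The text does not spell out the quotient step, so the sketch assumes the commonly described form: divide each spectrum by the median of its bin-wise quotients against the reference. The function names `integral_normalize` and `pqn` are hypothetical.

```python
import numpy as np

def integral_normalize(X, total=100.0):
    """Scale every spectrum (row of X) to the same total integral."""
    X = np.asarray(X, dtype=float)
    return X * (total / X.sum(axis=1, keepdims=True))

def pqn(X, reference=None):
    """Integral normalization followed by probabilistic quotient normalization.

    Assumption: the reference defaults to the median spectrum of all
    integral-normalized spectra, as described for the study data.
    """
    X = integral_normalize(X)
    ref = np.median(X, axis=0) if reference is None else np.asarray(reference, float)
    usable = ref > 0                                   # skip empty reference bins
    quotients = X[:, usable] / ref[usable]             # bin-wise ratio to the reference
    factors = np.median(quotients, axis=1, keepdims=True)  # most probable dilution factor
    return X / factors

# Toy data: a spectrum, a 2-fold diluted copy, and a copy with one strongly raised bin
base = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X = np.vstack([base, 0.5 * base, base * np.array([1, 1, 10, 1, 1])])
N = pqn(X)
```

In the toy data, integral normalization alone would scale down the four unchanged bins of the third spectrum because one bin dominates its total integral. The median quotient is insensitive to that single bin, so after the quotient step the unchanged bins line up with those of the other spectra again.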
MATHEMATICAL METHODS

MPA is based on multivariate models derived from NMR spectra taken for samples of a metabonomic study. In this work, principal component analysis (PCA) and partial least squares (PLS) are used for building the models, but MPA is not limited to models built with these techniques. Modeling NMR metabonomics data by PCA, PLS-DA, or O-PLS is common in the literature, whereby metabolites contributing to the variance of models are typically assigned manually by interpreting significant spectral variables in loading plots.16-24 MPA uses the projection and prediction capabilities of multivariate models in a way not described in the metabonomics literature. Instead of projecting new test samples into a model built with calibration samples, spectra of a set of metabolites are projected into the model. Significant metabolites can thereby be identified graphically in score plots of projected metabolites. Metabolites that explain the separation of samples are projected along the direction of samples with increased concentrations of the corresponding metabolites or in the opposite direction of samples with decreased concentrations of these metabolites. Metabolites that do not react upon dosing are projected into a cloud located near the coordinate origin of the projected score plot, provided that all data are centered. It is therefore easy to identify metabolites that are significantly changed and to link changes of samples to metabolites, without having to deal with spectral data or annotation of signals. Typically, complete databases of metabolite spectra are projected into the models to generate first hypotheses about the metabolites responsible for significant changes. If the model was built with a supervised method, significant metabolites can be identified by their correlation with the supervised variable (e.g., class variable, concentration of administered compound, exposure to administered compound, time, etc.).
(16) Holmes, E.; Nicholson, J. K.; Nicholls, A. W.; Lindon, J. C.; Connor, S. C.; Polley, S.; Connelly, J. Chemom. Intell. Lab. Syst. 1998, 44, 245-255.
(17) Nicholls, A. W.; Holmes, E. H.; Lindon, J. C.; Shockcor, J. P.; Farrant, R. D.; Haselden, J. N.; Damment, S. J. P.; Waterfield, C. J.; Nicholson, J. K. Chem. Res. Toxicol. 2001, 14, 975-987.
(18) Beckonert, O.; Bollard, M. E.; Ebbels, T. M. D.; Keun, H. C.; Antti, H.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Anal. Chim. Acta 2003, 490, 3-15.
(19) Forshed, J.; Schuppe-Koistinen, I.; Jacobsson, S. P. Anal. Chim. Acta 2003, 487, 189-199.
(20) Beckwith-Hall, B. M.; Holmes, E.; Lindon, J. C.; Gounarides, J.; Vickers, A.; Shapiro, M.; Nicholson, J. K. Chem. Res. Toxicol. 2002, 15, 1136-1141.
(21) Holmes, E.; Antti, H. Analyst 2002, 127, 1549-1557.
(22) Holmes, E.; Nicholls, A. W.; Lindon, J. C.; Connor, S. C.; Connelly, J. C.; Haselden, J. N.; Damment, S. J. P.; Spraul, M.; Neidig, P.; Nicholson, J. K. Chem. Res. Toxicol. 2000, 13, 471-478.
(23) Brindle, J. T.; Antti, H.; Holmes, E.; Tranter, G.; Nicholson, J. K.; Bethell, H. W.; Clarke, S.; Schofield, P. B.; McKilligin, E.; Mosedale, D. E.; Grainger, D. J. Nat. Med. 2002, 8, 1439-1444.
(24) Cloarec, O.; Dumas, M. E.; Trygg, J.; Craig, A.; Barton, R. H.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. Anal. Chem. 2005, 77, 517-526.

For PCA, a model is built for the matrix $\mathbf{X}$, constructed row-wise from binned spectral data of study samples. MPA is performed by projecting the matrix $\mathbf{X}_{\mathrm{Met}}$, constructed from binned spectra of metabolites, into this model. This is mathematically expressed as

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E}$$

$$\mathbf{T}_{\mathrm{Met}} = \mathbf{X}_{\mathrm{Met}}\mathbf{P}$$

Here $\mathbf{T}$ represents the score matrix for the spectra of the study samples, $\mathbf{P}^{\mathrm{T}}$ is the loading matrix, and $\mathbf{E}$ the residual matrix. $\mathbf{T}_{\mathrm{Met}}$ is the score matrix for the projected spectra of the metabolites $\mathbf{X}_{\mathrm{Met}}$. Clustering, similarities, and separations of study samples can be graphically investigated in the score plot of the first few principal components ($t_i$, $i = 1, 2, \ldots$). Metabolites responsible for separation of samples can be identified by plotting the corresponding scores $t_{\mathrm{Met},i}$ of projected metabolite spectra and comparing the two types of plots.

If a PLS model is used for MPA, a regression between the spectra $\mathbf{X}$ of study samples and a response matrix $\mathbf{Y}$ is established. In the field of metabonomics, the response matrix can consist of one or more class variables or of variables describing drug concentrations, activities, or time points of the corresponding samples. As in PCA, the spectra of metabolites $\mathbf{X}_{\mathrm{Met}}$ are projected into the established model. In addition to the scores of projected metabolites $\mathbf{T}_{\mathrm{Met}}$, the predicted response matrix $\mathbf{Y}_{\mathrm{Met}}$ of metabolites is of interest here. $\mathbf{Y}_{\mathrm{Met}}$ represents the correlation of the metabolites with the original response matrix $\mathbf{Y}$. If $\mathbf{Y}$ is a matrix with class information, the predicted $\mathbf{Y}_{\mathrm{Met}}$ contains information on up- and downregulation of metabolites for the corresponding classes. In mathematical terms, MPA with PLS models can be expressed as follows:
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E}$$

$$\mathbf{Y} = \mathbf{U}\mathbf{C}^{\mathrm{T}} + \mathbf{F} = \mathbf{T}\mathbf{C}^{\mathrm{T}} + \mathbf{G} \quad \text{(second expression due to inner relationship)}$$

$$\mathbf{T}_{\mathrm{Met}} = \mathbf{X}_{\mathrm{Met}}\mathbf{P}$$

$$\mathbf{Y}_{\mathrm{Met}} = \mathbf{T}_{\mathrm{Met}}\mathbf{C}^{\mathrm{T}} = \mathbf{X}_{\mathrm{Met}}\mathbf{P}\mathbf{C}^{\mathrm{T}}$$

Here $\mathbf{T}$ represents the score matrix for the spectra of study samples, $\mathbf{P}^{\mathrm{T}}$ is the corresponding loading matrix, $\mathbf{C}^{\mathrm{T}}$ represents the Y-weight matrix, and $\mathbf{E}$, $\mathbf{F}$, and $\mathbf{G}$ are residual matrices. $\mathbf{T}_{\mathrm{Met}}$ is the score matrix of the projected metabolite spectra $\mathbf{X}_{\mathrm{Met}}$. $\mathbf{Y}_{\mathrm{Met}}$ is the predicted Y matrix for the metabolite spectra.

The principle behind MPA can be visualized as follows: if a metabolite significantly influences a multivariate model (for example, in terms of separation of samples), the loadings of the model significantly reflect the signals of this metabolite. Thus, if a set of metabolites is projected into this model, metabolites whose signals match significant loadings are projected away from the cloud of nonsignificant metabolites in the direction of samples with increased concentrations of these metabolites (or in the opposite direction for samples with decreased concentrations of the corresponding metabolites). Metabolites not significantly changed, whose signals overlap substantially with signals of metabolites significantly changed, will appear as false positives. To minimize this effect, a high resolution of study sample and metabolite spectra is advantageous. This is also shown in this work.

RESULTS

In a first approach, a PCA of the equidistantly binned spectra was performed. Score and loading plots of the first two components, which explain 66% of the cross-validated variance, are shown in Figure 2. A typical approach of metabonomics research is the manual interpretation of these plots. Here, not only an understanding of multivariate data analysis but also a sound knowledge of NMR spectra of metabolites is needed. A typical manual interpretation workflow is as follows: inspection of the score plot shows a separation between dosed and nondosed samples along the first principal component (PC 1). Nearly all dosed samples are separated from nondosed samples, and the separation is largest for later time points. This means that the effect visible along PC 1 evolves with time for dosed samples. The separation of samples into two groups along PC 2 is not dose related. Mainly samples at time points -16 and 8 h are separated from other samples. The most significant variables for the dose-specific separation along PC 1 are identified as 2.66 and 2.54 ppm. These signals may be assigned to citrate by manual inspection of a set of metabolite spectra. The downregulation of citrate for dosed samples is confirmed by the calculated integral values. The citrate integral is shown in Figure 3, along with other integrals of significantly changed metabolites of the citrate cycle. Other TCA cycle metabolites identified and confirmed manually as significantly downregulated upon dosing are 2-oxoglutarate (2.44 and 3 ppm) and isocitrate (2.40 ppm). Hippurate, a non-TCA cycle related metabolite, was identified as being downregulated (3.96, 7.56, and 7.84 ppm). PAG, which has been proposed as a biomarker for PL, was identified as being significantly upregulated for dosed samples by integration of variables (3.68, 3.76, 3.84, and 7.36 ppm) derived from the PCA model. In addition, creatinine (3.04 and 4.08 ppm) and creatine (3.04 and 3.92 ppm) are also upregulated for dosed samples.
Relative concentration levels of significantly changed metabolites shown in Figure 3 and in Figure 4 demonstrate that the inspection of the PCA model followed by a manual assignment of significantly changed variables allows the identification of changed metabolites. The citrate cycle intermediates are significantly downregulated, but fumarate was not initially identified in the loading plot. In addition, the concentration level of hippurate is decreased and that of creatinine increased for later time points of dosed animals. For single dosed animals at later time points, the concentration levels of creatine are highly increased. For the proposed biomarker PAG, the concentration levels of the dosed animals start to increase significantly 24 h after dosing. At the end of the study, the highest concentration levels of PAG are observed, but the effect might still have evolved, if the urine had been collected longer than 120 h after dosing. The effect responsible for the separation of samples along PC 2 is more difficult to interpret. The loading plot indicates a signal at 1.92 ppm, which primarily contributes to variance explained by PC 2, but also somewhat to that explained by PC 1. Inspection of the raw spectra revealed that the corresponding signal is a singlet, which was assigned to acetate with the help of our database of metabolite spectra. The concentration levels of acetate are greatly increased for all samples collected at time points -16 h and at 8 h. This finding was traced back to an external contamination: a cleaning agent, which contained acetate for decalcification, was used for cleaning the metabolism cages, and some acetate remained in the cages.
Figure 2. PCA of the equidistantly binned spectra. In the score plot (first row), samples are labeled with time points of sampling. Dosed samples are marked by black circles; control samples are marked by green squares. In the full-scale (second row) and zoomed loading plot (third row), variables are labeled with chemical shifts.
In addition, concentration levels of acetate for some dose group samples at later time points are increased by a factor of nearly two. It is therefore not clear whether the increased concentrations of acetate for the dosed samples at later time points are of endogenous origin or are due to the external contamination. Any external source of variation can render more principal components in the model necessary to uncover changes of other less significant endogenous metabolites. This example of a manual interpretation of multivariate models and NMR spectra shows that the identification of changing metabolites can be a tedious task that requires a lot of spectroscopic expertise and well-sorted lists of metabolites. To simplify this task, MPA was performed by use of the PCA model built from equidistantly binned spectra discussed above. The 1H NMR spectra of 129 metabolites, listed in Table 1, were projected into this model. The corresponding score plot (TMet) is shown in Figure 5. Here the majority of metabolites appear as a dense cloud of projected scores. The signals of these metabolites do not overlap with signals that contribute to the variance of the model. This means that these metabolites are not responsible for separations and effects visible in the score plot of study samples (first row of Figure 2). Interestingly, the center of the cloud is not exactly located at the origin of the score plot. This can be attributed to different “baselines” of metabolite and urine spectra of the study. Urine spectra typically only have few spectral regions without intensity. This is due to many overlapping peaks especially from large molecules (e.g., proteins and lipids). In contrast, metabolite spectra show zero intensity in most regions. Therefore, the default centering procedure performed during data preprocessing causes the systematic shift between the coordinate origin and the center of the cloud of metabolites. 
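The projection step and the offset of the metabolite cloud from the origin can be reproduced on synthetic data. The following Python sketch is illustrative only: the spectra, concentrations, and variable names are invented, PCA is computed with a plain singular value decomposition rather than with the software used in the study, and no normalization is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_met = 40, 6

# Hypothetical metabolite database: two peaks per spectrum, total integral 100
X_met = np.zeros((n_met, n_bins))
for j in range(n_met):
    X_met[j, [6 * j + 1, 6 * j + 2]] = 50.0

# Synthetic "urine" spectra: flat nonzero baseline plus a mixture of metabolites
dosed = np.array([False] * 5 + [True] * 5)
C = 0.10 + 0.005 * rng.standard_normal((10, n_met))
C[dosed, 0] += 0.20     # metabolite 0 increases upon dosing
C[dosed, 1] -= 0.08     # metabolite 1 decreases upon dosing
X = 1.0 + C @ X_met

# PCA of the centered study spectra: X - mean = T P^T + E
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
P = Vt[:2].T                       # loadings of the first two components
T = (X - mean) @ P                 # scores of the study samples

# MPA: T_Met = X_Met P, using the centering of the study model
T_met = (X_met - mean) @ P

# Position of each metabolite relative to the cloud, oriented toward dosed samples
cloud = np.median(T_met[:, 0])
direction = np.sign(T[dosed, 0].mean() - T[~dosed, 0].mean())
shift = direction * (T_met[:, 0] - cloud)
```

In this toy setting, `shift` is positive for the increased metabolite, negative for the decreased one, and close to zero for the four unchanged ones. The value of `cloud` is clearly nonzero because the study mean, which contains the baseline, is subtracted from metabolite spectra that are zero in most bins. The sign of a principal component is arbitrary, which is why the sketch orients PC 1 by the dosed samples before comparing positions.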
Figure 3. Relative concentration levels of citrate cycle intermediates determined as areas of corresponding peaks. The concentration levels of malic acid, also decreased for dosed samples, are not shown. These peaks show severe overlap preventing exact quantification. Samples are labeled with time points of sampling. Dosed samples are marked by black dots; control samples are marked by green squares.

The score plot of the projected metabolites can be interpreted in the same way as the loading plot of a study sample PCA. Thus, in this PCA projection, metabolites shifted to the right (left) side of the cloud are negatively (positively) correlated with dose (which is negatively correlated with PC 1; see score plot in Figure 2). Metabolites shifted along PC 2 are responsible for the separation of the samples along PC 2 in the study sample score plot. As expected, the metabolites citrate, malate, isocitrate, succinate, and, to some extent, hippurate can be easily identified as being negatively correlated with dose. Along PC 2, acetate stands out as the metabolite responsible for this separation. This demonstrates the power of MPA: only a few metabolites, to be confirmed manually, stick out from the cloud of unchanged metabolites. Only these metabolites contribute to the variance of this PCA model. Here PAG is most positively correlated with dose. In addition, creatinine and creatine are shifted from the cloud toward the direction of positive correlation. Unexpectedly, several sugar molecules are also shifted in the same direction. This finding could not be confirmed by inspection of raw spectra. Thus, these sugar metabolites are false positives to be resolved manually. The reason is that, due to the very coarse binning, these analytes have several signals overlapping with those of PAG, creatine, and creatinine in the aliphatic region shown in Figure 6. It can be seen that for equidistant binning a significant spectral overlap exists in three bins between 3.66 and 3.78 ppm. As PAG, creatine, and creatinine are positively correlated with dose and have signals overlapping with those of sugar molecules, the latter also appear positively correlated with dose for the binning used here. This figure also demonstrates how the finer nonequidistant binning proposed below reduces this spectral overlap between different analytes. For example, the spectral overlap between PAG and the sugar molecules is reduced by 50% in the spectral region shown in Figure 6. Therefore, false positive sugar molecules disappear for the MPA using nonequidistantly binned data (see Figure 7 and Figure 8) described below.

The unexpected appearance of lithocholic acid in the figure can be attributed to the spectrum of this metabolite. Lithocholic acid was retrospectively identified as being insoluble in water; the spectrum contains only noise. This noise is dramatically "blown up" by the integral normalization employed and contains substantial amplitude in any bin. Consequently, noise will be projected away from the cloud of nonchanged metabolites, as it correlates with any vector. Therefore, it is important to check the quality of the metabolite spectra before projection, to avoid false positives. In general, false positive metabolites cannot be prevented safely when using MPA. However, as MPA uses all signals for the projection simultaneously, false positive metabolites appear only if the majority of signals of the false positive metabolite overlap
with signals of significant metabolites. This was shown above for false positive lithocholic acid (noise spectrum) and sugar metabolites. The advantage of MPA is that neither a single overlapping signal with a dose-related compound nor several overlapping signals with metabolites that do not covary will falsify the result. Therefore, the chance to identify false positive metabolites is by far reduced compared to that obtained by a procedure that simply labels single changed bins with names of metabolites with signals in this bin. An example of this advantage can be seen by use of the nonequidistant binning described below. In Figure 6, it is shown that for nonequidistant binning 50% spectral overlap of the sugar molecules with PAG compared to the standard binning remains. This reduced relative amount of overlapping bins does not cause false positive metabolites (see Figure 7 and Figure 8). Here the majority of signals of the sugar molecules do not overlap with significantly changed covarying metabolites, but individual bins still experience overlap. Consequently, the simple labeling procedure proposed above will propose sugar molecules as false positive hits. In summary, for the coarse equidistant binning only a few false positive metabolites, to be resolved by manual inspection of raw
Figure 4. Relative concentration levels of non-TCA cycle related metabolites, which significantly changed during the study. Samples are labeled with time points of sampling. Dosed samples are marked by black dots; control samples are marked by green squares.
spectra, remain. All manually confirmed and significantly changed metabolites were identified by MPA without tracing back to the raw data; no false negative hits appeared. To reduce the number of falsely proposed metabolites, the finer resolved data of the nonequidistant binning procedure are analyzed in the following. Score and loading plots of the PCA and the score
plot of the corresponding MPA are shown in Figure 7. The score plot looks very similar to that obtained with equidistant binning except for a mirroring along PC 1. The loading plot also looks similar except for a mirroring and a larger number of variables (for example, 4 variables for citrate instead of 2), but it is obvious that the higher, peak-shape-optimized resolution allows a clearer alignment of variables of
Figure 6. Differences between equidistant and nonequidistant binning. The spectral region encompassing the aliphatic signals of PAG is shown for a typical urine sample, the average spectrum of the complete study, and four metabolites from the spectral database (top to bottom). The three blue boxes with solid lines define three 0.04 ppm width integral regions used in the standard equidistant binning approach. The black dotted lines define in the same spectral range nine regions of the nonequidistant binning, which were calculated from the 5-point minima in the average spectrum. Spectral regions of three carbohydrates, which overlap with signals of PAG, are crosshatched for nonequidistant binning and hatched or crosshatched for the equidistant binning. When comparing both binning methods, it becomes clear that spectral overlap between PAG and the sugar molecules reduces by ∼50% when using the nonequidistant binning method. The consequences can be seen in Figure 5, as carbohydrates appear as false-positive metabolites here. This is due to the spectral overlap with PAG. In Figures 7 and 8, false-positive carbohydrates disappear.
Figure 5. Score plot of the metabolite projection into the PCA model shown in Figure 2. Metabolites are labeled by names used in Table 1.
dose-correlated metabolites along PC 1. This simplifies interpretation, e.g., of the signal at 4.05 ppm belonging to creatinine. In addition, new variables are visible along PC 2, such as 3.45 and 6.60 ppm. These two singlet peaks belong to an as yet unidentified metabolite, whose relative concentration levels are shown in Figure 4. Structure elucidation of this metabolite is in progress. In the metabolite projection shown in Figure 7, a similar enhancement due to higher resolution is visible: downregulated metabolites such as the citrate cycle intermediates citrate, succinate, isocitrate, 2-oxoglutarate, and malate, as well as hippurate, are clearly separated from the cloud of nonchanged metabolites along PC 1. These metabolites represent the dose effect. The upregulated metabolites creatine, creatinine, and PAG are clearly separated in the direction of positive correlation with dose and can be easily identified. In addition, betaine and trimethylamine N-oxide (TMAO) are separated along PC 1. TMAO levels slightly increase for samples at later time points (not dose-related), which parallels the separation of early and later time points along PC 2. The separation of betaine is caused by the overlap of its peaks with those of TMAO (3.27 ppm) and of the unidentified metabolite at 3.45 ppm (see discussion above). MPA of nonequidistantly binned data profits from the higher resolution, which allows an improved identification of metabolites significantly changed in a study just by looking at the metabolite projection plots. Here only betaine remains as a false positive hit; again, no false negatives were found.
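The projection step discussed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the four metabolites A to D, their two-bin reference spectra, the concentrations, and the choice to center the reference spectra with the mean of the sample spectra are all assumptions made for illustration. A PCA model is fitted to synthetic sample spectra, and the integral-normalized reference spectra are then projected into it.

```python
import numpy as np

def integral_normalize(spectra):
    """Scale each binned spectrum (row) so that its bins sum to 1."""
    return spectra / spectra.sum(axis=1, keepdims=True)

def fit_pca(X, n_components=2):
    """PCA via SVD of the mean-centered data; rows are samples, columns are bins."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T.copy()

def project(spectra, mean, loadings):
    """Scores of spectra in an existing PCA model."""
    return (spectra - mean) @ loadings

# Hypothetical reference spectra: four metabolites, two bins each.
names = ["A", "B", "C", "D"]
refs = integral_normalize(np.array([
    [2, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 3, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 4, 4],
], dtype=float))

# Hypothetical concentrations (columns A..D): A rises and D falls with dose,
# B is constant, C varies independently of dose (e.g., a time effect).
conc = np.array([[1, 2, 1, 3],
                 [1, 2, 2, 3],
                 [3, 2, 1, 1],
                 [3, 2, 2, 1]], dtype=float)
dosed = np.array([False, False, True, True])
X = conc @ refs  # synthetic binned sample spectra

mean, loadings = fit_pca(X, n_components=2)

# The sign of a principal component is arbitrary; orient PC 1 so that
# dosed samples have the higher scores.
sample_scores = project(X, mean, loadings)
if sample_scores[dosed, 0].mean() < sample_scores[~dosed, 0].mean():
    loadings[:, 0] *= -1

# Metabolite projection: scores of the reference spectra in the sample model.
met_scores = project(refs, mean, loadings)
for name, score in zip(names, met_scores):
    print(name, np.round(score, 3))
```

In this toy case A and D fall on opposite sides of B and C along PC 1, while C is set apart only along PC 2, which mirrors the qualitative reading of the metabolite projection plot.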
Figure 7. PCA of nonequidistantly binned spectra followed by metabolite projection analysis. In the score plot (top left), samples are labeled by time point of sampling. Dosed samples are marked by black dots; control samples are marked by green squares. In the full-scale (top right) and zoomed loading plot, variables are labeled with corresponding chemical shifts. In the score plot of projected metabolites (middle right and bottom row), the metabolites are labeled with their names.
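The caption of Figure 6 states that the nonequidistant bin borders were calculated from 5-point minima in the average spectrum. The sketch below shows one plausible reading of that idea and should not be taken as the authors' refined algorithm: a data point is treated as a border if it is the unique minimum of a centered 5-point window, and each spectrum is then summed between consecutive borders. The function names and the toy spectrum are hypothetical.

```python
import numpy as np

def minima_borders(avg, window=5):
    """Indices that are the unique minimum of a centered window of `window` points."""
    half = window // 2
    borders = []
    for i in range(half, len(avg) - half):
        seg = avg[i - half:i + half + 1]
        if avg[i] == seg.min() and np.count_nonzero(seg == avg[i]) == 1:
            borders.append(i)
    return borders

def bin_spectrum(spectrum, borders):
    """Sum the spectrum between consecutive borders (one value per bin)."""
    edges = [0] + list(borders) + [len(spectrum)]
    return np.array([spectrum[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

# Toy average spectrum with three peaks separated by two valleys.
avg = np.array([1, 3, 6, 3, 1, 4, 8, 4, 2, 5, 9, 5, 1], dtype=float)
borders = minima_borders(avg)
binned = bin_spectrum(avg, borders)
print(borders, binned)
```

Because the borders follow the valleys of the average spectrum, each peak ends up in its own bin, whereas a fixed-width grid may split a peak or merge neighboring peaks.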
Besides PCA models, metabolites can also be projected into other multivariate models, such as the PLS model demonstrated in Figure 8. The results of the PLS, which was implemented as a regression versus the dose of each sample, are very similar to those of the corresponding PCA
in terms of score and weight plots and in terms of metabolite projection plots. Interestingly, also for PLS, the second factor (similar to the second principal component of the PCA) covers the effect of the acetate contamination. An interesting feature of PLS is the possibility to predict a pseudoconcentration for each metabolite
Figure 8. PLS of the nonequidistantly binned spectra and subsequent metabolite projection analysis. In the score plot (top left), samples are labeled by time point of sampling. Dosed samples are marked by black dots; control samples are marked by green squares. In the weights plot (top right), the variables are labeled with chemical shifts. In the score plot of projected metabolites (middle row and bottom row left), metabolites are labeled with names from Table 1. The bottom right plot shows the predicted concentration of the projected metabolites for the first factor, whereby metabolites with outstanding high and low predicted concentrations are labeled. These metabolites are significantly negatively or positively correlated with dose.
by projection into the model. Although this pseudoconcentration has nothing to do with the absolute concentration of the metabolite (but corresponds to a predicted virtual concentration of the drug
amiodarone), the relative values allow a quick identification of metabolites correlated with dose. Figure 8 shows for most metabolites a low and rather uniform pseudoconcentration. These
metabolites are not correlated with dose. Only for PAG, creatinine, and creatine does an outstandingly positive pseudoconcentration indicate a positive correlation with dose. For citrate, succinate, malic acid, isocitrate, 2-oxoglutarate, and hippurate, negative predictions indicate a downregulation upon dosing. The pseudoconcentration thus allows an easy identification of all manually confirmed metabolites changed upon dosing.
How does MPA compare with a simple renaming of the ppm values of significant loadings with the metabolites that have signals at these chemical shifts? First, most metabolites have several signals. For such a metabolite, it is unclear how to summarize the contributions of related signals to obtain a unique location on a loading plot or how to extract a dose-related pseudoconcentration. This task is easily achieved by MPA. Second, several metabolites contribute to most variables, so a simple renaming of bins is prone to ambiguous assignments. This is seen, for example, for the creatine- and creatinine-related signals in the case of nonequidistant binning: here the bin at 3.04 ppm, which is significantly changed by creatine and creatinine, also has contributions from spermidine and tyrosine. In addition, the variable at 4.05 ppm, significantly influenced by creatinine, also contains signals of glucono-1,5-lactone and tryptophan. The signal at 3.94 ppm, which changes if creatine varies, also covers a signal of hippurate-amino-4-sodium salt. In this case, as many as seven substances are proposed from three bins and have to be sorted out by manual inspection of raw data. Note: only a few unambiguous signals covered by a single metabolite can be found. This is the case, for example, for 2.55 ppm (citrate) and 2.64 ppm (malic acid). For the coarser equidistantly binned data, the situation is even worse.
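The ambiguity of simple bin renaming can be made concrete with the three bins quoted above. The short sketch below only restates the assignments given in the text as a lookup table and counts the candidates; the variable names are hypothetical.

```python
# Bin assignments quoted in the text for the nonequidistant binning (ppm -> metabolites).
bin_to_metabolites = {
    3.04: ["creatine", "creatinine", "spermidine", "tyrosine"],
    4.05: ["creatinine", "glucono-1,5-lactone", "tryptophan"],
    3.94: ["creatine", "hippurate-amino-4-sodium salt"],
}

# Simple renaming proposes every metabolite that has a signal in a changed bin.
candidates = sorted({m for mets in bin_to_metabolites.values() for m in mets})

# Only these two were actually responsible for the changes in the three bins.
truly_changed = {"creatine", "creatinine"}
to_sort_out = [m for m in candidates if m not in truly_changed]

print(len(candidates), "candidates from", len(bin_to_metabolites), "bins")
print("to be excluded by manual inspection:", to_sort_out)
```

Three bins yield seven candidates, five of which have to be excluded manually.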
In contrast, metabolites occur as significantly changed in MPA only if most of the signals of the metabolite are significantly changed and if these signals covary. This is a major advantage of MPA.
MPA results of the three different models can be compared as follows: all approaches found all manually validated and significantly changed metabolites (citrate, 2-oxoglutarate, isocitrate, fumarate, malate, PAG, hippurate, acetate, creatinine, creatine). MPA of the PCA model of equidistantly binned data shows several false positive hits, whose number is reduced by use of nonequidistantly binned data. In our hands, PLS is most convenient for MPA: the correlation with dose is obtained straightforwardly by the prediction of pseudoconcentrations. For PCA models, this correlation can only be estimated in score plots of samples. Therefore, the PLS of the nonequidistantly binned data is the most potent approach presented in this work. Note: pseudoconcentrations are also obtained if MPA is done on a PLS model derived from equidistantly binned data. In that case, however, more false-positive hits have to be sorted out.
In summary, MPA allows an easy identification of all confirmed metabolites significantly changed in the study. Most identified metabolites are expected to change in case of severe drug-induced phospholipidosis. In particular, PAG, proposed as a biomarker for PL, was expected to be upregulated. The citrate cycle intermediates, whose concentration levels were decreased for dosed animals, are expected to be downregulated if strong PL is associated with organ toxicity. Other non-PL-associated metabolites such as hippurate, creatine, and creatinine, which were found significantly changed, are often found to correlate with drug-induced toxic effects (refs 25-28). MPA circumvents the tedious task of manually assigning chemical shifts to metabolites. Further research on MPA is needed if quantitative criteria for the significance of changes of metabolites (similar to a Hotelling T² ellipse) are required.
(25) Nicholson, J. K.; Wilson, I. D. Prog. Nucl. Magn. Reson. Spectrosc. 1989, 21, 449-501.
(26) Timbrell, J. A. Toxicology 1998, 129, 1-12.
(27) Lindon, J. C.; Holmes, E.; Bollard, M. E.; Stanley, E. G.; Nicholson, J. K. Biomarkers 2004, 9, 1-31.
(28) Lindon, J. C.; Holmes, E.; Nicholson, J. K. Prog. Nucl. Magn. Reson. Spectrosc. 2004, 45, 109-143.
CONCLUSIONS
In this work, MPA was introduced as an extension to common multivariate analysis of metabonomic data sets. The advantage of MPA is that it offers a fast and easy way to identify the metabolites involved without the need to inspect a large number of reference NMR spectra. The method does not need any dedicated software and can be applied using readily available packages for multivariate data analysis, without in-depth knowledge of chemometrics or NMR spectroscopy. A compulsory prerequisite of the method is access to an NMR spectral database of endogenous and xenobiotic metabolites. Application of the method to other spectroscopic methods (LC-MS, MS-MS, IR) by use of corresponding spectral databases is conceivable.
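As a rough illustration that no dedicated software is required, the pseudoconcentration idea can be sketched in a few lines of NumPy. This is a toy example under stated assumptions, not the authors' procedure: a one-factor PLS1 model (NIPALS) is fitted to synthetic sample spectra against a hypothetical dose coded as 0 or 1, and the same model is then used to predict a value for each reference spectrum. Metabolites A to D, their spectra, and the concentrations are invented for illustration.

```python
import numpy as np

def fit_pls1(X, y):
    """One-factor PLS1 (NIPALS) regression of y on X; rows of X are samples."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)      # weight vector of the first factor
    t = Xc @ w                  # sample scores
    q = (yc @ t) / (t @ t)      # regression of y on the scores
    return x_mean, y_mean, w, q

def predict(spectra, model):
    """Predicted y for spectra projected into the one-factor model."""
    x_mean, y_mean, w, q = model
    return y_mean + ((spectra - x_mean) @ w) * q

# Hypothetical integral-normalized reference spectra (two bins per metabolite).
names = ["A", "B", "C", "D"]
refs = np.array([
    [0.5, 0.5, 0, 0, 0, 0, 0, 0],
    [0, 0, 0.5, 0.5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0.5, 0.5, 0, 0],
    [0, 0, 0, 0, 0, 0, 0.5, 0.5],
])

# Hypothetical concentrations (columns A..D): A rises and D falls with dose.
conc = np.array([[1, 2, 1, 3],
                 [1, 2, 2, 3],
                 [3, 2, 1, 1],
                 [3, 2, 2, 1]], dtype=float)
dose = np.array([0.0, 0.0, 1.0, 1.0])
X = conc @ refs  # synthetic binned sample spectra

model = fit_pls1(X, dose)
pseudo = predict(refs, model)  # pseudoconcentration of each metabolite
for name, value in zip(names, pseudo):
    print(name, round(float(value), 3))
```

Only the relative values carry meaning: B and C share a common baseline value, A lies above it (positive correlation with dose), and D lies below it (negative correlation).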
ACKNOWLEDGMENT
The authors acknowledge Dr. U. Niederhauser for planning and supervising the animal studies at RCC, and the partners of the COMET project and the group of Prof. J. Nicholson at Imperial College for continuous discussions and providing the program NMRPROC. The authors are also indebted to Dr. D. Banner for helpful discussions and for reading the manuscript.
Received for review October 13, 2005. Accepted March 3, 2006.
AC0518351