A Multivariate Screening Strategy for Investigating Metabolic Effects of Strenuous Physical Exercise in Human Serum
Elin Pohjanen,† Elin Thysell,† Pär Jonsson,† Caroline Eklund,‡ Anders Silfver,‡ Inga-Britt Carlsson,§ Krister Lundgren,§ Thomas Moritz,§ Michael B. Svensson,‡ and Henrik Antti*,†
Research Group for Chemometrics, Department of Chemistry, Umeå University, SE-901 87 Umeå, Sweden, Department of Surgical and Perioperative Science, Sports Medicine, Umeå University, SE-901 87 Umeå, Sweden, and Umeå Plant Science Center, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-901 87 Umeå, Sweden
Received January 4, 2007
A novel hypothesis-free multivariate screening methodology for the study of human exercise metabolism in blood serum is presented. Serum gas chromatography/time-of-flight mass spectrometry (GC/TOFMS) data were processed using hierarchical multivariate curve resolution (H-MCR), and orthogonal partial least-squares discriminant analysis (OPLS-DA) was used to model the systematic variation related to the acute effect of strenuous exercise. Potential metabolic biomarkers were identified using database comparisons. Extensive validation was carried out, including predictive H-MCR, 7-fold full cross-validation and predictions for the OPLS-DA model, variable permutation for highlighting interesting metabolites, and pairwise t tests for examining the significance of metabolites. The concentration changes of potential biomarkers were verified in the raw GC/TOFMS data. In total, 420 potential metabolites were resolved in the serum samples. On the basis of the relative concentrations of the 420 resolved metabolites, a valid multivariate model for the difference between pre- and post-exercise subjects was obtained. A total of 34 metabolites were highlighted as potential biomarkers, all statistically significant (p < 8.1E-05). As an example, two potential markers were identified as glycerol and asparagine. The concentration changes for these two metabolites were also verified in the raw GC/TOFMS data. The strategy was shown to facilitate interpretation and validation of metabolic interactions in human serum as well as to reveal the identity of potential markers for known or novel mechanisms of human exercise physiology. The multivariate way of addressing metabolism studies can help to increase the understanding of the integrative biology behind, as well as unravel new mechanistic explanations in relation to, exercise physiology.
Keywords: Chemometrics • Exercise • GC/MS • Human Metabolism • Metabolomics • Metabonomics • Physiology • Predictive • Screening • Serum
Journal of Proteome Research 2007, 6, 2113-2120. Published on Web 04/12/2007. 10.1021/pr070007g CCC: $37.00 © 2007 American Chemical Society
* To whom correspondence should be addressed. Phone, +46 90 786 53 58; fax, +46 90 13 88 85; e-mail, [email protected]. † Department of Chemistry, Umeå University. ‡ Department of Surgical and Perioperative Science, Sports Medicine, Umeå University. § Swedish University of Agricultural Sciences.
Introduction
Traditionally, studies in human exercise metabolism are performed by measuring and comparing the concentrations of one or a few metabolites in blood plasma/serum or muscle tissue pre- and post-exercise.14,36 These metabolites are usually preselected based on some expected hypothesis, and the aim is to statistically either verify or disprove this hypothesis by means of independent study results. This approach has been successfully applied for many applications and has provided extensive mechanistic proof for different physiological processes in relation to exercise.32 However, as a means for a global unbiased screening of metabolic perturbations, this approach is limited. This means that unexpected or novel mechanistic phenomena or markers will be almost impossible to detect. For this purpose, a robust methodology is needed that can simultaneously quantify and identify a large number (hundreds to thousands) of metabolites and can reliably compare multiple samples based on this global metabolite representation.
By the combination of powerful analytical techniques with multivariate statistical methods for data processing and modeling, complex biological samples, for example, biofluids and tissues, can be characterized and compared based on the relative concentrations of hundreds to thousands of low-molecular-weight compounds, that is, metabolites. This methodology, commonly known as metabolomics6,8 or metabonomics,12,27 has been shown to be efficient in screening of biomarkers or biomarker patterns and to increase the mechanistic understanding in relation to different biochemical processes, for example, drug toxicity,3 genetic modification,10 and disease12 in various biological sample matrices, for example, urine,25 blood,9 cerebrospinal fluid,22 and plant extracts.13
Nuclear magnetic resonance spectroscopy (NMR), gas chromatography/mass spectrometry (GC/MS), and liquid chromatography/mass spectrometry (LC/MS) are analytical techniques used for the characterization of samples in metabolomic studies. The output from these techniques can be seen as metabolite fingerprints comprising the relative concentrations of a large number of metabolites reflecting the metabolic status of the investigated samples. By tradition, NMR has been the preferred analytical tool for global analysis of mammalian metabolism in relation to, for example, drug-induced toxicity, life style, disease, or treatment efficacy. The mass spectrometry-based methods, and in particular GC/MS, have, until now, mainly been developed for and applied to plant biology studies.11 However, due to the higher sensitivity (compared to NMR) and the increasing reproducibility of today's GC/MS and LC/MS systems, they have become important analytical platforms for all types of metabolomic applications, including the study of human metabolism. Compared to LC/MS, GC/MS is considered to be a more reproducible technique with an advantage concerning compound identification due to the use of a harder ionization, which results in a more extensive fragmentation of the detected molecules. This makes GC/MS an attractive alternative for multiple sample comparisons based on the generated metabolite fingerprints in the search for identified metabolic markers or marker patterns characteristic of specific biological processes.
Today, a number of successful applications of GC/MS-based metabolomics can be found in the literature.4,5,20 Complex biological samples, for example, urine, plasma, or cerebrospinal fluid, analyzed with GC/MS often give rise to overlapping peaks in the GC chromatogram as well as mixed mass spectra due to coelution of small molecules (metabolites) with similar properties. To avoid misleading quantification and identification, the pure chromatographic and mass spectral profiles for the detected metabolites need to be retrieved. To achieve this, a methodology resembling mathematical chromatography called curve resolution, or deconvolution,17,21,26 can be applied to the acquired GC/MS data. The use of an appropriate deconvolution method has been shown to be relevant for the resulting amount and quality of extracted compounds in complex biological samples.23 Hierarchical multivariate curve resolution (H-MCR)18 is a multivariate curve resolution method33 developed for resolving complex metabolic GC/MS data simultaneously for all included samples. Applying H-MCR to an acquired GC/MS data set results in a data matrix where all samples are described by a common set of variables (resolved metabolite profiles). This data matrix is hence suitable for sample comparisons by means of multivariate or other statistical analysis. Recently, it was also shown that deconvolution using H-MCR can be performed predictively for independent sample sets,19 which could have large impact in areas such as high-throughput analysis of large data bases, data bank mining, or even for developing diagnostic systems for clinical use. In addition, this predictive property makes it possible to validate the robustness and quality of the resolved metabolite profiles, which is vital for producing reliable and representative data. Data generated in metabolomics studies, or similarly in proteomic29 and transcriptomic30 studies, are of multivariate character. 
This implies that the included samples are described by a large number of highly correlated variables, for example,
hundreds to thousands of metabolites simultaneously quantified in a sample. To achieve reliable comparisons between multiple samples based on the whole set of variables, statistical methods that can handle correlated (dependent) variables are a requirement. This cannot be obtained by applying classical statistical methods, such as ANOVA, multiple linear regression analysis, or Student’s t test, that all assume variable independence. Multivariate data analysis37 is concerned with the analysis and interpretation of complex data structures built up by many and highly correlated variables. Multivariate projection methods for data exploration, for example, principal components analysis (PCA),39 and regression, for example, partial least-squares projection to latent structures (PLS)16,38 and recently the Orthogonal PLS methodology (OPLS),34 have proven particularly useful for deciphering the systematic changes between many samples characterized by many variables. A number of attributes associated with projection-based model systems, such as the ability to separate systematic variation from noise, outlier detection, possibilities for model validation, and prediction of independent samples, have been especially attractive for moving the metabolomics field forward in terms of understanding complex interactions in biological systems.2,15 In this paper, we present a strategy for a hypothesis-free global metabolite screening in human blood serum in relation to strenuous physical exercise. The strategy is based on sensitive detection of serum metabolites by gas chromatography/time-of-flight mass spectrometry (GC/TOFMS), multivariate data processing by means of H-MCR, multivariate data analysis and evaluation using multivariate projection methods, and identification of interesting metabolites via spectral data base comparisons. 
The serum GC/TOFMS data (Figure 1A) were resolved into their pure metabolite profiles (chromatographic and mass spectral profiles) using H-MCR (Figure 1B), and OPLS-DA was used to extract the systematic variation related to the acute effect of strenuous exercise (Figure 1C), comprising 90 min of ergometer-cycling at individualized workloads. Metabolites highlighted by the OPLS-DA model as influencing the separation between pre- and post-exercise samples, that is, metabolites changing in concentration with exercise, were subjected to identification by comparing their resolved mass spectra to existing databases including spectra from known compounds (Figure 1D). Extensive validation of the strategy was also performed. This was done in several steps including predictive H-MCR processing and OPLS-DA predictions of a set of analytical replicates and a set of independent validation samples, 7-fold full cross-validation31 of the OPLS-DA model, variable permutation for highlighting interesting metabolites, and pairwise t tests for examining the statistical significance of highlighted metabolites. In addition, the highlighted metabolites were also investigated in the raw GC/TOFMS data to verify the concentration changes. Identification of the metabolites also provided possibilities for validation in a physiological context. However, this is currently under investigation and will be addressed in a separate publication.
Figure 1. An overview of the presented strategy. (A) Collected serum samples were characterized using GC/MS analysis which resulted in a three-dimensional data structure consisting of one sample direction, one chromatographic direction (Time), and one mass spectral direction (m/z). (B) Hierarchical Multivariate Curve Resolution (H-MCR) was applied to the acquired GC/MS data files, generating quantitative variables (relative metabolite concentrations) for sample comparison and mass spectral profiles for identification. (C) The resolved quantitative variables were correlated to the experimentally induced perturbation, i.e., pre- and post-exercise, by means of multivariate regression analysis. (D) The resolved mass spectra were subjected to spectral library search for possible identification.
Methods Section
Subjects. Twenty-four healthy and regularly trained male subjects (age, 25.7 ± 2.7 years; height, 182.5 ± 7.6 cm; body weight, 77.4 ± 8.8 kg; with an ergometer-cycling maximum oxygen uptake (VO2peak) of 59.1 ± 7.3 mL kg⁻¹ min⁻¹) volunteered to participate in the study. Subjects gave their informed
consent following information on the nature of the study and the possible risks involved. The study was approved by the regional ethical committee (#05-069M). Each subject participated in a pre-experimental VO2peak test and four identical experimental tests, which consisted of 90 min standardized ergometer-cycling (see below), with an interval of 1 week between each occasion.
Figure 2. An illustration of the mathematical deconvolution of overlapping compounds, i.e., hierarchical multivariate curve resolution (H-MCR). The total intensity chromatogram (A) and mass spectrum (B) are here built up by three different profiles, which are not separated completely by the chromatographic system. However, when H-MCR is applied, pure chromatographic and spectral profiles for the coeluting compounds can be obtained.
Experimental Procedures. The subjects arrived at the laboratory at 8:00 a.m. and signed a health form. Venous blood samples were taken after 15 min of bed rest (pre-exercise samples) using a vacutainer system (Becton Dickinson, U.K.). Thereafter, subjects were equipped with an intravenous catheter (Optiva2, Medex) in a superficial forearm vein. Subjects then performed 90 min of ergometer-cycling, using an electronically braked bicycle (Rodby, RE 829, Enhörna, Sweden). Each 90 min test session consisted of nine equal 10 min sections without rest between sections. The workloads during the sections corresponded to 40% (2 min), 60% (6 min), and 85% (2 min) of the ergometer-cycling VO2peak workload at the pre-experimental test. The pre-experimental ergometer-cycling VO2peak was determined during a graded incremental test with an online gas and air flow analyzer system (MetaMax II, CORTEX Biophysik GmbH, Leipzig, Germany), previously validated by Larsson et al.24 Subjects were asked to maintain a steady cadence of 80 rpm during the whole session of each
experimental test, and after every 10 min of cycling, 100 mL of water was ingested. Immediately after 90 min of completed cycling, blood was collected from the vein catheter into vacutainer tubes (post-exercise samples). Serum was extracted from the collected blood samples following 8 min centrifugation (+4 °C at 3000g) and immediately frozen and stored at -80 °C. Prior to GC/TOFMS analysis, the serum samples were extracted and derivatized according to A et al.1 The samples were injected in splitless mode by an Agilent 7683 autosampler (Agilent, Atlanta, GA) into an Agilent 6890 gas chromatograph equipped with a 10 m × 0.18 mm i.d. fused silica capillary column with a chemically bonded 0.18 µm DB 5-MS stationary phase (J&W Scientific, Folsom, CA). The column effluent was introduced into the ion source of a Pegasus III time-of-flight mass spectrometer, GC/TOFMS (Leco Corp., St Joseph, MI). A more detailed description of the pre-experimental procedures, blood sampling, sample preparation, derivatization, and GC/TOFMS protocol can be found in the Supporting Information.
Figure 3. A graphical visualization of multivariate regression analysis, e.g., OPLS-DA. The multidimensional data space described by the X data, where each sample corresponds to a point and each variable (metabolite) to one dimension, was reduced by projecting the samples onto a discriminant plane that is fitted to describe the maximum separation between predefined classes.
Multivariate Data Processing. To perform multiple sample comparison by multivariate modeling, all samples have to be described by a common set of variables, that is, in this case, the relative concentrations of possible metabolites. To achieve this, H-MCR18 was applied to the obtained GC/TOFMS data files to further resolve compounds (metabolites) that were not separated by the chemical chromatography. Compounds not separated by chemical chromatography will produce overlapping chromatographic profiles and mixed mass spectra (Figure 2), leading to difficulties regarding metabolite detection, quantification, and identification. The H-MCR procedure was used to resolve the GC/MS data for all included samples simultaneously, producing a data matrix X where all samples are
described by a common set of quantitative variables describing the relative concentrations of the resolved compounds (metabolites), that is, the area under the resolved chromatographic profile. In addition, each resolved compound has a corresponding mass spectral profile, stored separately, which can be subjected to spectral database comparison or de novo identification to enable further biological interpretation. The resulting X matrix was normalized using the concentrations of 11 added internal standards, eluting over the whole chromatographic time range. Two validation sets, one including samples from independent subjects and one containing analytical replicates, were predictively resolved and normalized according to the H-MCR and normalization parameters obtained for the model samples.
Multivariate Statistical Analysis. Multivariate regression analysis in terms of orthogonal partial least-squares discriminant analysis (OPLS-DA)35,41 was applied to extract the systematic variation in the quantified serum profiles in X related to the response y, a dummy vector describing the sample class, that is, in this case pre- or post-exercise. The multidimensional data space described by the X data was reduced by projecting the samples onto a discriminant plane that is fitted to describe the maximum separation between the classes (Figure 3). This reduced space facilitates interpretation of highly correlated variables (metabolites) by focusing on the systematic variation in the X data discriminating the sample classes.7 Thus, OPLS-DA modeling was applied to reveal the class separation and, in addition, highlight the overall metabolic pattern8 related to the response (class). The calculated OPLS-DA model was validated by a 7-fold full cross-validation31 and two prediction sets, one including only samples from independent subjects and one including analytical replicates.
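The 7-fold cross-validation used to validate the model can be illustrated with a minimal sketch. It assumes a plain one-component PLS regression as a stand-in for OPLS-DA (the study itself used SIMCA-P+), random assignment of samples to folds, and Q2 computed as 1 - PRESS/SS. The function names `pls1_fit`, `pls1_predict`, and `q2_kfold` are hypothetical and not taken from the paper.

```python
import math
import random


def pls1_fit(X, y):
    """Fit a one-component PLS model on mean-centered data (no scaling)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X'y, normalized to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    # Scores t = Xw and the regression coefficient of y on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t) or 1.0
    c = sum(t[i] * yc[i] for i in range(n)) / tt
    return x_mean, y_mean, w, c


def pls1_predict(model, row):
    """Predict the class value (0 = pre, 1 = post) for one sample."""
    x_mean, y_mean, w, c = model
    score = sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))
    return y_mean + c * score


def q2_kfold(X, y, k=7, seed=0):
    """Q2 = 1 - PRESS/SS, with every sample left out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    press = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            press += (y[i] - pls1_predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss
```

With paired pre- and post-exercise samples, one might prefer to keep both samples from a subject in the same fold; the random split here is only the simplest choice.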
A permutation test was performed to highlight variables (metabolites) showing significant relative concentration changes in relation to class (pre- or post-exercise). The y-vector (where 0 = pre-exercise and 1 = post-exercise) was permuted randomly 10 000 times, and for every permutation, a PLS40 model was created between the X data, that is, the quantified serum profiles included in the model, and the permuted y-vector. Variables showing a stronger correlation to y in the original model were highlighted, that is, variables with elevated PLS weight values (w1-values) compared to the permuted y models. The highlighted variables were further investigated in the resolved data, based on the model samples, by performing a paired t test to determine whether the metabolite concentrations differed significantly between the pre- and post-exercise samples using a 95% confidence level. This was performed
under the assumptions that the paired differences are independent and identically normally distributed. To test the robustness of the H-MCR processing procedure, the Pearson correlation between the resolved chromatographic profiles for the analytical replicates was calculated for the metabolites highlighted as significantly changing with exercise.
Identification and Verification of Identified Variables. Variables were identified by means of a spectral database search. Match values ranking the spectra were calculated using the dot product of the two spectra (i.e., the resolved spectrum and the database spectrum), with higher m/z peaks having more weight than lower m/z peaks, since higher m/z values are considered to be more compound-specific. Furthermore, a reverse search, which ignores "impurity" (i.e., nonmatching) peaks in the resolved spectrum, was performed. This reverse match factor is not penalized for peaks in the target spectrum that are not present in the database spectrum. The match values range from 0 to 999, where 999 indicates an identical match.28 Further validation of the identified metabolites was done by interpreting them in a biological or physiological context (to be presented in a separate publication). All resolved profiles (metabolites) of interest were also validated in the raw GC/TOFMS data by calculating the area under the chromatographic peak for each sample.
Data Processing and Analysis. Nonprocessed GC/TOFMS files (netCDF format) were exported to MATLAB software 7.3 (Mathworks, Natick, MA), where all data processing procedures, such as smoothing, alignment, time-window setting, and hierarchical multivariate curve resolution (H-MCR), were performed.18 The permutation tests were performed using an in-house MATLAB script. The multivariate statistical analysis was performed using SIMCA-P+ 11.5 software (Umetrics AB, Umeå, Sweden).
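The permutation test on the PLS weights can be sketched as follows. This is not the authors' in-house MATLAB script. It assumes unit-variance scaling of X, a comparison of absolute w1 values, and an empirical p-value threshold `alpha`, none of which are specified in the text. The names `autoscale`, `pls_w1`, and `permutation_highlight` are hypothetical.

```python
import math
import random


def autoscale(X):
    """Center each column and scale it to unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(math.sqrt(var) or 1.0)  # constant columns stay at zero
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def pls_w1(Z, y):
    """First PLS weight vector: Z'y normalized to unit length."""
    n = len(y)
    y_mean = sum(y) / n
    w = [sum(Z[i][j] * (y[i] - y_mean) for i in range(n))
         for j in range(len(Z[0]))]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / norm for v in w]


def permutation_highlight(X, y, n_perm=1000, alpha=0.05, seed=0):
    """Return indices of variables whose |w1| is rarely matched by chance."""
    rng = random.Random(seed)
    Z = autoscale(X)
    observed = pls_w1(Z, y)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the link between samples and class
        permuted = pls_w1(Z, y_perm)
        for j, w in enumerate(permuted):
            if abs(w) >= abs(observed[j]):
                exceed[j] += 1
    # Empirical p-value with the usual +1 correction.
    p_values = [(count + 1) / (n_perm + 1) for count in exceed]
    highlighted = [j for j, pv in enumerate(p_values) if pv < alpha]
    return highlighted, p_values
```

Because the weight vector is normalized, each w1 value is relative to the other variables. This matters less with hundreds of variables than in a small toy example.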
NIST MS Search 2.0 (NIST, Gaithersburg, MD) was used for compound identification based on comparison between resolved spectra and standard spectra from the NIST 98 mass spectra library, the Umeå Plant Science Centre mass spectra library, or the mass spectra library hosted by the Max Planck Institute in Golm (http://csbdb.mpimp-golm.mpg.de/csbdb/gmd/gmd.html).
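The idea behind the forward and reverse match values can be illustrated with a small sketch. It is not the algorithm implemented in NIST MS Search. The weighting (m/z times the square root of intensity), the squared-cosine score, and the function names `weighted` and `match_value` are assumptions chosen only to show how m/z weighting and the reverse search behave.

```python
import math


def weighted(spectrum):
    """Weight each peak so that high-m/z peaks count more (assumed scheme)."""
    return {mz: mz * math.sqrt(inten)
            for mz, inten in spectrum.items() if inten > 0}


def match_value(resolved, library, reverse=False):
    """Score two spectra ({m/z: intensity}) on a 0-999 scale.

    With reverse=True, peaks of the resolved spectrum that are absent from
    the library spectrum are ignored, so impurity peaks are not penalized.
    """
    u = weighted(resolved)
    v = weighted(library)
    if reverse:
        u = {mz: val for mz, val in u.items() if mz in v}
    dot = sum(val * v.get(mz, 0.0) for mz, val in u.items())
    norm_u = sum(val * val for val in u.values())
    norm_v = sum(val * val for val in v.values())
    if norm_u == 0 or norm_v == 0:
        return 0
    return round(999 * dot * dot / (norm_u * norm_v))


# Hypothetical spectra: the resolved spectrum carries one extra peak.
library_spectrum = {73: 100.0, 147: 60.0, 205: 30.0}
resolved_spectrum = {73: 100.0, 117: 80.0, 147: 60.0, 205: 30.0}
forward = match_value(resolved_spectrum, library_spectrum)
backward = match_value(resolved_spectrum, library_spectrum, reverse=True)
```

In this toy case the extra peak at m/z 117 lowers the forward score but leaves the reverse score at its maximum, which is the behavior described for the reverse match factor.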
Figure 4. (A) An OPLS-DA score plot of the training set describing the pre- and post-exercise serum samples from 18 subjects at two different test occasions. The first test occasion (triangles) as well as the second test occasion (squares) revealed a clear separation between the pre-exercise samples (gray) and post-exercise samples (white). (B) An OPLS-DA predictive score plot of the test set describing the pre- and post-exercise samples from six independent subjects at two test occasions. All independent test samples were predictively classified with 100% accuracy by using the model parameters from the training set. (C) An OPLS-DA loading plot of all 420 resolved profiles (metabolites) where 34 variables/metabolites were highlighted as potentially interesting by the performed permutation. These 34 variables are displayed as squares in the plot, where gray squares represent metabolites correlating to pre-samples (decreasing in concentration with exercise) and white squares variables correlating to post-samples (increasing in concentration with exercise). (D) A paired t test of the 34 highlighted variables (metabolites) on a 95% confidence level. All variables with a decreased concentration in the post-exercise samples are displayed as gray bars, and variables with an increased concentration in the post-exercise samples are shown as white bars. Two selected variables, Win05-C03 (third resolved profile in chromatographic window 5) (a) and Win19-C04 (fourth resolved profile in chromatographic window 19) (b), were highlighted in the plot and subjected to further identification and investigation in the raw data.
Results
Multivariate Data Processing. Prior to H-MCR processing, three samples, from the original 96 (24 subjects, two occasions and two time points), were excluded due to low analytical quality. This resulted in a total of 93 samples, plus 29 analytical replicates, which were subjected to further processing and analysis. In addition, the samples were split into a training/model set consisting of 69 samples and a validation set consisting of 24 independent samples and 29 analytical replicates. GC/TOFMS data files from the 69 training samples were subjected to the H-MCR procedure. Alignment and smoothing using a moving average were performed prior to dividing the chromatogram into 61 time windows where the edges were positioned manually at global low intensity areas in the chromatograms. In total, 420 chromatographic profiles (quantitative variables) with corresponding mass spectra were resolved from the 61 time windows. The validation samples were then processed predictively according to the training H-MCR parameters.
Multivariate Statistical Analysis. An OPLS-DA model was calculated by correlating the 420 resolved serum profiles for the 69 model samples X to the response y. Two significant components (one predictive and one orthogonal to the response) were extracted describing 18.8% of the variation in X (R2X = 0.188) with 7.18% of the variation in X correlated to y (R2Xycorr = 0.0718), describing 87% of the variation in y (R2Y = 0.87) and predicting 71.8% of the variation in y (Q2 = 0.718). This resulted in a clear separation between the pre- and post-exercise samples for all test persons at the two test occasions (Figure 4A). Prediction of the validation samples revealed that all 24 independent test samples (Figure 4B) as well as the 29 analytical replicates (not shown) were predicted into the model with 100% accuracy with regard to class. Thirty-four variables (metabolites) were highlighted as potentially interesting by means of the performed permutation test (Figure 4C). Further investigation of the highlighted variables (metabolites) in the resolved data, based on the model samples, using a paired t test, revealed a significant change (p < 8.2E-05) in concentration for all 34 variables (Figure 4D). Finally, to evaluate the reproducibility of the H-MCR processing procedure, the Pearson correlation between the analytical replicates was investigated, resulting in a correlation >0.80 for 210 of the original 420 resolved variables (metabolites); 28 of the 34 highlighted metabolites showed a correlation >0.80.
Identification and Verification of Identified Variables. Two selected variables, Win05-C03 (third resolved profile in chromatographic window 5; increasing in concentration in post-exercise samples) and Win19-C04 (fourth resolved profile in chromatographic window 19; decreasing in concentration in post-exercise samples), were further identified by spectral comparison and verified in the raw data (Figure 4C,D). The
metabolic change for both Win05-C03 and Win19-C04 revealed a comparable change in concentration for all subjects at both test occasions (Figure 5). Win05-C03 was identified as glycerol with a match value of 794 and a reverse match value of 888, while Win19-C04 was identified as asparagine with a match value of 963 and a reverse match value of 968 (Figure 5). These metabolites could then be used for further biological evaluation in combination with metabolites with similar response. A more detailed biological (physiological) interpretation including all interesting identified metabolites will be presented in a separate publication.
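The raw-data verification relies on the area under a chromatographic peak. A minimal sketch of such an area calculation is given below. It assumes simple trapezoidal integration over a fixed retention-time window without baseline correction, which the text does not specify. The function name `peak_area` and the toy traces are hypothetical.

```python
def peak_area(times, intensities, start, end):
    """Trapezoidal area of the trace between retention times start and end.

    Only intervals lying completely inside [start, end] are included.
    """
    area = 0.0
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        if t0 >= start and t1 <= end:
            area += 0.5 * (intensities[k] + intensities[k + 1]) * (t1 - t0)
    return area


# Hypothetical traces for one subject before and after exercise.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
pre_trace = [0.0, 1.0, 3.0, 1.0, 0.0]
post_trace = [0.0, 2.0, 6.0, 2.0, 0.0]
pre_area = peak_area(times, pre_trace, 0.0, 4.0)
post_area = peak_area(times, post_trace, 0.0, 4.0)
increased = post_area > pre_area
```

Comparing `pre_area` and `post_area` per subject corresponds to the kind of bar-by-bar comparison shown for the two selected variables.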
Discussion
The suggested metabolomic strategy provides a multivariate screening tool for mechanistic investigations in human samples, here exemplified in blood serum, in relation to physical exercise. It was shown that by combining sensitive analytical detection using GC/TOFMS with multivariate data processing and data analysis, the metabolic pattern changes associated with acute strenuous exercise could be elucidated and verified among 420 potential metabolic entities. Further statistical validation and metabolite identification also offered possibilities for a detailed biological interpretation. This concept reveals the metabolic interactions in a system and hence makes it possible to verify existing mechanistic hypotheses, gain deeper mechanistic knowledge in relation to known mechanisms, or even discover novel mechanistic phenomena as well as novel markers or marker patterns for various physiological states.
Figure 5. Raw data verification was performed by calculating the area under the chromatographic peak in the GC/TOFMS raw data file for each subject. The pre-exercise samples are displayed as gray bars in the plot, and the post-exercise samples are shown as white bars. (A) Variable Win19-C04 revealed a decreased concentration in the post-exercise samples for all subjects and was further identified as asparagine in the spectral database search with a match value of 963 and reverse match value of 968. (B) Variable Win05-C03 revealed an increased concentration in the post-exercise samples for all subjects and was identified as glycerol in the spectral database search with a match value of 794 and reverse match value of 888.
The results highlight the importance of extracting reliable and representative data for performing efficient metabolomic screening. Here, this was done by applying hierarchical multivariate curve resolution (H-MCR) to the acquired GC/TOFMS data. The H-MCR processing resolved 420 potential metabolite profiles common for all included samples, consisting of chromatographic profiles and corresponding mass spectra. This allowed a reliable sample comparison by means of multivariate data analysis. OPLS-DA was the preferred choice for extracting the systematic differences between the pre- and post-exercise serum samples and highlighting the metabolites associated with these differences. Identification of metabolites of interest was accomplished by comparing the resolved mass spectra for these specific metabolites to existing databases containing spectra of known compounds. In this study, only 2 out of 34 highlighted metabolites were identified. The reason for this was that the purpose of this paper was to present the screening strategy and not to provide a full biological interpretation. This will instead be the focus of a separate publication.
Validation is a major issue of the presented strategy and is performed in all steps of the chain from data processing to biological evaluation. The predictive property of the H-MCR processing made it possible to predictively resolve a set of analytical replicates as well as an independent sample set. The analytical replicates were then used to validate the reproducibility of the curve resolution by correlating the resolved areas under the chromatographic profiles between analytical replicates. In this case, out of the 34 highlighted metabolites, 28 showed a correlation >0.80 to their analytical replicates. The independent sample set, also predictively resolved, was predicted into the OPLS-DA model together with the analytical replicates. This resulted in 100% classification accuracy between pre- and post-exercise samples.
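The replicate-based reproducibility check can be sketched in a few lines. It assumes that each variable has a list of resolved areas for the original samples and a matching list for their analytical replicates, and it uses the 0.80 threshold mentioned in the text. The function names `pearson` and `reproducible_variables` and the variable labels are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant data
    return cov / math.sqrt(var_a * var_b)


def reproducible_variables(areas, threshold=0.80):
    """Return names of variables whose replicate correlation exceeds threshold.

    areas maps a variable name to (original_areas, replicate_areas).
    """
    return [name for name, (original, replicate) in areas.items()
            if pearson(original, replicate) > threshold]


# Hypothetical resolved areas for two variables across four replicate pairs.
example = {
    "var_a": ([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.2, 3.9]),
    "var_b": ([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 1.0]),
}
kept = reproducible_variables(example)
```

Applied to the highlighted variables, a count of the returned names would correspond to the "28 of 34" type of summary reported above.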
By predictively resolving and classifying a set of independent samples, we were able to test the processing and modeling robustness extensively. In our opinion, this procedure should be performed routinely for verifying results in these types of studies before drawing any biological conclusions. From this perspective, the
combination of H-MCR and multivariate projections is unique, since both can be carried out predictively, which makes it an extremely useful tool for metabolic screening. Because the study included the same subjects pre- and postexercise, it was also possible to perform a pairwise t test on the metabolites highlighted by the multivariate analysis. A comparison between the results of the multivariate screening and the t statistics showed that they coincide. This is encouraging, since it implies that the multivariate screening approach highlights statistically significant entities among many while also extracting robust and predictive patterns built up by the interactions between the included metabolites. An important point to make from these results is that multivariate and classical statistical methods should not be seen as competing methods. Instead, they can complement each other, creating strategies that consider multivariate interactions in complex systems while retaining statistical significance. For this purpose, the multivariate data processing and analysis extract relevant and validated metabolic patterns, while the classical statistical methods provide verification of individual markers. The transparency of the method also allows the concentration changes of individual metabolites to be verified in the raw data, which further adds to the reliability of the biological interpretation.
In summary, the presented results demonstrate how a multivariate strategy can be efficiently applied to process and analyze complex GC/TOFMS data with the aim of screening for metabolic changes in human blood serum in relation to strenuous physical exercise. This metabolomic strategy, novel to the area of exercise physiology, was shown to facilitate interpretation and validation of metabolic interactions as well as to reveal the identity of potential markers for known or novel mechanisms. This multivariate approach to metabolism studies will hopefully help to increase the understanding of the integrative biology behind exercise physiology and to unravel new mechanistic explanations related to it.
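As an illustration of the pairwise t test mentioned above, the following is a minimal sketch, not the implementation used in the study. It computes the paired t statistic for one metabolite measured in the same subjects pre- and postexercise. The function name `paired_t` and the concentration values are hypothetical and chosen only for demonstration; a p value would additionally require the Student t distribution with n - 1 degrees of freedom (for instance via `scipy.stats.ttest_rel`, assuming SciPy is available).

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Within-subject differences: postexercise minus preexercise
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    # t = mean difference divided by its standard error
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical relative concentrations (arbitrary units), NOT data from the study
pre = [1.0, 1.2, 0.9, 1.1, 1.0]
post = [1.5, 1.6, 1.3, 1.7, 1.4]

t_stat, dof = paired_t(pre, post)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Because each subject serves as its own control, the test is applied to the within-subject differences, which removes between-subject variation from the comparison. In a screening setting, the same calculation would be repeated for each metabolite highlighted by the multivariate model.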
Acknowledgment. We thank Lotta Alfredson for technical assistance and laboratory work during the exercise study. This work was supported by grants from the Swedish Strategic Research Foundation (SSF), the Swedish Association of Persons with Neurological Disabilities (NHR), the Swedish Research Council (VR), Wallenberg Consortium North (WCN), and the Kempe Foundation.
Supporting Information Available: A detailed description of the pre-experimental procedures, blood sampling, sample preparation, derivatization, and GC/TOFMS protocol. This material is available free of charge via the Internet at http://pubs.acs.org.