Anal. Chem. 2006, 78, 1296-1305

Metabonomics and Biomarker Discovery: LC-MS Metabolic Profiling and Constant Neutral Loss Scanning Combined with Multivariate Data Analysis for Mercapturic Acid Analysis

Silvia Wagner,† Karoline Scholz,† Michael Donegan,‡ Lyle Burton,§ Julia Wingate,§ and Wolfgang Völkel*,†

Department of Toxicology, University of Würzburg, Versbacher Strasse 9, 97078 Würzburg, Germany, MDS Sciex, 500 Old Connecticut Path, Framingham, Massachusetts 01701, and Applied Biosystems, 71 Four Valley Drive, Concord, Ontario, Canada L4K 4V8

In the field of metabonomics, 1H NMR and full scan mass spectrometry methods have usually been combined with principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) to detect patterns in biofluids that correspond to specific effects, usually a toxic site effect of a compound. Confounders, together with large interindividual variation, complicate such analyses in humans; therefore, metabonomic data have been obtained almost exclusively from animals. In our study, a constant neutral loss (CNL) scan on a linear ion trap was used to detect mercapturic acids (MA), a class of effect markers, and showed increased sensitivity and specificity compared to a full scan approach. The method was applied to human volunteers administered 50 and 500 mg of acetaminophen (AAP), a model compound known to form MAs. Using a new algorithm to prepare the CNL data for chemometrics, control and postdose samples could be discriminated by PCA and PLS-DA. The loadings plots clearly revealed AAP-MA as a marker, even at low-dose levels. Orthogonal signal correction (OSC) was carried out to investigate background information that is not due to exposure. Surprisingly, the OSC data provided a classification of male and female subjects, demonstrating the performance of the new approach.

Within the last few years, metabonomics, metabolic fingerprinting, and metabolite profiling have gained great prominence; all have in common a screening approach for the detection of typical patterns in biofluids. Up to now, metabolites in the field of metabonomics have traditionally been considered to be small endogenous compounds, such as sugars, amino acids, organic acids, or creatinine, resulting from biochemical pathways (catabolism), and are often referred to as the “usual suspects”.1 Classical metabonomics approaches investigate the levels of these metabolites during the study, e.g., after exposure to toxic compounds, in urine and plasma of mostly laboratory animals.2-4 Greater variability among human subjects due to genetic or lifestyle factors, together with lower concentrations of biofluid metabolites, are the drawbacks that limit the extension of metabonomics to human data. Therefore, only a few human metabonomics studies using either NMR or MS techniques have been described so far. These studies were performed in patients suffering from diseases such as type II diabetes,5,6 coronary heart disease,7 epithelial ovarian cancer,8 and liver cancer,9 and in volunteers restricted to a diet rich in isoflavones10 or given chamomile tea.11 Furthermore, additional work has been done to investigate human inter- and intrasubject biochemical variation.12-15

* To whom correspondence should be addressed. Tel: +49(0)931/201-48432. Fax: +49(0)931/201-48865. E-mail: [email protected]. † University of Würzburg. ‡ MDS Sciex. § Applied Biosystems.

(1) Robertson, D. G. Toxicol. Sci. 2005, 85, 809-822.
(2) Coen, M.; Lenz, E. M.; Nicholson, J. K.; Wilson, I. D.; Pognan, F.; Lindon, J. C. Chem. Res. Toxicol. 2003, 16, 295-303.
(3) Holmes, E.; Nicholls, A. W.; Lindon, J. C.; Ramos, S.; Spraul, M.; Neidig, P.; Connor, S. C.; Connelly, J.; Damment, S. J.; Haselden, J.; Nicholson, J. K. NMR Biomed. 1998, 11, 235-244.
(4) Lenz, E. M.; Bright, J.; Knight, R.; Westwood, F. R.; Davies, D.; Major, H.; Wilson, I. D. Biomarkers 2005, 10, 173-187.
(5) Wang, C.; Kong, H.; Guan, Y.; Yang, J.; Gu, J.; Yang, S.; Xu, G. Anal. Chem. 2005, 77, 4108-4116.
(6) Yang, J.; Xu, G.; Hong, Q.; Liebich, H. M.; Lutz, K.; Schmuelling, R. M.; Wahl, H. G. J. Chromatogr., B 2004, 813, 53-58.
(7) Brindle, J. T.; Antti, H.; Holmes, E.; Tranter, G.; Nicholson, J. K.; Bethell, H. W.; Clarke, S.; Schofield, P. M.; McKilligin, E.; Mosedale, D. E.; Grainger, D. J. Nat. Med. 2002, 8, 1439-1444.
(8) Odunsi, K.; Wollman, R. M.; Ambrosone, C. B.; Hutson, A.; McCann, S. E.; Tammela, J.; Geisler, J. P.; Miller, G.; Sellers, T.; Cliby, W.; Qian, F.; Keitz, B.; Intengan, M.; Lele, S.; Alderfer, J. L. Int. J. Cancer 2005, 113, 782-788.
(9) Yang, J.; Xu, G.; Zheng, Y.; Kong, H.; Pang, T.; Lv, S.; Yang, Q. J. Chromatogr., B 2004, 813, 59-65.
(10) Solanky, K. S.; Bailey, N. J.; Beckwith-Hall, B. M.; Bingham, S.; Davis, A.; Holmes, E.; Nicholson, J. K.; Cassidy, A. J. Nutr. Biochem. 2005, 16, 236-244.
(11) Wang, Y.; Tang, H.; Nicholson, J. K.; Hylands, P. J.; Sampson, J.; Holmes, E. J. Agric. Food Chem. 2005, 53, 191-196.
(12) Lenz, E. M.; Bright, J.; Wilson, I. D.; Hughes, A.; Morrisson, J.; Lindberg, H.; Lockton, A. J. Pharm. Biomed. Anal. 2004, 36, 841-849.
(13) Lenz, E. M.; Bright, J.; Wilson, I. D.; Morgan, S. R.; Nash, A. F. J. Pharm. Biomed. Anal. 2003, 33, 1103-1115.
(14) Zuppi, C.; Messana, I.; Forni, F.; Ferrari, F.; Rossi, C.; Giardina, B. Clin. Chim. Acta 1998, 278, 75-79.
(15) Zuppi, C.; Messana, I.; Forni, F.; Rossi, C.; Pennacchietti, L.; Ferrari, F.; Giardina, B. Clin. Chim. Acta 1997, 265, 85-97.

1296 Analytical Chemistry, Vol. 78, No. 4, February 15, 2006

10.1021/ac051705s CCC: $33.50

© 2006 American Chemical Society Published on Web 12/20/2005

Two approaches are thus conceivable for human data: increasing the sensitivity of the analytical methods, concentrating on a certain class of metabolites (also referred to as metabolic or metabolite profiling), or both.16,17 The use of new and sensitive LC-MS techniques instead of conventional NMR spectroscopy in the field of metabonomics can take both criteria into account. Recently, an HPLC-MS profiling approach for the diagnosis of liver cancer from urine samples was published.9 The authors of that study concentrated on the cis-diol tumor marker class, which is known to be correlated with the effect under investigation. A similar approach was reported for plasma phospholipid profiles and type II diabetes.5 In both cases, the marker class was extracted from the biofluid prior to analysis, and classical full scan survey scans were then carried out for HPLC-MS analysis. We now introduce a new profiling approach based on mercapturic acid screening. Mercapturic acids (MA) are of special interest because reactive compounds from endogenous or exogenous sources are conjugated to glutathione (GSH), a tripeptide that serves as the corresponding nucleophile. Degradation of the GSH conjugates by transpeptidase cleavage and subsequent acetylation leads to the corresponding mercapturic acids, which are excreted in urine.18 Perbellini et al. stated that synthesis of an unlimited number of mercapturic acids is hypothetically possible in this way.19 Since there should be a good correlation between exposure to reactive compounds and the MA pattern in urine, this marker class can be used to evaluate the electrophilic burden of an organism. A variety of exogenous and endogenous sources of such reactive compounds are known.
Acetaminophen is a well-known drug that undergoes bioactivation to an electrophile and forms a mercapturic acid.20 Organic solvents (n-hexane, benzene, perchloroethene),19,21 food ingredients (e.g., isothiocyanates, safrole),22,23 food heating products (acrylamide, heterocyclic aromatic amines),24 and other (as yet unknown) factors contribute to a greater or lesser extent to the burden of human subjects. In addition to those formed from xenobiotics, mercapturic acids formed from endogenous compounds were recently reported.25,26 4-Hydroxy-2-nonenal and 1,4-dihydroxynonene (DHN) are end products of lipid peroxidation resulting from the formation of reactive oxygen species (ROS), also known as oxidative stress. Both compounds form mercapturic acids, and, interestingly, their urinary levels are known to be affected by compounds that form ROS but not glutathione adducts.27,28 Therefore, a correlation between oxidative stress and mercapturic acid patterns in urine is expected. Compared to laboratory animals, humans are exposed to a larger variety of compounds capable of forming reactive metabolites, depending on diet, smoking habits, environment, (oxidative) stress, or other lifestyle factors. Oxidative damage is especially correlated with a variety of diseases, including allergies, Parkinson's disease, Alzheimer's disease, arteriosclerosis, coronary heart disease, and cancer.29,30 The early diagnosis and prevention of these widespread diseases is a goal of modern life-science research.

(16) Fiehn, O. Plant Mol. Biol. 2002, 48, 155-171.
(17) Krishnan, P.; Kruger, N. J.; Ratcliffe, R. G. J. Exp. Bot. 2005, 56, 255-265.
(18) Haufroid, V.; Lison, D. Int. Arch. Occup. Environ. Health 2005, 78, 343-354.
(19) Perbellini, L.; Veronese, N.; Princivalle, A. J. Chromatogr., B 2002, 781, 269-290.
(20) Andrews, R. S.; Bond, C. C.; Burnett, J.; Saunders, A.; Watson, K. J. Int. Med. Res. 1976, 4, 34-39.
(21) Völkel, W.; Friedewald, M.; Lederer, E.; Pähler, A.; Parker, J.; Dekant, W. Toxicol. Appl. Pharmacol. 1998, 153, 20-27.
(22) Fennell, T. R.; Miller, J. A.; Miller, E. C. Cancer Res. 1984, 44, 3231-3240.
(23) Vermeulen, M.; van Rooijen, H. J.; Vaes, W. H. J. Agric. Food Chem. 2003, 51, 3554-3559.
(24) Boettcher, M. I.; Schettgen, T.; Kutting, B.; Pischetsrieder, M.; Angerer, J. Mutat. Res. 2005, 580, 167-176.
(25) Alary, J.; Debrauwer, L.; Fernandez, Y.; Cravedi, J. P.; Rao, D.; Bories, G. Chem. Res. Toxicol. 1998, 11, 130-135.
(26) Alary, J.; Fernandez, Y.; Debrauwer, L.; Perdu, E.; Gueraud, F. Chem. Res. Toxicol. 2003, 16, 320-327.
(27) Hartley, D. P.; Kolaja, K. L.; Reichard, J.; Petersen, D. R. Toxicol. Appl. Pharmacol. 1999, 161, 23-33.

Once a marker class has been selected, the second aim is to increase the sensitivity of the analytical method. As recently reported, collision-induced dissociation in the negative ion mode produces a characteristic constant neutral loss (CNL) of 129 amu that indicates the presence of mercapturic acids.31,32 More than 50 mercapturic acids were detected in rat urine using CNL screening.33 In addition, some endogenous mercapturic acids, such as DHN-MA, were identified in these samples, demonstrating the sensitivity of this approach and providing the foundation for using CNL scanning for mercapturic acid profiling in human urine samples. To test whether the new approach could be transferred from rat to human urine samples, a therapeutic dose of acetaminophen (500 mg), the mercapturic acid-forming model compound, was orally administered to 10 male and female volunteers. LC-MS/MS analysis on a hybrid quadrupole linear ion trap (QTRAP), combining a CNL survey scan with the acquisition of enhanced product ion (EPI) mass spectra, was selected for mercapturic acid scanning. In addition, the samples were analyzed by a “classical” metabonomic profiling approach using high-resolution full scan analysis on a hybrid quadrupole time-of-flight instrument (QqTOF).
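In a neutral loss scan the two mass-resolving quadrupoles are linked: Q1 transmits a precursor m/z while Q3 is held at a fixed offset 129 units below it, so signal appears only for ions that lose 129 Da in the collision cell. The minimal Python sketch below mimics that selection rule on hypothetical precursor/product pairs; it assumes singly charged ions, and the ±0.5 tolerance and the example transitions are illustrative assumptions, not acquisition settings from this study.

```python
# Minimal sketch of the constant-neutral-loss (CNL) selection rule.
# Assumption: singly charged ions, so the neutral loss in Da equals the m/z difference.
from dataclasses import dataclass

NEUTRAL_LOSS_DA = 129.0  # characteristic loss reported for mercapturic acids


@dataclass(frozen=True)
class Transition:
    precursor_mz: float  # m/z transmitted by Q1
    product_mz: float    # m/z of a fragment formed in the collision cell


def q3_setting(precursor_mz: float, loss: float = NEUTRAL_LOSS_DA) -> float:
    """In a CNL scan, Q3 tracks Q1 at a fixed offset of -loss."""
    return precursor_mz - loss


def passes_cnl(t: Transition, loss: float = NEUTRAL_LOSS_DA, tol: float = 0.5) -> bool:
    """True if the fragment would be transmitted by Q3 while Q1 passes its precursor."""
    return abs(t.product_mz - q3_setting(t.precursor_mz, loss)) <= tol


if __name__ == "__main__":
    # Hypothetical transitions: only the first corresponds to a 129-Da loss.
    candidates = [Transition(311.2, 182.2), Transition(300.0, 200.0)]
    for t in candidates:
        print(t.precursor_mz, passes_cnl(t))
```

Ions that fragment by other pathways never reach the detector in this mode, which is why the CNL trace is far less crowded than a full scan of the same urine.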
For further evaluation, a second experiment was performed in which only a tenth of the primary acetaminophen dose (50 mg) was administered to 25 healthy male and female volunteers to determine the sensitivity and specificity of the method at lower dose levels. To our knowledge, this is the first time a CNL screening approach has been successfully used in the new area of metabonomics and metabolic fingerprinting. Chemometric models such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), together with orthogonal signal corrected (OSC) models common in metabonomic data processing, were employed to discriminate control and postdose samples. In the present study, great care was taken with parameter optimization for peak detection, alignment, and filtering prior to chemometric analyses.

(28) Völkel, W.; Alvarez-Sanchez, R.; Weick, I.; Mally, A.; Dekant, W.; Pähler, A. Free Radical Biol. Med. 2005, 38, 1526-1536.
(29) Datta, K.; Sinha, S.; Chattopadhyay, P. Natl. Med. J. India 2000, 13, 304-310.
(30) Galli, F.; Piroddi, M.; Annetti, C.; Aisa, C.; Floridi, E.; Floridi, A. Contrib. Nephrol. 2005, 149, 240-260.
(31) Manini, P.; Andreoli, R.; Bergamaschi, E.; De Palma, G.; Mutti, A.; Niessen, W. M. Rapid Commun. Mass Spectrom. 2000, 14, 2055-2060.
(32) Jones, A. D.; Winter, C. K.; Buonarati, M. H.; Segall, H. J. Biol. Mass Spectrom. 1993, 22, 68-76.
(33) Scholz, K.; Dekant, W.; Völkel, W.; Pähler, A. J. Am. Soc. Mass Spectrom. 2005, 16, 1976-1984.

EXPERIMENTAL SECTION

Chemicals. All solvents were HPLC grade and purchased from Roth (Karlsruhe, Germany). The acetaminophen mercapturic
acid standard was kindly supplied by F. Hoffmann-La Roche (Basel, Switzerland).

Exposure of Human Subjects to Acetaminophen. A single therapeutic dose of acetaminophen (Ratiopharm, 500 mg/person) was orally administered to five healthy male and five healthy female subjects (age range 24-39 years). In a second study, a 10-fold lower dose of acetaminophen (Ratiopharm, 50 mg/person) was given to 15 healthy female and 10 healthy male subjects (age range 20-60 years). All subjects enrolled in the study had to refrain from alcoholic beverages and medicinal drugs for two days before and throughout the experiment. Subjects did not abuse alcohol and were either nonsmokers or only occasional smokers. The study was carried out according to the Declaration of Helsinki, after approval by the Regional Ethical Committee of the University of Wuerzburg, Germany, and after written informed consent had been obtained from the subjects. Urine samples were collected in the morning on two consecutive days. Before each of the two 8-h overnight collection periods (periods I and II), the subjects were instructed to empty their bladders completely. Urine from the first morning served as the control sample (no acetaminophen before collection period I); the postdose sample was collected under the same conditions after oral administration of the 500-mg or 50-mg acetaminophen dose before collection period II. After the total urinary volume was determined, aliquots were stored at -20 °C until analysis.

Sample Preparation of Human Urine. For the high-dose study (500-mg acetaminophen dose), 650 µL of acetonitrile (-20 °C) was added to 650 µL of thawed urine to precipitate proteins prior to HPLC-MS/MS analysis. After centrifugation at 4 °C and 14 000g for 20 min, 1.0 mL of the supernatant was dried under vacuum and reconstituted in 100 µL of distilled water, resulting in a 5-fold increase in the concentration of the urine samples.
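The 5-fold figure follows directly from the volumes given above. A short Python check, assuming the protein pellet volume is negligible so that the supernatant is a homogeneous 1:1 urine/acetonitrile mixture:

```python
# Concentration factor of the high-dose sample preparation.
# Assumption: the protein pellet volume is negligible, so the supernatant
# has the same urine fraction as the urine/acetonitrile mixture.

def concentration_factor(urine_ul: float, solvent_ul: float,
                         supernatant_ul: float, reconstitution_ul: float) -> float:
    """Urine-equivalent volume in the reconstituted extract divided by its final volume."""
    urine_fraction = urine_ul / (urine_ul + solvent_ul)
    urine_equivalent_ul = supernatant_ul * urine_fraction
    return urine_equivalent_ul / reconstitution_ul


# 650/1300 of 1000 uL supernatant = 500 uL urine equivalent, redissolved in 100 uL
factor = concentration_factor(urine_ul=650, solvent_ul=650,
                              supernatant_ul=1000, reconstitution_ul=100)
print(f"{factor:.1f}-fold")  # prints "5.0-fold"
```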
After treatment in an ultrasonic bath (15 min at room temperature), samples were centrifuged again, and aliquots of 20 µL were injected onto the column. Urine samples from the low-dose study (50 mg of acetaminophen) were thawed and centrifuged at 4 °C and 14 000g for 20 min, and aliquots of 5 µL were taken directly for LC-MS/MS analysis.

Creatinine Analysis. Creatinine in human urine was determined using an in-house LC-ESI-MS/MS method with creatinine-d3 as internal standard.

Chromatography. HPLC separation was carried out on a ReproSil-Pur C18-AQ analytical column (5.0 µm, 150 × 2 mm; Dr. Maisch, Ammerbuch, Germany) with an additional precolumn containing C18 packing material (Phenomenex, Aschaffenburg, Germany). A linear gradient at a flow rate of 0.25 mL/min was used to elute metabolites of human urine (solvent A, acetonitrile; solvent B, 0.1% formic acid). Initial conditions were 5% solvent A, held isocratically for 2 min, followed by a linear gradient of 5-50% A in 23 min and further to 90% A in 2 min. These conditions were held for 2 min, followed by reequilibration to the initial conditions in 11 min.

Instrumentation. The HPLC system consisted of a quaternary solvent pump combined with an autosampler Series 1100 (Agilent, Waldbronn, Germany). A QTRAP (Applied Biosystems/MDS Sciex, Concord, ON, Canada) with a TurboIonSpray source was
used as the mass spectrometer for LC-MS/MS experiments in both negative and positive ion modes. A QqTOF (QSTAR XL, Applied Biosystems/MDS Sciex) with the TurboIonSpray source was used for full scan measurements.

Constant Neutral Loss and Full Scan Survey Scans. Survey scans were performed either as constant neutral loss scans (CNL of 129 Da) or in full scan mode in the range of m/z 200-450, covering the mass range of expected drug metabolites. Source and gas settings were identical for both survey scans. The CNL method consisted of three experiments, namely, the survey scan (CNL), information-dependent acquisition (IDA) criteria, and two enhanced product ion (EPI) scans with different collision energies (CEs). CNL scans were carried out in the negative ion mode using a TurboIonSpray source (ESI). The source voltage was set at -4.2 kV and the vaporizer temperature at 400 °C. Gas settings were 50 psi for turbogas, 45 psi for nebulizer gas, 30 psi for curtain gas, and 10 psi for collision gas. Scan time per cycle was 3.0 s, with a pause of 5.0 ms between mass ranges for each scan mode. Q1 resolution was set to “unit” and Q3 resolution to “low”. IDA criteria were set such that signals in the survey scan exceeding 1000 counts/s triggered an EPI scan. If more than one m/z value was above the threshold, the most intense peak was selected. Previously selected target ions were then excluded for at least 40 s. The mass tolerance was set to 250 mmu, the entrance potential and cell entrance potential to -10 V, the declustering potential to -50 V, and the cell exit potential to -2 V. For structural identification, two EPI scans were performed, the first with a CE of -20 V and the second with a CE of -60 V. Fragments formed in the EPI scans were detected in the range of m/z 50-500 using dynamic fill and a scan rate of 4000 amu/s.

Data Analysis.
To perform multivariate data analysis (MVDA), a matrix of the peaks present in the data files must be generated. For this purpose, mass/retention time pairs of all detected peaks are formed and aligned using mass and retention time tolerances. The resulting matrix has a column for each aligned peak and a row for each sample, each cell containing the corresponding peak area. Peak finding, peak alignment, and peak filtering were carried out with MarkerView software 1.0 (Applied Biosystems/MDS Sciex). Both full scan and CNL data were processed with this software using the following parameters: minimum retention time 2.0 min, maximum retention time 35 min, noise threshold 500 counts/s, minimum spectral peak width 0.1 amu, minimum retention time peak width 3 scans, maximum retention time peak width 100 scans, retention time tolerance 1.0 min, mass tolerance 0.4 amu (CNL)/0.1 amu (TOF), and maximum number of peaks 500. The resulting aligned peak list containing the sorted peak areas was then exported to Excel 2002 (Microsoft Germany, Unterschleissheim, Germany) for normalization to urine creatinine content. Chemometric analyses, including PCA and PLS-DA models, were performed with SIMCA-P software 10.5 (Umetrics, Umeå, Sweden) on the resulting three-dimensional data set (retention time/mass pair, sample identifier, area). For evaluation and comparison of the different methods, the data were either autoscaled (mean-centered and scaled to unit variance) or “pareto” scaled, as described in the text.
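The normalization and the two scaling options named above can be written compactly. The NumPy sketch below assumes that creatinine normalization means dividing each sample's peak areas by that sample's creatinine concentration, and it uses the usual definitions of autoscaling (mean-center, divide by the standard deviation) and pareto scaling (mean-center, divide by the square root of the standard deviation). The matrix and creatinine values are synthetic, and the sketch does not reproduce the Excel or SIMCA-P steps used in the study.

```python
import numpy as np


def normalize_by_creatinine(areas: np.ndarray, creatinine: np.ndarray) -> np.ndarray:
    """Divide each sample's (row's) peak areas by that sample's creatinine value."""
    return areas / creatinine[:, np.newaxis]


def _column_sd(x: np.ndarray) -> np.ndarray:
    """Sample standard deviation per variable; constant columns get 1 to avoid 0/0."""
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return sd


def autoscale(x: np.ndarray) -> np.ndarray:
    """Mean-center each variable and scale it to unit variance."""
    return (x - x.mean(axis=0)) / _column_sd(x)


def pareto_scale(x: np.ndarray) -> np.ndarray:
    """Mean-center each variable and divide by the square root of its SD."""
    return (x - x.mean(axis=0)) / np.sqrt(_column_sd(x))


# Synthetic example: 6 samples x 4 aligned peaks, one variable 100x more intense.
rng = np.random.default_rng(0)
areas = rng.uniform(1e3, 1e4, size=(6, 4))
areas[:, 0] *= 100
creatinine = rng.uniform(5.0, 15.0, size=6)  # hypothetical creatinine values

x = normalize_by_creatinine(areas, creatinine)
print(np.round(autoscale(x).std(axis=0, ddof=1), 3))     # every variable now has SD 1
print(np.round(pareto_scale(x).std(axis=0, ddof=1), 2))  # SD = sqrt(original SD)
```

Under autoscaling every variable ends up with the same variance, whereas pareto scaling leaves intense variables with somewhat more weight; this is the trade-off that matters later for biomarker recognition.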

Figure 1. Total ion current chromatograms (TIC) of a control urine sample from the high-dose study acquired in the negative ion mode: (a) TIC of CNL; peaks denote different mercapturic acids that all show a common CNL of 129 amu; (b) TIC of full scan/TOF data.

RESULTS AND DISCUSSION

LC-MS Analysis with Constant Neutral Loss and Full Scan Mode. The first step in evaluating the new mercapturic acid profiling approach was to compare it with conventional LC-MS fingerprinting using full scan analysis on a TOF instrument. For this purpose, human urine samples, either control urine or urine after exposure to 500 mg of acetaminophen, were analyzed using either a CNL scan on a QTRAP or full scan mode with TOF detection on a QqTOF. To allow comparison of the approaches, HPLC and MS parameters were kept as similar as possible. Analyses were carried out in the negative ion mode because sensitivity was higher than in the positive ion mode, and only in this mode was a common CNL observed for all mercapturic acids. The HPLC parameters used enabled the detection of both polar and nonpolar components: hydrophilic and lipophilic mercapturic acids in the case of CNL scans, and unknown endogenous and exogenous metabolites in the case of full scan data. Figure 1A shows an LC-ESI-MS/MS chromatogram (TIC of CNL) of a control urine sample acquired in the negative ion mode. Many endogenous metabolites with a CNL of 129 amu, likely mercapturic acids, are observed in the chromatogram. Many more analytes were found when full scan mass spectra were obtained from the urine samples (control urine, Figure 1B). Urine is a complex mixture of endogenous and exogenous metabolites, and it is therefore almost impossible to separate all analytes adequately by liquid chromatography; peaks may overlap or cluster together, especially in full scan mode. By focusing on the profiling of mercapturic acids using CNL scanning, however, this drawback of traditional LC-MS approaches could be minimized. In this particular case, the chromatography could be optimized since every mercapturic acid contains an identical N-acetylcysteine moiety, so the chromatographic behavior of this class of compounds will be similar.

Data Preprocessing and Alignment. MVDA is a chemometric tool used to investigate metabolic fingerprints or metabolite profiles in complex matrixes such as urine and plasma. This tool was applied to differentiate human urine samples (control and postdose) that were obtained from the acetaminophen exposure study (500-mg dose) and analyzed in either CNL or full scan mode. MVDA requires that the raw data undergo some preprocessing to generate a data matrix, the columns of which represent all variables (mass-retention time pairs for peaks detected in at least one sample) and the rows of which contain the observations (samples) included in the analysis. Both peak finding and this “peak alignment” step were performed using a new software tool called MarkerView, which can process both CNL and full scan data. Each cell of the matrix contains the peak area of a particular variable in the sample, resulting in a three-dimensional data set. If a peak (variable) is not present in a sample, the cell value is zero. These peak finding and alignment steps are the most critical points in MVDA and pattern recognition. Although the success of the whole metabonomics project depends on the selection and optimization of the parameters for detection and alignment, these LC-MS data preprocessing parameters are rarely described in the literature. Often, data preprocessing is completely unreported, and PCA or PLS-DA models are carried out on raw data sets that cannot be evaluated by the reader.

The parameters were therefore optimized step-by-step until the aligned peak lists agreed best with manual inspection of selected mass chromatograms. Because of the void volume and reequilibration of the column, peak finding was limited to a time window of 2-35 min. Only chromatographic peaks with a width between 3 and 100 scans were considered valid. The intensity threshold was estimated by extracting several masses from the chromatograms and investigating their thresholds; the mean threshold of all extracted ion chromatograms was 500 counts/s, which was set as the default. For CNL and full scan/TOF experiments, the mass tolerance for alignment was set to 0.4 and 0.1 amu, respectively. To avoid spikes being spuriously detected as peaks, a minimum spectral peak width of 0.1 amu seemed optimal. The retention time tolerance was the most critical parameter in the peak alignment process. Although retention times were stable and varied less than 0.1 min over all samples, setting the retention time tolerance too low, e.g., 0.2 min, resulted in very poor resolution of peak clusters. The tolerance was therefore optimized stepwise (0.2-min steps) by reviewing the results of the peak picking and comparing them with the original chromatograms. The best resolution of clusters and poorly resolved peaks was achieved with a large retention time tolerance, and further work was carried out with this critical parameter set to the optimum of 1.0 min. Setting the maximum number of peaks higher than 500 provided no further information concerning classification or groupings in the scores or loadings plots, so it was set to 500. The parameters used are listed in Table 1. Each data set was subsequently processed using these optimized, fixed parameters. Peak areas of the resulting aligned peak lists were normalized to the creatinine content of the samples prior to chemometric analysis, thus taking differences in urine volume into account.

Table 1. Data Preprocessing Parameters for Peak Detection, Peak Alignment, and Filtering of CNL and Full Scan/TOF Data

parameter                                CNL data   TOF data
Peak Finding
  minimum retention time (min)           2.0        2.0
  maximum retention time (min)           35.0       35.0
  noise threshold (counts/s)             500        500
  minimum spectral peak width (amu)      0.1        0.1
  minimum peak width (scans)             3          3
  maximum peak width (scans)             100        100
Peak Alignment, Filtering
  retention time tolerance (min)         1.0        1.0
  mass tolerance (amu)                   0.4        0.1
  maximum number of peaks                500        500
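To make the role of the Table 1 parameters concrete, the Python sketch below filters hypothetical detected peaks by the retention-time window, noise threshold, and peak-width limits, then aligns them across samples with a simple greedy rule (a peak joins the first existing feature within both tolerances, otherwise it starts a new feature) and fills absent peaks with zero. This is an assumed, simplified stand-in, not MarkerView's actual algorithm; the `Peak` fields and example peaks are hypothetical, and the 500-peak cap and the minimum spectral peak width are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """Hypothetical detected peak as produced by an upstream peak finder."""
    mz: float
    rt_min: float
    width_scans: int
    area: float
    height_cps: float


# Table 1 values for CNL data (MASS_TOL would be 0.1 for the full scan/TOF data).
RT_WINDOW = (2.0, 35.0)   # min
NOISE_CPS = 500.0         # counts/s
WIDTH_SCANS = (3, 100)    # scans
RT_TOL, MASS_TOL = 1.0, 0.4


def is_valid(p: Peak) -> bool:
    """Retention-time window, noise threshold, and peak-width filters."""
    return (RT_WINDOW[0] <= p.rt_min <= RT_WINDOW[1]
            and p.height_cps >= NOISE_CPS
            and WIDTH_SCANS[0] <= p.width_scans <= WIDTH_SCANS[1])


def align(samples: list[list[Peak]]) -> tuple[list[tuple[float, float]], list[list[float]]]:
    """Greedy alignment into a samples x features matrix of peak areas (0 = absent).

    Each feature is labeled by the (m/z, RT) of the first peak that created it.
    """
    features: list[tuple[float, float]] = []
    rows: list[dict[int, float]] = []
    for peaks in samples:
        row: dict[int, float] = {}
        for p in filter(is_valid, peaks):
            for j, (mz, rt) in enumerate(features):
                if abs(p.mz - mz) <= MASS_TOL and abs(p.rt_min - rt) <= RT_TOL:
                    break
            else:  # no existing feature matched: start a new one
                features.append((p.mz, p.rt_min))
                j = len(features) - 1
            row[j] = row.get(j, 0.0) + p.area  # several matches in one sample are summed
        rows.append(row)
    matrix = [[row.get(j, 0.0) for j in range(len(features))] for row in rows]
    return features, matrix


s1 = [Peak(311.2, 12.9, 12, 5.0e4, 2.0e4), Peak(250.1, 1.5, 10, 1.0e4, 9.0e3)]
s2 = [Peak(311.3, 13.1, 11, 4.0e4, 1.5e4), Peak(280.0, 20.0, 2, 3.0e3, 800.0)]
labels, matrix = align([s1, s2])
print(labels)  # [(311.2, 12.9)]  (the 1.5-min peak and the 2-scan peak are rejected)
print(matrix)  # [[50000.0], [40000.0]]
```

In a toy rule like this, a retention-time tolerance smaller than the scatter of one metabolite splits it into several columns, which gives some intuition for why the tolerance proved to be the most critical alignment parameter.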
After preprocessing, the normalized aligned peak lists were exported to SIMCA-P for MVDA.

Principal Component Analysis. For comparison of the CNL and full scan data sets, PCA, an unsupervised method for pattern recognition, was performed first. Unsupervised means that no prior knowledge of groups or tendencies within the data sets is necessary. After autoscaling (mean-centering and scaling to unit variance), the data were displayed as scores (ti) and loadings (pi) in a coordinate system of latent variables, called principal components (PCi), resulting from the data reduction step. Although relatively uncommon in spectroscopic data handling, autoscaling of the data sets was very important here because the variables (metabolites) are qualitatively and quantitatively different in CNL and full scan mode. Scaling to unit variance within autoscaling is the most objective way to scale data and prevents variables with high intensities from automatically being considered more important than those with low intensities. Figure 2 shows the PCA scores plots generated from the CNL scan and full scan data, respectively; the ellipse marks the 95% Hotelling T² confidence limit. With n = 20 samples, 20 × 0.05 = 1 observation is statistically expected to lie outside the critical limit; an observation outside the ellipse is termed a “strong outlier”. The scores plot of the CNL data (Figure 2A) shows that PC1 clearly distinguishes the control from the dosed group, whereas PC2 characterizes interindividual differences. PC2 reflects genuine variation in endogenous metabolite content, since any variation in peak areas resulting from differences in urine volume was eliminated by normalization to creatinine. Both principal components are significant: PC1 accounts for 11.4% of the total variance and PC2 for 9.7%. There is one strong outlier, a control urine sample, in the data. Figure 2B shows the scores plot of the corresponding full scan data. Here, PC2 separates the two groups and accounts for 13.8% of the total variance, whereas PC1 denotes interindividual differences. Notably, the two groups overlap in this plot. The high value of PC1, 26.2% of the total variance, is attributed to one of the strong outliers. Autoscaling leads to relatively low explained variance in PC1 and PC2, but this has to be accepted for the purpose of comparing the two approaches. The scores plots clearly illustrate the strength of the mercapturic acid profiling approach: even with unsupervised PCA, the control and dosed groups can be separated.
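The PCA step and the 95% ellipse can be sketched with NumPy and SciPy. The code below computes scores, loadings, and percent explained variance by singular value decomposition of autoscaled data, then compares each sample's Hotelling T² in the two-component score space with an F-distribution-based limit, A(n²−1)/(n(n−A))·F(A, n−A; 1−α). That limit is one commonly used form; other variants exist, and SIMCA-P's exact implementation is not reproduced here. The data are synthetic.

```python
import numpy as np
from scipy.stats import f as f_dist


def pca(x: np.ndarray, n_components: int = 2):
    """PCA via SVD of the column-centered matrix: scores, loadings, % variance."""
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # t_i
    loadings = vt[:n_components].T                     # p_i
    pct = 100.0 * s**2 / np.sum(s**2)
    return scores, loadings, pct[:n_components]


def hotelling_t2(scores: np.ndarray) -> np.ndarray:
    """T² of each observation within the retained score space."""
    return np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)


def t2_limit(n: int, a: int, alpha: float = 0.05) -> float:
    """F-based critical T² for n observations and a components (one common form)."""
    return a * (n**2 - 1) / (n * (n - a)) * f_dist.ppf(1 - alpha, a, n - a)


# Synthetic data: 20 urine samples x 50 variables, 5 variables raised in the dosed half.
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 50))
x[10:, :5] += 3.0
xs = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # autoscaling

scores, loadings, pct = pca(xs)
limit = t2_limit(n=20, a=2)
outside = np.flatnonzero(hotelling_t2(scores) > limit)
print("explained variance (%):", pct.round(1))
print("T2 limit:", round(limit, 2), "samples outside ellipse:", outside)
```

In a scores plot, the ellipse corresponding to this limit has semi-axes sqrt(limit × var(t_k)) along each component, so a sample lies outside it exactly when its T² exceeds the limit.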
Although the discriminating principal components are of similar size (11.4 and 13.8%, respectively), the scores plot of the CNL data shows better classification and no overlap of the groups compared to the full scan data plot. Both plots reveal the same control urine sample as a strong outlier. It can therefore be assumed that the biochemical composition of this urine sample differs from that of the clusters and that statistical reasons are not responsible for classifying this sample as an outlier.

Figure 2. PCA scores plots of the autoscaled data obtained from the high-dose study (n = 20). (A) CNL data: both principal components are significant and account for 11.4 and 9.7% of total variance, respectively. (B) Full scan/TOF data: PC1 accounts for 26.2% and PC2 for 13.8% of total variance; both components are significant. PC2 leads to discrimination of the clusters but with distinct overlap. The ellipse marks the 95% Hotelling T² confidence limit: (9) control group; (2) postdose group.

Partial Least Squares Discriminant Analysis. For further investigation of the two approaches, PLS-DA, a supervised method, was applied to the two data sets. To test the robustness of the CNL and full scan methods, outliers were not excluded prior to chemometric analysis. Both scores plots show two clusters that do not overlap (data not shown). Each model detected one urine sample as an outlier, the same sample found as the outlier in the PCA plots. The explained variation of the Y-matrix, R²(Y), is a crucial factor in interpreting PLS-DA plots. R²(Y) of PC1 (the principal component that leads to classification) accounted for 94.5% of the total Y-variance in the CNL data, whereas R²(Y) of PC1 of the full scan data accounted for only 77.2%. By default, SIMCA-P computes the cross-validation parameter Q²(Y) by dividing the data set into seven groups. One group is left out, a model of the remaining data is established, and the omitted group is then predicted by this model; this is repeated until each group has been predicted once. The predictive ability of the whole model is then characterized by the Q² parameter. Q² for the Y-matrix was computed to be 64.4% for CNL and 58.2% for TOF data. It can be concluded from both the PCA and the PLS-DA analyses that, despite the different methods, the results of the classical and the new approach were similar, with a certain advantage for mercapturic acid profiling.

Potential Biomarkers. The loadings plots reveal those variables that are responsible for classification in the scores plots and are therefore potential biomarkers. The further the variables are from the origin, the stronger their effect on discrimination. The detection of potential biomarkers was difficult since the data were autoscaled for model comparison. Loadings plots of autoscaled data show widely scattered variables, so autoscaling is not the scaling procedure of choice for biomarker recognition.34 A second PLS-DA model of the same data was therefore generated using pareto scaling, an alternative to the mean-centering and autoscaling procedures. Pareto scaling is an appropriate scaling tool for MS data because it partly compensates for the dominance of absolute intensity.34 The dosed group in both CNL and full scan data is classified by negative PC1 scores in the scatter plot (data not shown). Characteristic variables of this group therefore have highly negative loadings in PC1 and low positive or negative loadings in PC2. The PLS-DA loadings plot of the CNL data unequivocally shows the variables m/z 311.2, 312.2, 313.1, and 327.1 as potential biomarkers (Figure 3A). The [M − H]⁻ ion with m/z 311.2 and a retention time of 12.9 min corresponds to the expected mercapturic acid of the electrophilic N-acetyl-p-benzoquinone imine, the major reactive metabolite of acetaminophen; m/z 312.2 and 313.1 belong to its isotopic pattern, as confirmed by an authentic standard. The ion with m/z 327.1 shows an EPI spectrum similar to that of m/z 311.2 and was tentatively identified as 3-hydroxyacetaminophen mercapturic acid, a second acetaminophen metabolite described in the literature.35,36 Unambiguous identification of this second metabolite was not performed, since this was not within the scope of this work.

Figure 3. Biomarker identification on the basis of a PLS-DA model of the pareto scaled CNL data (high-dose study). (A) The loadings plot reveals the expected acetaminophen mercapturic acids with m/z 311.2, 312.2, 313.1, and 327.1, denoted by a box symbol. (B) The time series plot indicates the AAP-MA with m/z 311.2 as a biomarker of exposure, as it occurs only in the postdose samples. The fluctuations in this group are probably due to interindividual differences in drug metabolism. Key: (m) male, (f) female, (c) control, and (e) postdose group.

Figure 4. PLS-DA scores plot of the low-dose samples obtained from pareto scaled CNL data. PC1 accounts for 17.8% of the variance and R²(Y) of PC1 for 52.9%. Only the first component is statistically significant. The ellipse marks the 95% Hotelling T² confidence limit. Key: (9) control group; (2) postdose group.

The loadings plot of the full scan data is quite complex (data not shown). Although the data were normalized to creatinine content and pareto scaled in the same way as the CNL data, the variables were evenly distributed and far from the origin. Because of the complexity of this loadings plot and tailing effects, the correct biomarkers (AAP mercapturic acids) could not easily be detected. A multitude of urinary metabolites have a stronger impact on the model and are therefore located further from the origin than the mercapturic acid under investigation. This minor metabolite would probably be overlooked with the conventional full scan profiling method because of the higher intensities of major metabolites, such as glucuronides and sulfates, or of endogenous metabolites present in urine after acetaminophen exposure.

(34) Cloarec, O.; Dumas, M. E.; Trygg, J.; Craig, A.; Barton, R. H.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. Anal. Chem. 2005, 77, 517-526.
(35) Chen, W.; Koenigs, L. L.; Thompson, S. J.; Peter, R. M.; Rettie, A. E.; Trager, W. F.; Nelson, S. D. Chem. Res. Toxicol. 1998, 11, 295-301.
(36) Factor, S. A.; Weiner, W.

Biomarker Identification. The loadings plots from the PCA or PLS-DA model provide the variables (biomarkers) responsible for classification in the scores plot, e.g., the AAP-MA with m/z 311.2. In addition, time series plots show the trend of a variable across all samples, such as time or dose dependence, and confirm the plausibility of a particular variable as a biomarker. The “step” shape of the graph in Figure 3B for the CNL data shows that the AAP-MA is present only in the postdose samples. Intersubject variability in drug metabolism explains the fluctuations in the postdose area. To understand the affected biochemical pathways, it is imperative to identify the biomarkers after they are detected.
J.; Hefti, F. Ann. Neurol. 1989, 26, 286-288.
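The scaling comparison in the Potential Biomarkers section can be made concrete with a short sketch. It assumes an intensity matrix X with rows as urine samples and columns as m/z and retention time variables. The code implements creatinine normalization, autoscaling (unit variance), pareto scaling, a PCA computed by singular value decomposition, and a ranking of variables by the distance of their loadings from the origin. All function names and the synthetic data are hypothetical. SIMCA-P, which was used in this work, may differ in details such as the variance estimator.

```python
import numpy as np

def normalize_to_creatinine(X, creatinine):
    """Divide each sample's intensities by that sample's creatinine concentration."""
    creatinine = np.asarray(creatinine, dtype=float)
    return X / creatinine[:, None]

def autoscale(X):
    """Unit-variance scaling: mean-centre each variable and divide by its standard deviation."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                      # leave constant variables unscaled
    return (X - mu) / sd

def pareto_scale(X):
    """Pareto scaling: mean-centre each variable and divide by the square root of its SD."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (X - mu) / np.sqrt(sd)

def pca(Xs, n_components=2):
    """PCA of an already scaled matrix via SVD: scores, loadings, explained-variance fractions."""
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

def rank_by_loading_distance(loadings):
    """Variable indices sorted by distance of their loadings from the origin, largest first."""
    return np.argsort(-np.linalg.norm(loadings, axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20                                                  # e.g. 10 control + 10 postdose urines
    X = rng.lognormal(mean=2.0, sigma=1.0, size=(n, 200))   # hypothetical intensity matrix
    X[n // 2:, 7] *= 5.0                                    # hypothetical exposure marker, column 7
    for name, scaler in [("autoscaled", autoscale), ("pareto", pareto_scale)]:
        scores, loadings, explained = pca(scaler(X))
        print(name, np.round(100 * explained, 1), rank_by_loading_distance(loadings)[:5])
```

Autoscaling gives every variable the same variance, so low-intensity noise variables carry as much weight as real signals. This matches the widely scattered loadings described above. Pareto scaling only partly shrinks large variables, so intense, class-related signals tend to stay further from the origin.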
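The constant neutral loss principle behind the detection of m/z 311.2 and 327.1 can be illustrated offline. On the instrument, a CNL scan is acquired by scanning two mass analyzers with a fixed mass offset, so that only precursors losing the chosen neutral are recorded. The sketch below works differently. It assumes that product ion spectra are available as a Python dictionary and keeps every precursor that shows a loss of 129 Da within a tolerance. It also computes monoisotopic masses under two assumptions: that AAP-MA is C13H16N2O5S and that the 129 Da neutral is C5H7NO3. With these formulas, the [M - H]- ion lies at about m/z 311.07, consistent with the nominal m/z 311.2 reported above. Adding one oxygen (about 15.99 u) gives about m/z 327.07, matching the ion assigned to 3-hydroxyacetaminophen mercapturic acid. The spectra in the example are invented for illustration, and this is not the authors' acquisition or processing software.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.007276467

def mono_mass(formula):
    """Monoisotopic neutral mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def deprotonated_mz(formula):
    """m/z of the singly charged [M - H]- ion."""
    return mono_mass(formula) - PROTON

def neutral_loss_hits(spectra, loss=129.0, tol=0.5):
    """Return (precursor m/z, product intensity) for each precursor whose product
    ions include one at precursor - loss, within +/- tol (offline CNL emulation)."""
    hits = []
    for precursor, products in spectra.items():
        target = precursor - loss
        matched = [inten for mz, inten in products if abs(mz - target) <= tol]
        if matched:
            hits.append((precursor, max(matched)))
    return sorted(hits)

AAP_MA = {"C": 13, "H": 16, "N": 2, "O": 5, "S": 1}   # assumed formula of AAP-MA
LOSS = {"C": 5, "H": 7, "N": 1, "O": 3}              # assumed formula of the 129 Da neutral

if __name__ == "__main__":
    print(round(deprotonated_mz(AAP_MA), 4), round(mono_mass(LOSS), 4))
    # Hypothetical product ion spectra: precursor m/z -> [(product m/z, intensity), ...]
    spectra = {311.2: [(182.1, 1000.0), (140.0, 50.0)],
               327.1: [(198.0, 400.0)],
               300.0: [(200.0, 10.0)]}
    print(neutral_loss_hits(spectra))
```

The tolerance of 0.5 u is an assumed value suited to unit-resolution data. High-resolution data would need a much tighter window.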

For LC-MS metabolite profiling approaches, the high mass resolution of TOF detection has so far been used almost exclusively for biomarker identification. Calculating the elemental composition of a compound from its exact mass may give an idea of its chemical structure. General databases may be available for many catabolic compounds,37 but such databases are still lacking for the majority of MS data.38 Recently, Yang et al. described a metabolite profiling approach that uses "enhanced" MS survey scans on a hybrid quadrupole linear ion trap. In a second experiment, EPI mass spectra were acquired to obtain additional structural information on the class of interest detected in the survey scan, i.e., phospholipids.6 Such an approach may be reasonable when no databases are available, for example, for metabolites derived from unknown xenobiotics. Accordingly, the approach presented here uses CNL survey scans, which allow potential biomarkers to be preselected on the basis of a structural moiety (N-acetylcysteine) common to all metabolites of the class. Structural information on the xenobiotic-derived part of the molecule is provided by additional EPI mass spectra, acquired, for example, in IDA mode as described recently.33

(37) Lenz, E. M.; Bright, J.; Knight, R.; Wilson, I. D.; Major, H. J. Pharm. Biomed. Anal. 2004, 35, 599-608.
(38) Williams, R. E.; Lenz, E. M.; Evans, J. A.; Wilson, I. D.; Granger, J. H.; Plumb, R. S.; Stumpf, C. L. J. Pharm. Biomed. Anal. 2005, 38, 465-471.

Low-Dose Experiment for Proof of Concept. For proof of concept, a second study was performed. In contrast to the first study, the dose of acetaminophen was reduced 10-fold and administered to 25 volunteers. To evaluate the sensitivity of the method, urine was not concentrated, and the injection volume was reduced from 20 to 5 µL. The PLS-DA scores plot of the pareto-scaled CNL data is shown in Figure 4. Outliers identified in the PCA scores plot were not excluded from PLS-DA in order to test the robustness of the model. The samples were largely divided into clusters by the first latent variable (PC1). The second component was not statistically significant but was included in the PLS-DA model for ease of inspection. R2(Y)PC1 and Q2(Y)PC1 were 52.9 and 26.5%, respectively. The postdose samples apparently show more variance than the control samples. The corresponding loadings plot of the PLS-DA model (data not shown) clearly highlights the mercapturic acid of AAP as the variable with the highest influence on discrimination.

Figure 5. Validation of the low-dose CNL data (n = 50) by a pareto-scaled PLS-DA model. N = 34 samples from both the control (■) and the postdose (▲) groups were chosen at random for the training set. The test set consisted of the remaining 16 samples (●), and Y was predicted on the basis of the training set model. Class membership was assigned on the basis of an a priori offset of 0.5 and a scaling factor of 0.5. Key: (m) male, (f) female, (c) control, and (e) postdose group.

Validation was carried out by removing one-third of the data at random (test or prediction set, n = 16). The remaining two-thirds (work or training set, n = 34) was used to build a PLS-DA model, which was then used to predict the test set. Predictions were displayed in a scatter plot of the predicted Y values versus sample number. Class membership was assigned on the basis of an offset of 0.5 and a scaling factor of 0.5. A test sample was assigned to "control" (class 1) if its predicted Y value was between 0.5 and 1, and to "dosed" (class 0) if its predicted Y value was lower than 0.5 (Figure 5). This procedure was repeated five times, generating five random test sets from the entire data set. The resulting PLS-DA models showed excellent prediction rates (correct classification of the test set samples) of 75-100%, corresponding to 12-16 correct classifications out of 16. The correct rates (correct classification of all samples) were 92-100% (46-50 out of 50). For further evaluation, the roles of the training and test sets were swapped, so that each former training set served as the test set and vice versa. Prediction rates of 82.8-91.2% (28-31 out of 34) and correct rates of 88-94% (44-47 out of 50) indicated high sensitivity and specificity combined with high robustness of the model.
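The PLS-DA statistics and the validation scheme described above can be sketched with a minimal PLS1 implementation. The scheme covers R2(Y), seven-group cross-validation for Q2(Y), random 34/16 splits repeated five times, and class assignment at a predicted Y of 0.5. The sketch below is a NIPALS PLS1 with pareto scaling computed on the training data only. It is not SIMCA-P's code. Placing every seventh sample in the same cross-validation group is an assumption, and all function names and data are hypothetical. The correct rate counts all training samples as correctly classified. This reproduces the arithmetic of the reported values, e.g. (34 + 12)/50 = 92%.

```python
import numpy as np

def _pareto(X):
    """Column means and pareto scale factors (square root of the standard deviation)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return mu, np.sqrt(sd)

def pls1_fit(X, y, n_comp=1):
    """PLS1 (single y) by NIPALS on pareto-scaled X and mean-centred y."""
    x_mu, x_sc = _pareto(X)
    Xr = (X - x_mu) / x_sc
    y_mu = float(np.mean(y))
    yr = y - y_mu
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # score vector
        tt = t @ t
        p = Xr.T @ t / tt               # X loading
        q = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - q * t                 # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # regression coefficients in the scaled space
    return x_mu, x_sc, y_mu, B

def pls1_predict(model, X):
    x_mu, x_sc, y_mu, B = model
    return ((X - x_mu) / x_sc) @ B + y_mu

def r2y(X, y, n_comp=1):
    """Fraction of the y sum of squares explained by the fitted model."""
    resid = y - pls1_predict(pls1_fit(X, y, n_comp), X)
    return 1 - resid @ resid / np.sum((y - np.mean(y)) ** 2)

def q2y(X, y, n_comp=1, n_groups=7):
    """1 - PRESS/SS with every n_groups-th sample left out together (assumed grouping)."""
    groups = np.arange(len(y)) % n_groups
    press = 0.0
    for g in range(n_groups):
        out = groups == g
        err = y[out] - pls1_predict(pls1_fit(X[~out], y[~out], n_comp), X[out])
        press += err @ err
    return 1 - press / np.sum((y - np.mean(y)) ** 2)

def split_validation(X, y, n_test=16, n_repeats=5, n_comp=1, seed=0):
    """Random train/test splits; predicted y >= 0.5 -> class 1 ('control'), else 0 ('dosed')."""
    rng = np.random.default_rng(seed)
    n = len(y)
    results = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        y_hat = pls1_predict(pls1_fit(X[train], y[train], n_comp), X[test])
        correct = int(np.sum((y_hat >= 0.5).astype(int) == y[test]))
        prediction_rate = correct / n_test              # test-set samples only
        correct_rate = (len(train) + correct) / n       # training samples counted as correct
        results.append((correct, prediction_rate, correct_rate))
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat([1, 0], 25)            # hypothetical: 25 control (1), 25 dosed (0) urines
    X = rng.normal(size=(50, 40))
    X[y == 0, 0] += 10.0                 # hypothetical marker raised in dosed samples
    print("R2Y", round(r2y(X, y), 3), "Q2Y", round(q2y(X, y), 3))
    for correct, pr, cr in split_validation(X, y):
        print(f"{correct} of 16, prediction rate {pr:.0%}, correct rate {cr:.0%}")
```

Swapping the roles of the two sets, as in the further evaluation above, amounts to training on the 16 held-out samples and predicting the other 34.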

Performance of Data Filtration. OSC is a data filtration tool for metabonomics, metabolic fingerprinting, and metabolite profiling that can significantly improve PCA and PLS-DA performance. OSC removes from the model the information in X that is orthogonal to, and thus uncorrelated with, the Y-matrix, yielding better discrimination of the clusters. The Y-matrix is a so-called dummy matrix that assigns either 0 or 1 to each sample depending on class membership (1 for the control group and 0 for the dosed group, or vice versa). In this way, the X-matrix is corrected for extraneous variation in the samples that is irrelevant to the classification. Common confounders removed in this way can include gender, age, weight, genetic defects, state of health, lifestyle, and environment. In general, OSC-filtered models are a useful tool for excluding interindividual variability and other confounders that may interfere with chemometric analysis. Indeed, for the low-dose acetaminophen study, R2OSC and Q2OSC for PC1 (pareto scaling for X, autoscaling for Y) increased markedly to 89.8 and 64.7%, respectively. This is an increase of 36.9 and 38.2 percentage points compared with the unfiltered PLS-DA model. OSC processing also has an interesting secondary effect. Signal filtering divides the data set into two parts: the filtered X-matrix and the orthogonal data that were removed from it. This orthogonal data matrix was used to investigate the extraneous information content of the samples. Two orthogonal components were removed (angles of 90.00° and 89.99°, i.e., practically orthogonal to Y). They left 79.4% and then 67.9% of the sum of squares (variation) in the X-block, so 20.6 and 11.5% of the total variance, respectively, were accounted for by background information. Plotting the first two OSC components reveals a separation of male and female urine samples along the second component (Figure 6).
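The filtering step can be illustrated with a sketch. SIMCA-P's OSC algorithm is not reproduced here. Instead, the code uses the closely related O-PLS approach of Trygg and Wold, which for a single dummy y removes components whose scores are exactly uncorrelated with y. The function name and test data are hypothetical, and X is assumed to be scaled already (e.g., pareto-scaled). The function returns three things: the filtered X, the removed orthogonal scores (the analogue of the OSC-PC1/OSC-PC2 scores plotted in Figure 6), and the fraction of the X sum of squares removed by each component. With two components, the remaining fractions are 1 - f1 and 1 - f1 - f2. This is how 79.4 and 67.9% remaining correspond to 20.6 and 11.5% removed.

```python
import numpy as np

def opls_orthogonal_filter(X, y, n_orth=2):
    """Remove n_orth y-orthogonal components from X (O-PLS-style filter for one y).

    Returns the filtered, column-centred X, the removed orthogonal scores
    (one column per component) and the fraction of the total X sum of
    squares removed by each component.
    """
    Xc = X - X.mean(axis=0)            # column-centre X
    yc = y - np.mean(y)                # centre the 0/1 dummy vector
    ss_total = np.sum(Xc ** 2)
    w = Xc.T @ yc                      # predictive weight: X direction most covariant with y
    w /= np.linalg.norm(w)
    scores, fractions = [], []
    for _ in range(n_orth):
        t = Xc @ w                     # predictive score
        p = Xc.T @ t / (t @ t)         # its loading
        w_o = p - (w @ p) * w          # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = Xc @ w_o                 # orthogonal score, uncorrelated with y
        p_o = Xc.T @ t_o / (t_o @ t_o)
        removed = np.outer(t_o, p_o)   # rank-one y-orthogonal part of X
        Xc = Xc - removed              # deflate X
        scores.append(t_o)
        fractions.append(np.sum(removed ** 2) / ss_total)
    return Xc, np.column_stack(scores), np.array(fractions)
```

Plotting the two columns of the returned scores against each other mirrors the analysis of the removed data. A structure unrelated to dosing, such as a gender effect, would appear there rather than in the filtered model.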

Figure 6. Scores plot of the orthogonal data removed from the initial PLS-DA model by OSC filtering, showing the discrimination of male (○) and female (●) samples. OSC-PC1 accounts for 20.6% and OSC-PC2 for 11.5% of total variance.

This result has been described previously in the literature and is not very surprising, since conventional screening approaches also reveal gender-related markers such as the "usual suspects" or steroids.15 However, this is the first time it has been reported for a mercapturic acid screening approach. Gender-related mercapturic acids remain to be identified. Because these gender differences are considered to be complex, a larger data set will have to be analyzed for this purpose.

CONCLUSION
Metabonomics and metabolic profiling studies are difficult to perform in humans with classical approaches such as 1H NMR or full scan mass spectrometry because of the large variety of confounders. This may explain why so few such data are available for human subjects. One potential way to avoid these problems is to preselect the markers used for metabolic fingerprinting. Modern mass spectrometers can do this with scan modes such as CNL, precursor ion, or MRM. A particularly interesting class of metabolites is the mercapturic acids, which are formed by cleavage and acetylation of glutathione adducts. The glutathione pathway is the most important route for the detoxification of reactive metabolites. MAs are therefore important effect markers that reflect the electrophilic burden of an organism. Mercapturic acids share a common CNL of 129 Da, which can be used to preselect this marker class. With the help of a new algorithm, the CNL data can easily be exported to a statistical software tool for chemometric analysis and visualization. A high-dose and a low-dose model study served to evaluate the performance of the new mercapturic acid profiling approach. The results, together with the validation and cross-validation parameters of the chemometric models, demonstrated two things: the method is practicable for metabolite profiling in humans, even at low doses, and it can discriminate between genders. In our opinion, the approach can easily be transferred to other metabolite classes, such as glucuronides or sulfates, in toxicokinetics. It could also be applied in other areas where numerous samples in complex matrices have to be analyzed and a CNL is available, such as that previously published for glycosides in dry plant samples.39

(39) Qu, J.; Liang, Q.; Luo, G.; Wang, Y. Anal. Chem. 2004, 76, 2239-2247.

ACKNOWLEDGMENT
This work was supported by the Deutsche Forschungsgemeinschaft (VO 860/2-2).

Received for review September 23, 2005. Accepted November 22, 2005. AC051705S
