Anal. Chem. 2009, 81, 2506–2515
Virtual Quantification of Metabolites by Capillary Electrophoresis-Electrospray Ionization-Mass Spectrometry: Predicting Ionization Efficiency Without Chemical Standards

Kenneth R. Chalcraft, Richard Lee, Casandra Mills, and Philip Britz-McKibbin*

Department of Chemistry, McMaster University, Hamilton, Ontario, L8S 4M1, Canada

A major obstacle in metabolomics remains the identification and quantification of a large fraction of unknown metabolites in complex biological samples when purified standards are unavailable. Herein we introduce a multivariate strategy for de novo quantification of cationic/zwitterionic metabolites using capillary electrophoresis-electrospray ionization-mass spectrometry (CE-ESI-MS) based on fundamental molecular, thermodynamic, and electrokinetic properties of an ion. Multivariate calibration was used to derive a quantitative relationship between the measured relative response factor (RRF) of polar metabolites with respect to four physicochemical properties associated with ion evaporation in ESI-MS, namely, molecular volume (MV), octanol-water distribution coefficient (log D), absolute mobility (µo), and effective charge (zeff). Our studies revealed that a limited set of intrinsic solute properties can be used to predict the RRF of various classes of metabolites (e.g., amino acids, amines, peptides, acylcarnitines, nucleosides, etc.) with reasonable accuracy and robustness provided that an appropriate training set is validated and ion responses are normalized to an internal standard(s). The applicability of the multivariate model to quantify micromolar levels of metabolites spiked in red blood cell (RBC) lysates was also examined by CE-ESI-MS without significant matrix effects caused by involatile salts and/or major co-ion interferences.
This work demonstrates the feasibility for virtual quantification of low-abundance metabolites and their isomers in real-world samples using physicochemical properties estimated by computer modeling, while providing deeper insight into the wide disparity of solute responses in ESI-MS. New strategies for predicting ionization efficiency in silico allow for rapid and semiquantitative analysis of newly discovered biomarkers and/or drug metabolites in metabolomics research when chemical standards do not exist.

* To whom correspondence should be addressed. Fax: +1-905-522-2509. E-mail: [email protected].

Analytical Chemistry, Vol. 81, No. 7, April 1, 2009

There is growing interest in developing metabolomics as a complement to functional genomic studies that are required for
new advances in drug development,1 disease diagnosis,2 environmental toxicology,3 and agriculture.4 Two instrumental platforms widely used in metabolomics research are nuclear magnetic resonance (NMR) and mass spectrometry (MS), which provide quantitative and qualitative information suitable for comprehensive metabolite analyses.5,6 Electrospray ionization-mass spectrometry (ESI-MS) is the method of choice for direct analysis of polar metabolites due to its high sensitivity and direct compatibility with separation techniques, such as liquid chromatography (LC)7 and capillary electrophoresis (CE).8 A major challenge in metabolomics remains the identification of a large fraction of unknown yet biologically relevant metabolites that do not correspond to known candidates within conserved metabolic pathways. In cases when no match is found within public databases,9 several empirical candidate structures can be deduced from accurate mass, isotopic composition, and fragmentation information.10,11 However, reliable quantification of novel metabolites remains elusive when they are not commercially available, are difficult to synthesize, or are costly to purify. This dilemma is considerable when less than 10% of total metabolite peaks detected in biological samples can be quantified due to limited access to chemical standards.12-14 Thus, new strategies that permit direct quantification of recently identified metabolites (e.g., biomarkers, xenobiotics, etc.) based on their putative chemical structure are needed in ESI-MS when purified standards remain unavailable.

(1) Lindon, J. C.; Holmes, E.; Nicholson, J. K. FEBS J. 2007, 274, 1140–1151.
(2) Dunn, W. B.; Broadhurst, D. I.; Deepak, S. M.; Buch, M. H.; McDowell, G.; Spasic, I.; Ellis, D. I.; Brooks, N.; Kell, D. B.; Neyses, L. Metabolomics 2007, 3, 413–426.
(3) Lee, S. H.; Woo, H. M.; Jung, B. H.; Lee, J. G.; Kwon, O. S.; Pyo, H. S.; Choi, M. H.; Chung, B. C. Anal. Chem. 2007, 79, 6102–6110.
(4) Lisec, J.; Schauer, N.; Kopka, J.; Willmitzer, L.; Fernie, A. R. Nat. Protoc. 2006, 1, 387–396.
(5) Moco, S.; Bino, R. J.; Vos, R. C. H. d.; Vervoort, J. Trends Anal. Chem. 2007, 26, 855–866.
(6) Lenz, E. M.; Wilson, I. D. J. Proteome Res. 2007, 6, 443–458.
(7) Wilson, I. D.; Plumb, R.; Granger, J.; Major, H.; Williams, R.; Lenz, E. A. J. Chromatogr., B 2005, 817, 67–76.
(8) Monton, M. R. N.; Soga, T. J. Chromatogr., A 2007, 1168, 237–246.
(9) Wishart, D. S.; Tzur, D.; Knox, C.; Eisner, R. E.; Guo, A. C.; Young, N.; Cheng, D.; Jewell, K.; Arndt, D.; Sawhney, S. Nucleic Acids Res. 2007, 35, D521–D526.
(10) Kind, T.; Fiehn, O. BMC Bioinf. 2006, 7, 234–244.
(11) Lim, H. K.; Chen, J.; Sensenhauser, C.; Cook, L.; Subrahmanyam, V. Rapid Commun. Mass Spectrom. 2007, 21, 1821–1832.
(12) Styczynski, M. P.; Moxley, J. F.; Tong, L. V.; Walther, J. L.; Jensen, K. L.; Stephanopoulos, G. N. Anal. Chem. 2007, 79, 966–973.
(13) Kind, T.; Tolstikov, V.; Fiehn, O.; Weiss, R. H. Anal. Biochem. 2007, 363, 185–195.
(14) Soga, T.; Ohashi, Y.; Ueno, Y.; Naraoka, H.; Tomita, M.; Nishioka, T. J. Proteome Res. 2003, 2, 488–494.

10.1021/ac802272u CCC: $40.75 © 2009 American Chemical Society. Published on Web 03/10/2009

The development of ESI-MS15,16 as an efficient means for ionizing intact biological molecules has revolutionized modern instrumental analysis. Recent evidence17 supports an ion evaporation mechanism for describing gas-phase ion formation involving the desolvation of low molecular weight metabolites from charged droplets. Despite the explosive growth in ESI-MS applications, a fully quantitative model for predicting solute ionization response is still lacking.18,19 This is relevant given the wide disparity in solute ionization efficiency which can result in apparent responses that differ by over 3 orders of magnitude despite equimolar concentrations in solution.20 This phenomenon is clearly a reflection of the distinct physicochemical properties of solutes that impact gas-phase ionization processes under an electric field. Iribarne and Thomson21 first postulated that the variation of ion response in ESI-MS can be related to differences in the relative affinities of solutes for the charged droplet surface. Kebarle and co-workers later attributed differences in analyte sensitivity to the rates of ion evaporation from ESI droplets due to factors associated with solute solvation energy and surface activity.22,23 An equilibrium partitioning model for ESI-MS was later introduced by Enke24 to describe solute ionization response via competitive displacement of analytes with other co-ions for the surface of a charged droplet where ion desorption is favored.
To date, several reports have examined the influence of various solute thermodynamic properties on ionization efficiency in ESI-MS, including nonpolar surface area, free energy of solvation, octanol-water partition coefficient, reversed-phase high-performance liquid chromatography (HPLC) retention time, and gas-phase proton affinities.25-29 However, most ESI models have had limited success in predicting the ionization efficiency of metabolites in complex sample matrixes due to two significant challenges: (1) univariate correlations of single physicochemical parameters to describe gas-phase ionization are often inadequate to model the behavior of diverse classes of solutes that differ significantly in terms of their charge, polarity, or size, and (2) real-world samples induce solute ionization suppression due to background matrix effects that are highly variable and sample-dependent. The latter issue can be addressed if appropriate off-line sample pretreatment is performed (e.g., desalting) prior to direct-infusion ESI-MS studies, or preferably if a high-efficiency separation technique is coupled to ESI-MS to resolve low-abundance metabolites from major co-ion interferences present in a sample mixture, including isomeric and isobaric ions.

(15) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64–71.
(16) Dole, M.; Hines, R. L.; Mack, L. L.; Mobley, R. C.; Alice, M. B.; Ferguson, L. D. J. Chem. Phys. 1968, 49, 2240–2249.
(17) Nguyen, S.; Fenn, J. B. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 1111–1117.
(18) Cole, R. B. J. Mass Spectrom. 2000, 35, 763–772.
(19) Kebarle, P. J. Mass Spectrom. 2000, 35, 804–817.
(20) Chalcraft, K. R.; Britz-McKibbin, P. Anal. Chem. 2009, 81, 307–314.
(21) Iribarne, J. V.; Thomson, B. A. J. Chem. Phys. 1976, 64, 2287–2294.
(22) Tang, L.; Kebarle, P. Anal. Chem. 1993, 65, 3654–3668.
(23) Sumner, J.; Nicol, G.; Kebarle, P. Anal. Chem. 1988, 60, 1300–1307.
(24) Enke, C. G. Anal. Chem. 1997, 69, 4885–4893.
(25) Henriksen, T.; Juhler, R. K.; Svensmark, B.; Cech, N. B. J. Am. Soc. Mass Spectrom. 2005, 16, 446–455.
(26) Null, A. P.; Nepomuceno, A. I.; Muddiman, D. C. Anal. Chem. 2003, 75, 1331–1339.
(27) Cech, N. B.; Krone, J. R.; Enke, C. G. Anal. Chem. 2001, 73, 208–213.
(28) Cech, N. B.; Enke, C. G. Anal. Chem. 2000, 72, 2717–2723.
(29) Amad, M. H.; Cech, N. B.; Jackson, G. S.; Enke, C. G. J. Mass Spectrom. 2000, 35, 784–789.

Recently, Caetano et al. explored several multivariate models for predicting the ion response of drug standards using 429 different molecular descriptors in order to determine the preferred ionization method in LC-MS, namely, ESI or atmospheric pressure chemical ionization (APCI).30 After model refining, only three molecular descriptors were found to be significant in predicting ion responses in LC-ESI-MS when using multivariate calibration. However, the physical significance of these low-order theoretical descriptors for influencing ESI response was unclear.30 In addition, the performance of the predictive model in quantifying solutes in biological samples was not examined. In this report, we introduce a multivariate strategy for predicting the relative response factor (RRF) of 46 different cationic/zwitterionic metabolites in CE-ESI-MS as a quantitative measure of their ionization efficiency. Three intrinsic physicochemical properties of a solute (i.e., molecular volume (MV), octanol-water distribution coefficient (log D), and absolute mobility (µo)) were identified as key variables positively correlated with a higher RRF when using multiple linear regression (MLR) with cross-validation. Measurement of the relative ion response (RIR) was a critical feature in this work for normalizing apparent solute responses that minimized long-term instrumental drift. Method robustness was examined by using predicted RRFs to quantify micromolar levels of training/test metabolites with reasonable accuracy (≈40% average absolute bias) 1 year after initial model development. The feasibility for semiquantitative analysis of low-abundance metabolites in complex biological samples using CE-ESI-MS was also investigated by spiking metabolite standards directly in filtered red blood cell (RBC) lysates.
To the best of our knowledge, this is the first validated strategy that demonstrates the proof-of-concept for de novo quantification of metabolites by ESI-MS based on intrinsic physicochemical parameters derived from their chemical structure.

EXPERIMENTAL SECTION

Chemicals and Reagents. All metabolite standards were purchased from Sigma-Aldrich and used without further purification, including O-acetyl-L-carnitine (C2), adenine (Ad), adenosine (A), L-alanine (Ala), β-alanine (β-Ala), D-Ala-D-Ala (DiAla), γ-aminobutyric acid (GABA), p-aminobenzoic acid (PABA), L-arginine (Arg), L-asparagine (Asn), L-aspartic acid (Asp), atenolol (At), O-butyryl-L-carnitine (C4), L-carnitine (C0), L-carnosine (Carn), creatinine (Crea), p-chloro-L-tyrosine (ClTyr), L-citrulline (Cit), L-cystathionine (Cyst), 2,3-dihydroxy-L-phenylalanine (DOPA), dopamine (DopN), L-glutamic acid (Glu), L-glutamine (Gln), guanine (Gu), guanosine (G), histamine (HisN), L-histidine (His), L-homocysteine (HCy), 5-hydroxy-L-tryptophan (OHTrp), L-leucine (Leu), L-isoleucine (Ile), L-lysine (Lys), metoprolol (Meto), methanol (MeOH), N-methyl-L-aspartic acid (MeAsp), 3-methyl-L-histidine (MeHis), 1-methyl-adenosine (MeA), L-methionine (Met), L-methionine sulfone (MetS), nicotinamide (NAm), nicotinic acid (NA), p-nitro-L-tyrosine (NTyr), n-octanol (OctOH), O-octanoyl-L-carnitine (C8), L-ornithine (Orn), oxytetracycline (Oxytet), oxidized glutathione (GSSG), L-phenylalanine (Phe), L-threonine (Thr), tryptamine (TrpN), tyramine (TyrN), L-tryptophan (Trp), O-propionyl-L-carnitine (C3), L-tyrosine (pTyr), m-L-tyrosine (mTyr), o-L-tyrosine (oTyr), reduced glutathione (GSH), serotonin (Sero), L-theanine (Thea), and L-valine (Val).

(30) Caetano, S.; Decaestecker, T.; Put, R.; Daszykowski, M.; Van Bocxlaer, J.; Vander Heyden, Y. Anal. Chim. Acta 2005, 550, 92–106.

Apparatus and Conditions. All experiments were performed on an Agilent CE system equipped with an XCT 3D ion trap mass spectrometer, an Agilent 1100 series isocratic pump, and a G16107 CE-ESI-MS coaxial sheath-liquid sprayer interface (Agilent Technologies Inc., Waldbronn, Germany). All separations were performed on uncoated fused-silica capillaries (Polymicro Technologies Inc., Phoenix, AZ) with 50 µm i.d. and 80 cm total length using 1.4 M formic acid, pH 1.8 as the background electrolyte (BGE). All samples were prepared in 200 mM ammonium acetate, pH 7 (sample electrolyte), and were injected hydrodynamically into the capillary inlet for 75 s under low pressure (50 mbar). The sample injection plug length was equivalent to 6.2 cm or about 8% of the total capillary length, which provided an integrated method for on-line sample preconcentration of low-abundance metabolites during electromigration prior to ionization.20,31 CE separations were performed at 20 °C with an applied voltage of 25 kV. The sheath liquid consisting of 1:1 MeOH/H2O with 0.1% formic acid was supplied by the 1100 series isocratic pump at a flow rate of 10 µL/min. Nitrogen was used as both a nebulizing and a drying gas supplied at 6 psi and 10 L/min, respectively, whereas helium at 6 × 10⁻⁶ mbar was used as a damping gas for the ion trap. All MS analyses were performed using a 5 kV cone voltage in positive-ion mode (+ESI) at 300 °C. MS data were recorded within a range from m/z 50 up to m/z 800 using a target mass of m/z 250 with an ion charge control (ICC) target of 100 000 ions under an ultrascan mode of 26 000 m/z s⁻¹.
All calibration and validation studies were performed using the fixed CE-ESI-MS settings described above, whereas different batches of BGE/sample and sheath liquid solutions were prepared frequently throughout the duration of the study (≈1 year). Fused-silica capillaries used for CE separations that also serve as the sample outlet in ESI had average life spans of about 1 month after initial conditioning for 20 min each using MeOH, 0.1 M HCl, deionized H2O, and BGE. The distal end of the capillary was precisely cut with a Shortix diamond capillary column cutter (SGT Middelburg, Holland) followed by burning off a short (≈1 cm) section of the polyimide coating in order to expose the bare fused-silica capillary outlet end that is then inserted into the coaxial sheath liquid interface. In general, the capillary was flushed with BGE for 10 min to ensure adequate rinsing prior to every sample injection. Evidence of capillary deterioration based on unstable currents after repeated buffer rinsing or conditioning signaled preparation of a new capillary.

(31) Lee, R.; Ptolemy, A. S.; Niewczas, L.; Britz-McKibbin, P. Anal. Chem. 2007, 79, 403–415.

Experimental Design and Multivariate Analysis. Experiments were designed by randomly dividing a group of 58 polar metabolites into a training set (47) and a holdout or test set (10) with DiAla serving as the internal standard (IS). Metabolites were selected based on their compatibility with CE when using an acidic background electrolyte under suppressed electroosmotic flow (EOF) conditions, such that solutes had a net positive electrophoretic mobility (µep) directed toward the cathode/ion source.20 Several classes of cationic/zwitterionic metabolites and their isomers were chosen in order to cover a wide cross section of chemical diversity in terms of molecular weight, polarity, and charge density. Model refining was first performed to assess the optimum number of metabolites and variables to retain in the training set by a stepwise regression process in order to maximize the predictive accuracy of the model as determined by several criteria,32,33 such as high R2 and Q2, as well as low mean square error of calibration (MSEC) and mean square error of prediction (MSEP), which are summarized in Table S1 of the Supporting Information. An optimized training set of 36 metabolites and their three experimentally determined physicochemical parameters (i.e., MV, log D, and µo) were then used to predict the RRF of training and test metabolites using multivariate calibration. Multiple linear regression for correlating solute physicochemical parameters to the measured RRF of metabolites was performed in Excel (Microsoft Inc., Redmond, WA).34 Principal component analysis (PCA) and partial least-squares (PLS) regression were also performed using a multivariate analysis add-in for Excel developed by Dr. Richard G. Brereton at Bristol University that is available for download.35 All multivariate analyses were processed using standardized data sets normalized to the inverse of their standard deviation whenever indicated, since this provides an effective way to scale parameters having different absolute values and ranges. PCA was performed as an unsupervised dimensionality reduction method for mapping intersolute variations among training/test metabolites that was useful for identifying groupings, outliers, and qualitative trends reflective of their distinct chemical properties.
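As an illustration only, the model-refining criteria named above (R2 from calibration and a cross-validated Q2) can be sketched in Python with NumPy rather than the Excel workflow used in this study. The helper names, the leave-one-out form of cross-validation, and the synthetic descriptor values are all assumptions made for the sketch; none of the numbers are measured data.

```python
import numpy as np

def scale_by_sd(X):
    """Scale each column by the inverse of its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return X / X.std(axis=0, ddof=1)

def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + X @ b; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to one or more rows of X."""
    return coef[0] + np.atleast_2d(X) @ coef[1:]

def r_squared(y, y_hat):
    """Coefficient of determination for the calibration fit."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop solute i, refit, predict it
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic, made-up data: 36 "solutes" x 3 descriptors
# (hypothetical stand-ins for MV, log D, and mu_o).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[150.0, -2.0, 3.0], scale=[50.0, 1.5, 0.8], size=(36, 3))
X = scale_by_sd(X_raw)
y = 0.5 + X @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.2, size=36)

coef = fit_mlr(X, y)
r2 = r_squared(y, predict(coef, X))
q2 = q_squared_loo(X, y)
print(f"R2 = {r2:.3f}, Q2 (leave-one-out) = {q2:.3f}")
```

For ordinary least squares the leave-one-out Q2 can never exceed R2, so a large gap between the two is a simple warning sign of overfitting when variables or metabolites are added to the training set.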
MLR was selected as the preferred multivariate calibration method relative to PLS in this study as it generated higher Q2 and lower MSEP for predicted RRFs of the independent test set. All data processing involving electropherograms and nonlinear regression was performed using Igor Pro 5.0 (Wavemetrics Inc., Lake Oswego, OR).

Measurement of Solute Physicochemical Parameters. In this work, four solute physicochemical parameters were initially selected as variables to predict the ionization efficiency of polar metabolites in ESI-MS, namely, MV, log D, µo, and effective charge (zeff). The methods used to measure these intrinsic solute properties are described further in the Supporting Information, where Figure S1 and Table S2 summarize the experimentally determined parameters of metabolites examined in the original training set.

(32) Aptula, A. O.; Jeliazkova, N. G.; Schultz, T. W.; Cronin, M. T. D. QSAR Comb. Sci. 2005, 24, 385–396.
(33) Brown, M.; Dunn, W. B.; Ellis, D. I.; Goodacre, R.; Handl, J.; Knowles, J. D.; O’Hagan, S.; Spasic, I.; Kell, D. B. Metabolomics 2005, 1, 39–51.
(34) Brereton, R. G. Analyst 2000, 125, 2125–2154.
(35) Brereton, R. G. Chemometrics: Data Analysis for the Laboratory and Chemical Plant; Wiley and Sons Ltd., 2003.

Prediction of Solute Physicochemical Parameters. Model validation was performed on a test set of 10 metabolites not included in the original training set whose physicochemical parameters were estimated in silico as a way to demonstrate the feasibility for virtual quantification of metabolites by ESI-MS based on their chemical structure. In this work, computer molecular modeling was used to determine MV based on the Connolly solvent-excluded volume, which was performed using Chem3D Ultra software, version 8.0 (CambridgeSoft Inc., Cambridge, MA). All chemical structures were energy-minimized using an iterative molecular mechanics 2 (MM2) algorithm with molecular dynamics as a way to determine a stable molecular configuration prior to computing relevant parameters. As a measure of solute hydrophobicity and surface activity, log D for metabolites was predicted using ALOGPS 2.1 (Virtual Computational Chemistry Laboratory, http://www.vcclab.org). Overall, a good correlation between experimentally measured and predicted log D using ALOGPS 2.1 was demonstrated for model metabolites with a slope = 0.828 ± 0.056 and R2 = 0.8519. The Hubbard-Onsager hydrodynamic model for ion migration36 was used to predict µo, where the valence charge (z0) and MV of metabolites were used as input parameters as described in a previous report.31 The latter model was developed by performing nonlinear regression fitting to experimentally determined µo values associated with model metabolites resulting in coefficient terms of a = 0.00345 ± 0.00087, b = 5.9 ± 13, and c = 0.484 ± 0.051 with a χ2 of 3.76 × 10⁻⁸. A linear correlation between predicted and measured µo values was observed as reflected by a slope = 0.919 ± 0.037 and R2 = 0.9341. Prediction of pKa of test metabolites was performed using ACD/Laboratory pKa DB (Advanced Chemical Development Inc., Toronto, Canada), which was then converted into zeff based on the pH of the BGE used for CE separations. A reasonably good predictive accuracy for pKa using ACD/Laboratory pKa DB was confirmed by correlation of experimental and predicted pKa among weakly acidic model metabolites with a slope = 0.780 ± 0.049 and R2 = 0.8912.
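The conversion of predicted pKa values into zeff at the pH of the BGE is not spelled out in the main text. A minimal sketch of one common way to do it is given below, assuming independent ionizable groups that each follow the Henderson-Hasselbalch relation; the function name and the treatment of groups as non-interacting are assumptions, and the procedure actually used may differ (see the Supporting Information). The alanine pKa values are standard textbook figures, not values from this study.

```python
def effective_charge(pH, basic_pKas=(), acidic_pKas=()):
    """Net charge from independent ionizable groups (Henderson-Hasselbalch)."""
    # A basic group (pKa of its conjugate acid) carries +1 when protonated.
    positive = sum(1.0 / (1.0 + 10.0 ** (pH - pKa)) for pKa in basic_pKas)
    # An acidic group carries -1 when deprotonated.
    negative = sum(1.0 / (1.0 + 10.0 ** (pKa - pH)) for pKa in acidic_pKas)
    return positive - negative

# Textbook pKa values for alanine (alpha-COOH 2.34, alpha-NH3+ 9.69),
# evaluated at pH 1.8, the pH of the formic acid BGE.
z_ala = effective_charge(1.8, basic_pKas=[9.69], acidic_pKas=[2.34])
print(f"z_eff(Ala, pH 1.8) = {z_ala:.2f}")
```

Under these assumptions a zwitterionic amino acid carries a fractional positive charge below +1 at pH 1.8 because its carboxyl group is only partly protonated, which is consistent with the requirement that solutes have a net positive mobility under acidic BGE conditions.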
Table S3 of the Supporting Information summarizes the chemical structures and predicted physicochemical parameters for the test metabolites used for model validation, whereas Supporting Information Figure S2 compares the performance of log D and µo variables estimated by computer modeling relative to experimental values measured for the training set.

Method Calibration and Metabolite Quantification. Calibration curves were constructed for training/test metabolites using CE-ESI-MS by measuring their average RIR in terms of integrated peak areas at six different concentration levels (2, 8, 16, 30, 50, 100 µM) in triplicate (n = 3) relative to 50 µM DiAla as the IS.20 The measured RRF for each metabolite was then derived from the slope of the calibration curve (i.e., relative sensitivity) as determined by least-squares linear regression. The concentration range for calibration was selected such that the lowest response at 2 µM was above the limit of quantification (LOQ, S/N > 10) for the method to ensure adequate precision (coefficient of variation (CV) < 10%). The total relative peak area ratio for metabolites as derived from extracted ion electropherograms was measured for the molecular ion (MH+), as well as any significant fragment ions (e.g., amines, MH+ - 17) or isotope contributions (e.g., ³⁵Cl and ³⁷Cl) provided that they comprised more than about 5% of the base peak as highlighted in Figure S3 of the Supporting Information. This method of integration accounts for in-source collision-induced dissociation processes detected for certain metabolites so that all major gas-phase species constitute the total apparent ion response of a solute. Most metabolites were detected as their singly charged protonated molecular ion (i.e., MH+) by CE-ESI-MS without the formation of different salt adducts commonly observed in direct-infusion ESI-MS experiments.

(36) Včeláková, K.; Zusková, I.; Kenndler, E.; Gaš, B. Electrophoresis 2004, 25, 309–317.

RBC Lysate Preparation and External Model Validation. Standard solutions involving 26 different training/test metabolites were spiked directly into filtered RBC lysates in order to assess the accuracy and robustness of the multivariate model for quantifying micromolar levels of metabolites in complex biological samples. Spiking studies were performed in triplicate (n = 3) at low (5 µM), mid (25 µM), and high (60 µM) concentration levels using metabolites that were undetected in the original RBC lysate samples. Further experimental details on the sample collection/pretreatment procedure for filtered RBC lysates are described in the Supporting Information.

RESULTS AND DISCUSSION

Quantitative Model for Solute Ionization in ESI-MS. Figure 1 depicts a multivariate model for describing solute ionization in CE-ESI-MS, where mixtures of ions are first separated within a narrow fused-silica capillary under an applied electric field. A coaxial sheath-flow ESI interface was used in this work to stabilize droplet formation,37 where a makeup solvent flow and nebulizer gas mix with the effluent from the distal end of the capillary tip, which is positioned orthogonal to the orifice entrance of the mass analyzer. Although this type of interface can decrease solute detectability via postcapillary sample dilution,38 it has distinct advantages for quantitative ESI-MS modeling in terms of its overall robustness by providing a stable yet homogeneous electrolyte composition for spray formation during separation thereby reducing the impact of sample matrix effects.
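The calibration procedure described under Method Calibration and Metabolite Quantification can be sketched as follows. This is a hedged illustration, not the authors' data processing: the function names are hypothetical, the peak areas are invented, and the inversion of the calibration line to obtain a concentration from a measured or predicted RRF is an assumed reading of how the RRF is applied. Only the six concentration levels are taken from the text.

```python
import numpy as np

# Calibration levels (uM) taken from the text; the IS is held at a fixed level.
LEVELS_UM = np.array([2.0, 8.0, 16.0, 30.0, 50.0, 100.0])

def relative_ion_response(area_analyte, area_is):
    """RIR: analyte peak area normalized to the internal-standard peak area."""
    return np.asarray(area_analyte, dtype=float) / np.asarray(area_is, dtype=float)

def fit_rrf(conc_um, rir):
    """Least-squares line RIR = RRF * conc + intercept; returns (RRF, intercept)."""
    slope, intercept = np.polyfit(conc_um, rir, deg=1)
    return slope, intercept

def estimate_concentration(rir, rrf, intercept=0.0):
    """Invert the calibration line to estimate concentration (uM)."""
    return (rir - intercept) / rrf

# Made-up peak areas for illustration only (not measured data).
area_is = np.full(LEVELS_UM.shape, 1.0e6)
area_analyte = 2.0e4 * LEVELS_UM          # perfectly linear hypothetical response
rir = relative_ion_response(area_analyte, area_is)

rrf, b0 = fit_rrf(LEVELS_UM, rir)
conc = estimate_concentration(0.5, rrf, b0)
print(f"RRF = {rrf:.4f} per uM, estimated concentration = {conc:.1f} uM")
```

In a virtual quantification setting the same inversion would be applied with an RRF predicted from MV, log D, and µo in place of the fitted slope, with the intercept taken as zero.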
CE also serves other functions to enhance ESI-MS performance, including the resolution of isobaric/isomeric ions, as well as on-line sample preconcentration with sample desalting for improved concentration sensitivity.20,31 These features are important since involatile salts and abundant comigrating ions can suppress solute ionization, which has hampered the development of quantitative ESI-MS models for predicting ion responses in real-world samples.39 Researchers have long considered ESI as an electrolytic cell where droplet charging is mediated via an electrophoretic process under an intense electric field which induces charge separation of electrolytes within solution.18,40 Recent evidence supports an ion evaporation mechanism17 for describing ESI involving the desorption of low molecular weight solutes from the surface of charged droplets. In this work, four fundamental solute physicochemical properties were explored as variables to predict the RRF of polar metabolites in ESI-MS. The µo and zeff of an ion were selected as fundamental electrokinetic parameters associated with the charge density of a metabolite, which can influence the rate of solute desorption (i.e., kS) into the gas phase.40 Indeed, these same parameters can also be used to simulate electromigration behavior in CE for qualitative identification of unknown metabolites based on their characteristic relative migration time (RMT).31,41 In addition, log D and MV of an ion were chosen as key thermodynamic/molecular descriptors reflective of the surface activity of solutes, which determine the tendency for ions to partition toward the droplet surface (i.e., KS) at the liquid-gas interface.24 The wide variability of ion responses in ESI-MS is highlighted in Figure S4 of the Supporting Information, where a 700-fold difference in ionization efficiency was measured for metabolites used in our original training set despite their equimolar concentration in solution. For instance, β-Ala and Meto were solutes measured with the lowest and highest RRF, respectively, whereas significant differences in RRF were also noted among certain isomeric metabolites, such as oTyr and pTyr. These observations emphasize the large variability in ESI responses among biologically relevant metabolites, where both major and subtle changes in chemical structure can significantly impact solute ionization efficiency. Due to long-term changes in spray stability, solution composition, and capillary alignment, which can result in significant changes to apparent ion signals,42 all responses were quantified in terms of their RIR, where the solute response at a specific concentration was normalized to an IS (50 µM DiAla). The major hypothesis in this work is that the RRF of a metabolite can be predicted from a limited set of intrinsic solute parameters associated with ion evaporation processes in ESI-MS, which impact the relative population of ions in the gas phase. However, the model depicted in Figure 1 assumes that the mass analyzer itself does not contribute to bias in ion sampling/transmission of solutes prior to their detection.

(37) Manisali, I.; Chen, D. D. Y.; Schneider, B. B. Trends Anal. Chem. 2006, 25, 243–256.
(38) Maxwell, E. J.; Chen, D. D. Y. Anal. Chim. Acta 2008, 627, 25–33.
(39) Constantopoulos, T. L.; Jackson, G. S.; Enke, C. G. J. Am. Soc. Mass Spectrom. 1999, 10, 625–634.
(40) Tang, L.; Kebarle, P. Anal. Chem. 1991, 63, 2709–2715.
(41) Hruška, V.; Jaroš, M.; Gaš, B. Electrophoresis 2006, 27, 984–991.
(42) Leito, I.; Herodes, K.; Huopolainen, M.; Virro, K.; Kummapas, A.; Kruve, A.; Tanner, R. Rapid Commun. Mass Spectrom. 2008, 22, 379–384.

Figure 1. Multivariate model for describing ion evaporation in ESI-MS, where thermodynamic (KS) and electrokinetic (kS) driving forces influence the apparent ionization efficiency of polar metabolites into the gas phase. The relative response factor (RRF) of a solute can be predicted based on intrinsic physicochemical properties associated with its chemical structure, such as MV, log D, µo, and zeff (pKa). Note that CE with a coaxial sheath liquid interface is important for resolving isobaric/isomeric ions, while providing stable electrolyte conditions during spray formation that minimize ionization suppression effects, which is critical to quantitative yet robust ESI-MS models applicable to complex biological samples.

Qualitative Trends in Metabolites for Model Training. Figure 2a depicts a 2D scores plot from PCA of 47 different training/test metabolites (including IS) after model refining based on their four intrinsic physicochemical parameters. The scores plot provides an effective way to distinguish polar metabolites in terms of their distinct chemical properties, whose coordinates are determined by the relative weighting of variables depicted in the loadings plot as shown in Figure 2b. The loadings plot also reveals that zeff is an intrinsic solute property strongly collinear with µo. The former parameter was subsequently excluded during multivariate calibration since it did not contribute significantly to the predictive accuracy of the model (refer to Table S1 of the Supporting Information). Overall, the significance of PC1 is inferred as being associated with the effective charge density and hydrophilicity of an ion (µo, log D), which generally increases from left (NTyr) to right (HisN), whereas PC2 is related to the average molecular size of an ion (MV), which increases from bottom (Ad) to top (C8) of the scores plot. Indeed, PCA offers an insightful qualitative map for charting the influence of specific modifications on the chemical properties of metabolites, ranging from biogenic amines, modified amino acids, nucleosides/purines, and small peptides to medium-chain acylcarnitines. Also, the impact of common metabolic transformations can be readily tracked by both the magnitude and direction of change in their coordinates in Figure 2a, such as from His to MeHis (3-methylation) or Carn (β-Ala-His dipeptide), from pTyr to NTyr (3-nitration) or TyrN (decarboxylation), or from C0 to C2 (O-acetylation) or C8 (O-octanoylation). In addition, different classes of metabolites with similar overall chemical properties can be revealed by their overlaid coordinates in a scores plot, such as Ad (purine base) and NAm (vitamin B derivative).
In fact, PCA using the same four solute variables provides an effective way to screen for an appropriate yet balanced group of test metabolites during model development, where solute outliers having chemical properties distinct from the training set can be rationally excluded, such as GSSG and Quin (refer to Figure S1 of the Supporting Information).
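The scores/loadings decomposition discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' workflow: it autoscales only three of the four descriptors (MV, log D, and µo; zeff values are not tabulated in this article) for eight solutes whose values are quoted from Table 1, and it obtains the principal components from a singular value decomposition in NumPy. The function name `pca` and the choice of solutes are hypothetical, so the resulting coordinates are not expected to reproduce Figure 2.

```python
import numpy as np

# Descriptors quoted from Table 1: MV (Å^3), log D, µo (10^-4 cm^2/V·s).
# The eight solutes are an arbitrary illustrative subset.
names = ["Leu", "Orn", "PABA", "Lys", "C0", "NTyr", "At", "C8"]
X = np.array([
    [124, -1.52, 2.49],
    [113, -4.33, 6.86],
    [102,  0.80, 2.79],
    [131, -3.05, 5.41],
    [155, -2.49, 3.23],
    [163, -1.12, 1.93],
    [253,  0.10, 2.25],
    [313, -0.64, 2.14],
])


def pca(data):
    """Return (scores, loadings, explained variance ratio) for autoscaled data."""
    # Standardize each descriptor to zero mean and unit variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Thin SVD: z = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s        # solute coordinates (scores plot)
    loadings = vt.T       # descriptor weights per component (loadings plot)
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained


scores, loadings, explained = pca(X)
for name, (pc1, pc2) in zip(names, scores[:, :2]):
    print(f"{name:5s} PC1={pc1:+.2f} PC2={pc2:+.2f}")
print("explained variance ratio:", np.round(explained, 3))
```

Plotting the first two columns of `scores` gives a map analogous to a scores plot, while the rows of `loadings` show how strongly each descriptor contributes to each component. A solute whose scores fall far from the rest would be a candidate outlier of the kind excluded from the training set.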
Figure 2. (a) 2D scores plot from PCA highlighting the relationship of 47 different training (●) and test (▲) metabolites in terms of their four (standardized) measured or estimated physicochemical properties (MV, log D, µo, and zeff), respectively, relative to the internal standard (■). (b) 2D loadings plot highlights the relative contribution of each physicochemical parameter on the coordinates of metabolites in the scores plot. The impact of specific chemical modifications (e.g., methylation, nitration, hydroxylation, etc.) on changes in the surface activity (i.e., MV, log D) and charge density (i.e., µo and zeff) of different classes of cationic/zwitterionic metabolites is effectively depicted on this multivariate map.
The scores plot in Figure 2a highlights the central position of the IS (i.e., DiAla) used for normalizing apparent ion responses in ESI-MS, which ideally possesses chemical properties and ionization behavior representative of the training/test set of metabolites. DiAla also has useful electromigration properties relative to the training/test metabolites (refer to Figure 4a) for determination of solute RMT in CE separations with excellent precision (0.82), low MSEC and MSEP (0.90) with an average measured/predicted RRF ratio of 1.08 ± 0.75. Overall, there were only three metabolites that had a significant bias in their predicted RRF (e.g., ratio >2), namely, Ad, Glu, and Gln. Figure 3b compares the weighting of three intrinsic solute parameters (i.e., standardized variables) that were positively correlated with higher RRF for the training set of metabolites when using MLR calibration. In this work, MV and log D were two statistically significant (P < 0.05) variables associated with RRF that highlight the importance of solute surface activity for enhancing ionization efficiency.22,23 The major impact of molecular size for predicting RRF was rather unexpected given the ease with which it can be determined in silico for small molecules. Indeed, MV represents a versatile solute property relevant to physical models describing ion solvation energy in water,44 ion mobility in electrophoresis,45 as well as ion surface affinity in ESI-MS.46 For instance, singly charged ions (i.e., MH+) with larger MV tend to have a lower charge density and weaker solvation energy while occupying a higher fraction of the charged droplet surface, which facilitates their desorption into the gas phase with greater ionization efficiency. Although µo was a variable of lower significance (P > 0.05) for predicting solute RRF for metabolites in the full training/test set, its inclusion was found to improve model
(44) Dill, K. A.; Truskett, T. M.; Vlachy, V.; Hribar-Lee, B. Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 173–199.
(45) Roy, K. I.; Lucy, C. A. Electrophoresis 2003, 24, 370–379.
(46) Wu, Z.; Gao, W.; Phelps, M. A.; Wu, D.; Miller, D. D.; Dalton, J. T. Anal. Chem. 2004, 76, 839–847.
Figure 3. (a) Predictive accuracy of the multivariate calibration model as summarized by a linear correlation plot (y = 0.945x + 0.0021, R² = 0.8737) between the measured relative response factor (RRF) by CE-ESI-MS and the predicted RRF using MLR regression for 46 different training (●) and test (▲) metabolites. The graph inset (b) highlights that MV and log D (as standardized variables, where error bars represent ±1σ) are statistically the most significant physicochemical parameters (P < 0.05, *) positively correlated with higher solute ionization efficiency. The MLR equation used to predict the RRF of metabolites based on their measured/predicted physicochemical parameters (standardized variables) was y = [(1.58 ± 0.14) × 10⁻²](MV) + [(3.4 ± 1.3) × 10⁻³](log D) + [(1.7 ± 1.4) × 10⁻³]µo - (4.14 ± 0.074) × 10⁻², where R² = 0.8284, MSEP = 5.41 × 10⁻⁵, and Q² = 0.9936.
accuracy during subsequent external validation. In general, bulky hydrophobic solutes (i.e., large MV and high log D, located in the top-left quadrant of Figure 2) were found to generate the highest RRF since they can more readily accumulate near the surface of charged droplets due to their higher surface activity,24 such as C8, At, and MeA. Nevertheless, bulky yet highly hydrophilic solutes (i.e., large MV yet low log D, located in the upper-right quadrant of Figure 2) can still possess moderate RRFs due to electrokinetic factors that favor a faster rate of ion desorption22 as a result of their higher charge density (i.e., high µo), such as Arg, Cyst, and Carn. Overall, solutes with the lowest MV (located in the bottom of Figure 2) had the weakest RRF irrespective of their differences in log D or µo properties, such as Gu, HCy, and HisN. It is important to note that the impact of solute variables for predicting RRF in ESI-MS is highly dependent on the composition of metabolites selected for model training. For instance, when modeling a subset of 16 training metabolites composed of aromatic amines and neutral side-chain amino acids that had a narrower distribution of MV, three solute variables were determined to be statistically significant (MV, log D, and µo with P < 0.05), where µo was the parameter having the largest coefficient positively correlated to higher solute RRF as shown in Figure S5 of the Supporting Information. External Validation of Model Accuracy and Robustness. Two subsequent external validation studies were also performed for assessment of the overall accuracy and long-term robustness of the multivariate model when using CE-ESI-MS. First, the analysis of 20 different polar metabolite standards (10 training/10 test solutes) was done at three different concentration levels (5, 25, and 60 µM) not used in the original calibration 1 year after model development under similar operating conditions.
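To make the relative weight of each descriptor concrete, the short sketch below evaluates the unstandardized MLR equation reported in footnote a of Table 1 and prints the additive contribution of each term for three solutes whose descriptors are quoted from that table. The function names `rrf_terms` and `predict_rrf` are hypothetical, and µo is entered in cm²/V·s on the assumption, consistent with the tabulated predictions, that the coefficient of 14 applies to those units.

```python
# Unstandardized MLR coefficients quoted from footnote a of Table 1.
B_MV = 4.40e-4     # per Å^3 of molecular volume
B_LOGD = 2.7e-3    # per log D unit
B_MU = 14.0        # per cm^2/V·s of absolute mobility (assumed units)
B_0 = -4.14e-2     # intercept


def rrf_terms(mv, log_d, mu_o):
    """Return the additive contribution of each descriptor to the predicted RRF."""
    return {"MV": B_MV * mv, "log D": B_LOGD * log_d, "mu_o": B_MU * mu_o}


def predict_rrf(mv, log_d, mu_o):
    """Predicted RRF: sum of the descriptor contributions plus the intercept."""
    return sum(rrf_terms(mv, log_d, mu_o).values()) + B_0


# (MV in Å^3, log D, µo in cm^2/V·s) as listed in Table 1.
solutes = {
    "Orn": (113, -4.33, 6.86e-4),   # small, hydrophilic, high mobility
    "Leu": (124, -1.52, 2.49e-4),
    "C8":  (313, -0.64, 2.14e-4),   # bulky, comparatively hydrophobic
}

for name, descriptors in solutes.items():
    terms = rrf_terms(*descriptors)
    parts = ", ".join(f"{key}={value:+.4f}" for key, value in terms.items())
    print(f"{name}: {parts}, predicted RRF={predict_rrf(*descriptors):.4f}")
```

For these three solutes the MV term is the largest contribution, in line with the observation that bulky solutes give the highest RRF, while the mobility term partly offsets the negative log D term for a hydrophilic, highly mobile solute such as Orn.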
Virtual quantification of micromolar levels of metabolites was performed
by calculating the ratio of the measured RIR (n = 3) for a solute at a given concentration by CE-ESI-MS over its predicted RRF as determined by MLR calibration. Table 1 summarizes major solute physicochemical parameters as well as the overall performance of the multivariate model based on its accuracy in predicting solute concentrations compared to standard solutions of known concentration. Overall, an average absolute error of about 40% was achieved for metabolite quantification across all three concentration levels, including test solutes whose intrinsic parameters were estimated by computer modeling. A second external validation study was then performed to assess the applicability of the multivariate model to quantify low-abundance metabolites directly in biological samples without complicated offline sample pretreatment. This was performed by spiking 26 different metabolite standards (not originally detected in the sample) directly in filtered RBC lysates at three different concentration levels (5, 25, and 60 µM). In this case, CE was critical to the success of the quantitative ESI-MS model, which was originally trained on neat standard solutions in the absence of sample matrix effects. Figure 4a shows that training/test metabolites spiked at the 5 µM level in the RBC lysate migrate within about a 6 min “interference-free” separation window between excess salt (e.g., Na+) and reduced glutathione (GSH), which represent major intracellular co-ion interferences. This time-dependent resolution permits the quantification of low-abundance metabolites by ESI-MS with high specificity, including several isobaric and isomeric ions (e.g., Carn/NTyr, Leu/Ile, Lys/Gln, oTyr/mTyr/pTyr/MetS, etc.) without significant ionization suppression effects. The migration order of metabolites in CE is determined by their apparent electrophoretic mobility,31 where solutes with lower charge density elute from the capillary into the ion source
Figure 4. (a) Overlay of extracted ion electropherograms showing the analysis of 5 µM of training (red traces selected) and test (blue traces) metabolites spiked in filtered RBC lysates by CE-ESI-MS. Note that ionization suppression caused by sample matrix effects was not a significant problem since excess salt and GSH were resolved from low-abundance metabolites prior to ESI-MS. (b) Summary of the predictive accuracy of the multivariate model for metabolite quantification in filtered RBC lysates based on 26 different training (i) and test (ii) metabolite standards spiked in triplicate at 5, 25, and 60 µM (n = 78), where the dashed lines represent the average measured/predicted concentration ratio and its standard deviation (1.03 ± 0.37). An overall predictive error of 49% was determined in this study, which demonstrates the proof-of-concept for semiquantitative analysis of micromolar levels of metabolites in biological samples in cases when purified chemical standards are unavailable.
with longer migration times relative to the IS (i.e., RMT > 1.0). The apparent ion responses for metabolites in terms of their relative peak heights shown in Figure 4a are dependent on both the magnitude of their RRF as well as the extent of longitudinal diffusion that increases for solutes with longer RMTs. Figure 4b summarizes the overall performance of the multivariate model by comparing the measured/predicted concentration ratio of 26 metabolites at three levels when using measured and predicted RRFs, respectively, where quantification was performed for each solute in triplicate (i.e., average RIR at a defined concentration) by CE-ESI-MS. This study was aimed at evaluating the predictive accuracy of the model without corrections due to different solute recoveries from the sample matrix that averaged about 120%. The dashed lines in Figure 4b indicate that the average measured/predicted concentration ratio and its standard deviation (±1σ) in
this study was 1.03 ± 0.37 (n = 78). Overall, an average predictive error of 49% was determined for quantifying the spiked metabolites in RBC lysates when using predicted RRFs from MLR calibration, which highlights the feasibility for semiquantitative analysis of micromolar levels of metabolites in complex biological samples. Although this degree of accuracy is not comparable to the performance achieved when using stable-isotope dilution in ESI-MS, it does offer a simple, rapid, and direct way to estimate solute concentrations in silico with reasonable accuracy in cases when purified chemical standards and/or isotope-labeled internal standards are lacking.
Caveats to Quantitative Models for ESI-MS. Our work demonstrates that de novo quantification of metabolites is achievable in ESI-MS provided that computational methods can offer reliable estimates for intrinsic solute properties (e.g., MV, log D,
Table 1. Accuracy and Long-Term Robustness of the Multivariate Model for Virtual Quantification of 20 Different Training/Test Metabolites by CE-ESI-MS 1 Year after Initial Model Development as Determined by Three Intrinsic Solute Parameters
metabolite  MH+, fragment  RMT    MV    log D  µo × 10⁻⁴  predicted     predicted concn, µM (% error)b,c
            (m/z)                 (Å³)         (cm²/V·s)  RRF × 10⁻² a  5 µM         25 µM       60 µM
Leu         132, 86        1.105  124   -1.52  2.49       1.25          5.3 (+6.0)   27 (+9.2)   71 (+18)
Orn*        133            0.747  113   -4.33  6.86       0.631         6.2 (+25)    16 (-38)    26 (-56)
PABA        138            1.050  102   +0.80  2.79       0.962         7.1 (+42)    29 (+16)    60 (-0.08)
TryN        138, 121       0.895  120   -2.00  3.60       1.11          5.7 (+14)    26 (+4.4)   68 (+13)
Lys         147, 130       0.752  131   -3.05  5.41       1.55          6.9 (+37)    47 (+88)    108 (+81)
MeAsp*      148            1.472  114   -2.58  3.84       0.667         7.6 (+51)    33 (+32)    72 (+20)
C0          162            0.921  155   -2.49  3.23       2.46          9.5 (+89)    40 (+60)    80 (+34)
MeHis*      170            0.814  144   -2.94  6.20       2.23          3.4 (-32)    14 (-44)    34 (-43)
Thea*       175            1.247  150   -2.56  3.04       2.19          3.1 (-38)    36 (+42)    169 (+180)
Sero        177, 160       0.948  147   -2.18  3.29       2.20          2.6 (-48)    12 (-53)    24 (-60)
Cit*        176            1.198  140   -3.19  3.14       1.60          3.4 (-31)    18 (-29)    41 (-32)
C3*         218            1.007  214   -1.92  2.56       5.11          9.4 (+89)    29 (+17)    63 (+5.0)
Cyst        223            1.053  174   -4.80  2.84       2.61          4.6 (-8.8)   21 (-17)    36 (-40)
Carn*       227            0.712  182   -2.89  5.50       3.03          7.8 (+56)    20 (-22)    45 (-24)
NTyr        227            1.321  163   -1.12  1.93       3.00          2.3 (-54)    12 (-53)    26 (-57)
C4*         232            1.032  234   -1.73  2.46       6.03          5.1 (+2.5)   30 (+19)    68 (+13)
At          267            1.132  253   +0.10  2.25       7.30          9.9 (+98)    52 (+110)   94 (+56)
A           268            1.061  195   -1.01  2.50       4.52          11 (+111)    31 (+24)    80 (+33)
MeA*        286            1.108  228   -1.82  2.48       5.74          7.5 (+51)    25 (-0.8)   60 (+0.3)
C8*         288            1.132  313   -0.64  2.14       9.75          3.7 (-26)    21 (-15)    48 (-20)
a Predicted RRF derived from the three measured or estimated (*) solute physicochemical parameters (MV, log D, µo) identified after MLR of the training set using unstandardized variables: y = [(4.40 ± 0.38) × 10⁻⁴](MV) + [(2.7 ± 1.0) × 10⁻³](log D) + (14 ± 11)µo - (4.14 ± 0.074) × 10⁻².
b Predicted metabolite concentration based on triplicate analysis (n = 3) of the average RIR for validation metabolites at three different concentration levels (5, 25, 60 µM) normalized to the internal standard (50 µM DiAla), where predicted [metabolite] = RIR/RRF.
c Model accuracy measured as the absolute value of the relative percent error of the predicted concentration from the true concentration; the average % error for training/test metabolite quantification at 5, 25, and 60 µM was 46%, 35%, and 39%, respectively.
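The quantification step in footnote b and the accuracy metric in footnote c can be sketched as follows. This is an illustration rather than the authors' procedure: the helper names are hypothetical, the RIR value in the worked example is invented for demonstration, and the RRF is treated as carrying units of µM⁻¹, which is an inference from the relation predicted [metabolite] = RIR/RRF. The signed percent errors are transcribed from Table 1 in table order.

```python
def predict_concentration(rir, rrf):
    """Footnote b: predicted [metabolite] = RIR / RRF (RRF assumed in 1/µM)."""
    return rir / rrf


def percent_error(predicted, true):
    """Signed relative error of a predicted concentration, in percent."""
    return 100.0 * (predicted - true) / true


def mean_abs_error(values):
    """Footnote c: average of the absolute percent errors."""
    return sum(abs(v) for v in values) / len(values)


# Worked example with a hypothetical RIR for Leu (predicted RRF = 1.25e-2).
leu_conc = predict_concentration(rir=0.0625, rrf=1.25e-2)
print(f"Leu: {leu_conc:.1f} µM, error vs 5 µM = {percent_error(leu_conc, 5.0):+.1f}%")

# Signed % errors transcribed from Table 1 (20 solutes, Leu through C8).
errors = {
    5: [6.0, 25, 42, 14, 37, 51, 89, -32, -38, -48,
        -31, 89, -8.8, 56, -54, 2.5, 98, 111, 51, -26],
    25: [9.2, -38, 16, 4.4, 88, 32, 60, -44, 42, -53,
         -29, 17, -17, -22, -53, 19, 110, 24, -0.8, -15],
    60: [18, -56, -0.08, 13, 81, 20, 34, -43, 180, -60,
         -32, 5.0, -40, -24, -57, 13, 56, 33, 0.3, -20],
}

summary = {level: mean_abs_error(values) for level, values in errors.items()}
for level, value in summary.items():
    print(f"{level} µM: mean absolute error = {value:.1f}%")
```

Averaging the tabulated (rounded) errors gives roughly 45.5%, 34.7%, and 39.3% at 5, 25, and 60 µM, which agrees with the 46%, 35%, and 39% reported in footnote c to within rounding of the table entries.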
and µo) as highlighted in Figure S2 of the Supporting Information. Two elements critical to the success of quantitative ESI models when applied to real-world samples include the need for robust ion source designs that maintain stable spray formation as well as efficient separation techniques that resolve major co-ion interferences. Overall, there were two types of metabolites that were excluded from the original training set during initial model development in our study. First, four high molecular weight metabolites (i.e., GSSG, Quin, Meto, and Oxytet) were not applicable to multivariate calibration since their chemical properties were distinct from the bulk of the training set as highlighted in Figure S1 of the Supporting Information. In fact, two of these metabolites (i.e., GSSG and Quin) were detected by ESI-MS as their MH₂²⁺ molecular ion unlike all other solutes (MH+) in this study, which is a major source of bias when assuming equivalent ion transmission and/or detection by the mass analyzer. Second, several low molecular weight metabolites with low RRF (e.g., Ala, β-Ala, GABA, Crea, Thr, Asp, and Asn) also did not conform to the quantitative model (i.e., prediction of negative RRFs) despite analogous intrinsic properties to the training set of metabolites. These artifacts are likely a result of the limited dynamic range of the multivariate calibration method when using a single IS to normalize ion responses for various metabolites having a wide disparity in ionization efficiency, which in our refined model did not exceed an RRF difference of about 70-fold (from Ad to C8). Thus, careful attention to model training and metabolite/IS selection is key to the performance of quantitative models for predicting solute ionization efficiency in ESI-MS. It is envisioned that researchers can successfully adopt this method by modeling
subclasses of metabolites using an appropriate IS (selected for each subclass) to ensure high predictive accuracy as demonstrated in Figure S5 of the Supporting Information. In addition, the use of alternative mass analyzers (e.g., time-of-flight mass spectrometry) with extended linear dynamic range is preferred when modeling metabolites with large differences in RRF. This strategy offers a boon to metabolomic applications relevant to drug development and disease prognosis by allowing for rapid quantification of recently identified metabolites in biological samples with reasonable confidence in advance of their chemical synthesis/purification, such as biomarkers of oxidative stress (e.g., NTyr, oTyr, MetS, etc.).47 Since this multivariate model is based on fundamental solute physicochemical parameters for predicting ionization efficiency, a simple recalibration is required whenever different types of ESI-MS instruments are employed due to changes in their ion sampling, transmission, and detection efficiency.
CONCLUSIONS
To the best of our knowledge, this work demonstrates for the first time the feasibility for virtual quantification of micromolar levels of polar metabolites in biological samples when using ESI-MS. Multivariate calibration with internal/external validation was developed to predict the ionization efficiency for various classes of metabolites and their isomers based on a limited set of intrinsic solute physicochemical properties that were measured experimentally or estimated by computer modeling. Ionization suppression due to sample matrix effects remains the most significant obstacle to quantitative models in ESI-MS, which can be minimized when using robust ESI interface designs together with high-efficiency separation techniques. It is hoped that this work can contribute to a deeper understanding of the fundamental thermodynamic and electrokinetic processes that influence solute ionization efficiency in ESI-MS, while providing a simple strategy for de novo quantification of recently discovered metabolites when commercial standards are unavailable. The latter feature is particularly relevant given the increasing prominence that metabolomics plays in systems biology research for assessing the global impact of stressors in organisms.48 Future studies will continue to develop quantitative models for predicting ionization efficiency of other classes of metabolites in either positive- or negative-mode ESI-MS, as well as assessing their applicability to LC-ESI-MS.
(47) Ptolemy, A. S.; Lee, R.; Britz-McKibbin, P. Amino Acids 2007, 33, 3–18.
(48) Ishii, N.; Nakahigashi, K.; Baba, T.; Robert, M.; Soga, T.; Kanai, A.; Hirasawa, T.; Naba, M.; Hirai, K.; Hoque, M.; Ho, P. Y.; Kakazu, Y.; Sugawara, K. Science 2007, 316, 593–597.
ACKNOWLEDGMENT
This work is supported by funds provided by the Natural Sciences and Engineering Research Council of Canada, Premier’s Research Excellence Award, and the Canada Foundation for Innovation. The authors thank the reviewers for their insightful comments and helpful suggestions.
SUPPORTING INFORMATION AVAILABLE
Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
Received for review October 28, 2008. Accepted February 9, 2009. AC802272U