Anal. Chem. 2006, 78, 8324-8331
Software Tools for Analysis of Mass Spectrometric Lipidome Data
Perttu Haimi, Andreas Uphoff, Martin Hermansson, and Pentti Somerharju*
Institute of Biomedicine, Department of Biochemistry, University of Helsinki, Haartmaninkatu 8, PL 8, 00014 Helsinki, Finland
New software tools for quantitative analysis of mass spectrometric lipidome data have been developed. The LIMSA tool finds and integrates peaks in a mass spectrum, matches the peaks with a user-supplied list of expected lipids, corrects for overlap in their isotopic patterns, and quantifies the identified lipid species according to internal standards. Three different algorithms for isotopic correction (deconvolution) were implemented and compared. LIMSA has a convenient user interface and can be applied to any type of MS spectrum. Typically, analysis of one spectrum takes only a few seconds. The SECD tool, designed for analysis of LC-MS data sets, provides an intuitive and informative display of MS chromatograms as two-dimensional “maps” for visual inspection of the data and allows the user to extract mass spectra, to be further analyzed with LIMSA, from arbitrary regions of these maps. Compared to standard time-range averaged spectra, this gives a more reliable analysis of complex lipidome data with an improved signal-to-noise ratio. The functionality of these tools is demonstrated by analysis of standard mixtures as well as complex biological samples. The tools described here make accurate, high-throughput analysis of extensive sample sets feasible and are made available to the scientific community free of charge.

Since the introduction of soft ionization techniques, such as ESI and MALDI, MS has gained an ever-growing role in biochemistry due to its unique sensitivity, high resolution, and versatility.
MS-based methods have also proven highly useful in the field of lipidomics, i.e., in the functional analysis of lipid compositions of cells and organisms.1,2 Lipids comprise a very diverse group of compounds (for a recent comprehensive classification, see Fahy et al.3), and besides functioning as an energy store and as structural components of cellular membranes, they play a key role in a variety of biological functions including signal transduction,4
* To whom correspondence should be addressed. E-mail: pentti.somerharju@helsinki.fi. Telephone: +358-9-191 25410. Fax: +358-9-191 25444.
(1) Wenk, M. R. Nat. Rev. Drug Discovery 2005, 4, 594-610.
(2) van Meer, G. EMBO J. 2005, 24, 1-7.
(3) Fahy, E.; Subramaniam, S.; Brown, H. A.; Glass, C. K.; Merrill, A. H.; Murphy, R. C.; Raetz, C. R. H.; Russell, D. W.; Seyama, Y.; Shaw, W.; Shimizu, T.; Spener, F.; van Meer, G.; VanNieuwenhze, M. S.; White, S. H.; Witztum, J. L.; Dennis, E. A. J. Lipid Res. 2005, 46, 839-861.
(4) Mills, G. B.; Moolenaar, W. H. Nat. Rev. Cancer 2003, 3, 582-591.
8324 Analytical Chemistry, Vol. 78, No. 24, December 15, 2006
membrane trafficking and sorting,5 and morphogenesis.6 While MALDI has been used successfully in the analysis of certain lipid classes,7 ESI is the most commonly used ionization method in lipidomics due to its versatility.8,9 The selectivity of ESI-MS can be greatly enhanced by employing lipid class-specific scanning modes8,10 or by using online HPLC separation of the sample.11,12-14 Notably, ESI-MS together with stable isotope-labeled lipid precursors has also proven a powerful tool in the elucidation of metabolic pathways of lipids.15-17 Often, members of a lipid class are separated by only two mass units (one double bond), which leads to significant overlap of their isotopic patterns. In addition, the instrument response varies even among very similar species and depends on solvent composition, instrument settings, and other factors,18 thus making the use of multiple internal standards a necessity. These factors, as well as the multitude of lipid species typically present in the sample, make the manual analysis of data a very time-consuming and error-prone process. Accordingly, software tools have been developed to automate data processing.12,19-21 However, each of these methods has certain limitations; e.g., they were designed for the analysis of a particular type of data set only, require considerable manual work, do not provide accurate correction for the isotopic
(5) Huijbregts, R. P. H.; Topalof, L.; Bankaitis, V. A. Traffic 2000, 1, 195-202.
(6) Pavlidis, P.; Ramaswami, M.; Tanouye, M. A. Cell 1994, 79, 23-33.
(7) Schiller, J. S. R.; Arnhold, J.; Fuchs, B.; Lessig, J.; Muller, M.; Petkovic, M.; Spalteholz, H.; Zschornig, O.; Arnold, K. Prog. Lipid Res. 2004, 43, 449-488.
(8) Han, X. L.; Gross, R. W. Mass Spectrom. Rev. 2005, 24, 367-412.
(9) Murphy, R. C.; Fiedler, J.; Hevko, J. Chem. Rev. 2001, 101, 479-526.
(10) Pulfer, M.; Murphy, R. C. Mass Spectrom. Rev. 2003, 22, 332-364.
(11) Käkelä, R.; Somerharju, P.; Tyynelä, J. J. Neurochem. 2003, 84, 1051-1065.
(12) Hermansson, M.; Uphoff, A.; Käkelä, R.; Somerharju, P. Anal. Chem. 2005, 77, 2166-2175.
(13) Houjou, T.; Yamatani, K.; Imagawa, M.; Shimizu, T.; Taguchi, R. Rapid Commun. Mass Spectrom. 2005, 19, 654-666.
(14) Merrill, A. H., Jr.; S. M.; Allegood, J. C.; Kelly, S.; Wang, E. Methods 2005, 36, 207-224.
(15) DeLong, C. J.; Shen, Y. J.; Thomas, M. J.; Cui, Z. J. Biol. Chem. 1999, 274, 29683-29688.
(16) Boumann, H. A.; Damen, M. J. A.; Versluis, C.; Heck, A. J. R.; de Kruijff, B.; de Kroon, A. Biochemistry 2003, 42, 3054-3059.
(17) Hunt, A. N.; Clark, G. T.; Attard, G. S.; Postle, A. D. J. Biol. Chem. 2001, 276, 8492-8499.
(18) Koivusalo, M.; Haimi, P.; Heikinheimo, L.; Kostiainen, R.; Somerharju, P. J. Lipid Res. 2001, 42, 663-672.
(19) Kurvinen, J. P.; Rua, P.; Sjovall, O.; Kallio, H. Rapid Commun. Mass Spectrom. 2001, 15, 1084-1091.
(20) Kurvinen, J. P.; Aaltonen, J.; Kuksis, A.; Kallio, H. Rapid Commun. Mass Spectrom. 2002, 16, 1812-1820.
(21) Liebisch, G.; Lieser, B.; Rathenberg, J.; Drobnik, W.; Schmitz, G. Biochim. Biophys. Acta: Mol. Cell Biol. Lipids 2004, 1686, 108-117.
10.1021/ac061390w CCC: $33.50
© 2006 American Chemical Society Published on Web 11/17/2006
pattern overlap in all cases, or are not freely available as ready-to-use software. Here, we describe two new software tools that greatly simplify and speed up the analysis of MS, MS/MS, and LC-MS data of cellular lipidomes. The performance of these tools was tested with mixtures of lipid standards, and we demonstrate their usefulness in the analysis of complex natural lipid mixtures. The tools have a user-friendly interface and are made freely available to the scientific community.

MATERIALS AND METHODS
Lipid Standard Mixtures. PC species were obtained from Avanti Polar Lipids (Alabaster, AL); SM species were synthesized from sphingosylphosphorylcholine and a fatty acid as described earlier.22 To test the performance of the deconvolution methods, mixtures of standards were prepared by mixing three stock solutions (A-C), each consisting of two lipids at equal concentration. Mixture A consisted of PC 32:2/PC 36:2, mixture B of PC 32:1/PC 34:1, and mixture C of SM 17:0/SM 18:0. Three series of samples were prepared. In series 1, lipids from stock A were kept constant at 2 pmol/µL while lipid concentrations of stock B were decreased from 2 to 0 pmol/µL with the following concentrations: 2, 0.5, 0.125, 0.063, 0.031, 0.0156, 0.0078, and 0 pmol/µL. Likewise, in series 2, stock A was held constant and stock C was varied in the same way. For series 3, stocks A and B were held constant and stock C was varied (see Figure 3). The samples were dissolved in chloroform/methanol (1:2 v/v), and 4% aqueous NH3 (25% solution; SupraPur, Merck) was added just prior to MS analysis. The PC and SM molecular species were detected in the positive ion mode by scanning for the precursors of 184 m/z, which corresponds to the phosphocholine head group.23

Analysis of SM Species of High-Density Lipoproteins.
HDL samples were isolated from healthy individuals as described previously.24 A cocktail of internal standards (SM 15:0, SM 25:0, 2 nmol each, PC 28:0, PC 40:2, PC 44:2, 5 nmol each) was added to the samples (200 µg of protein), and the lipids were then extracted.25 The lower chloroform phase was washed three times with theoretical upper phase, taken to dryness under a N2 flow, and then redissolved in 500 µL of CHCl3/MeOH (1:2, v/v). Half of the sample was again dried under N2, dissolved in 1 mL of CHCl3/MeOH (1:1), subjected to alkaline hydrolysis by adding 167 µL of NaOH in MeOH (0.3 M), and then incubated for 2 h at room temperature. The hydrolysate was neutralized with 100 µL of HCl in MeOH (0.3 M) and extracted as above. The chloroform phase was evaporated under N2 flow and redissolved in 250 µL of CHCl3/MeOH (1:2).

Data Acquisition. The mass spectra were acquired with a Quattro Micro triple-quadrupole mass spectrometer (Micromass, Manchester, U.K.). In direct infusion experiments, chloroform (Uvasol, Merck) and methanol (LichroSolv, Merck) in a 1:2 ratio containing 4% aqueous NH3 (25% solution) was used as solvent. LC-MS data were acquired using a diol-modified silica column coupled to the mass spectrometer and isocratic elution as
(22) Koivusalo, M.; Alvesalo, J.; Virtanen, J. A.; Somerharju, P. Biophys. J. 2004, 86, 923-935.
(23) Hsu, F. F.; Turk, J. J. Am. Soc. Mass Spectrom. 2003, 14, 352-363.
(24) Miilunpohja, M.; Uphoff, A.; Somerharju, P.; Tiitinen, A.; Wähälä, K.; Tikkanen, M. J. J. Steroid Biochem. Mol. Biol. 2006, 100, 59-66.
(25) Folch, J.; Lees, M.; Stanley, G. H. S. J. Biol. Chem. 1957, 226, 497-509.
described previously.12 The LC-MS data were converted to NetCDF format using the DataBridge tool provided by the MassLynx 4.0 software (Micromass, Manchester, U.K.).

Accuracy of Determined Isotopic Patterns. A prerequisite for the successful deconvolution and analysis of mass spectral data is the accurate determination and calculation of isotopic patterns. The accuracy of the pattern calculation algorithm used here has been demonstrated.26 To test how well the calculated theoretical patterns coincide with the data, we measured the isotopic patterns of PC 34:1 and SM 17:0 in the precursor ion (+184 m/z) scanning mode. The relative standard deviations of the differences from the calculated pattern were 2, 3, 7, 10, and 25% for the first five isotopic peaks, respectively (n = 8, scan speed 150 m/z/15 s). Thus, the accuracy of the isotopic pattern measured here is appropriate for deconvolution with an error less than 5%, which seems reasonable in most cases. However, this error depends on the quality of the data, which could possibly be improved by using longer acquisition times or, alternatively, by averaging multiple scans.

Software for Extraction of Mass Spectra from LC-MS Data Sets. A software tool entitled SECD (Spectrum Extraction from Chromatographic Data) was developed for the extraction of mass spectra from LC-MS data. Prior to analysis, the data have to be converted to NetCDF format using manufacturer software. SECD bins the data (to obtain equal mass spacing) and displays the data in a pseudo-3-D format (“map”) with the retention time as the x-axis, m/z as the y-axis, and gray-scale-coded signal intensity values. The user can select arbitrary regions (e.g., all lipids corresponding to a lipid class) consisting of one or several trapezoids. The data in the selected regions are summed in the m/z direction to obtain the mass spectrum and then exported to Microsoft Excel for further analysis.
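The region-based extraction described above can be illustrated with a short sketch. It is not the SECD implementation (which was written in Visual Basic.NET and Visual C++.NET); the function name `extract_spectrum`, the six-number trapezoid layout, and the bin width are assumptions made for this example only. Each scan inside the selected time range contributes the intensities that fall within a linearly interpolated m/z window.

```python
from collections import defaultdict


def extract_spectrum(scans, trapezoid, bin_width=0.1):
    """Sum the intensities that fall inside one trapezoid of the LC-MS map.

    scans     : list of (retention_time, [(mz, intensity), ...])
    trapezoid : (rt0, rt1, lo0, hi0, lo1, hi1), a hypothetical layout in
                which the m/z window runs from [lo0, hi0] at rt0 to
                [lo1, hi1] at rt1
    Returns {binned m/z: summed intensity}.
    """
    rt0, rt1, lo0, hi0, lo1, hi1 = trapezoid
    spectrum = defaultdict(float)
    for rt, points in scans:
        if not rt0 <= rt <= rt1:
            continue  # scan lies outside the selected time range
        # Interpolate the m/z window linearly along the retention-time axis.
        frac = (rt - rt0) / (rt1 - rt0) if rt1 > rt0 else 0.0
        lo = lo0 + frac * (lo1 - lo0)
        hi = hi0 + frac * (hi1 - hi0)
        for mz, intensity in points:
            if lo <= mz <= hi:
                # Binning gives the equal mass spacing mentioned in the text.
                mz_bin = round(round(mz / bin_width) * bin_width, 6)
                spectrum[mz_bin] += intensity
    return dict(sorted(spectrum.items()))


# Illustrative data only: three scans, one region covering the first two.
scans = [
    (1.0, [(700.04, 10.0), (760.00, 5.0)]),
    (2.0, [(700.02, 20.0), (760.00, 7.0)]),
    (5.0, [(700.00, 99.0)]),
]
region = (0.5, 2.5, 690.0, 710.0, 695.0, 715.0)
print(extract_spectrum(scans, region))
```

A selection made of several trapezoids could be handled by calling the function once per trapezoid and adding the resulting spectra bin by bin.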
The software was developed using Microsoft Visual Studio 2003 and the programming languages Visual Basic.NET and Visual C++.NET. The program makes use of the NetCDF library from Unidata for data access functions (www.unidata.ucar.edu/software/netcdf/). An executable Windows version of the program can be obtained from the authors free of charge.

Software for Lipid Identification and Quantification. LIMSA (LIpid Mass Spectrum Analysis) consists of a dynamic library, written in ANSI C++, for identification, deconvolution, and quantification of lipid species from MS data and a convenient user interface, which was implemented as a Microsoft Excel add-in. This library can also be used from a standalone command line program for batch processing of data. The source code of the LIMSA software is released under the GPL license and is also available free of charge from the authors. The library can be compiled and used under either Windows or Linux operating systems. The input for LIMSA consists of (i) a mass spectrum obtained directly or produced by SECD from an LC-MS data set, (ii) a list of compounds with names and sum formulas (stored internally in the Microsoft Excel add-in), and (iii) a list of isotope abundances for the relevant atoms (contained internally in the Microsoft Excel add-in). The mass spectrum can be either a line spectrum (i.e., a peak list created by the data acquisition software) or a continuous spectrum with equal or nonequal mass spacing. The mass
(26) Rockwood, A. L.; Haimi, P. J. Am. Soc. Mass Spectrom. 2006, 17, 415-419.
spectrum is pasted in the first two columns of a Microsoft Excel worksheet, and the compound list is selected or constructed using the add-in’s user interface (see below). The Microsoft Excel add-in has an internal lipid database, which presently contains more than 3000 lipid species (glycerophospholipids, glycosphingolipids, triacylglycerols, etc.) and their common adducts and “unwanted” fragments (e.g., loss of water or a methyl group upon ionization or at the entrance cone) with unique names and sum formulas. The user can freely complement the database by adding new sum formulas and can select the expected lipids from this database for the compound list. Adding heavy isotope-labeled compounds to the compound database is simple, as the Microsoft Excel add-in contains a database of isotope abundances for all stable elements including the following heavy isotope symbols: D, 2H; T, 3H; Cx, 13C; Nx, 15N; Ox, 18O. Additional isotopes can be added by modifying the internal element database, which also allows the creation of partially labeled compounds. Molecular isotopic patterns are calculated as described previously.26

When processing profile (i.e., continuous) spectra, the first step is to estimate the background level. This is done by (i) dividing the spectrum into equal segments, (ii) running a median filter on those segments, (iii) finding the lowest intensity of each segment, and (iv) calculating a linear regression through the lowest points. The background thus obtained is subtracted from the raw data before any further processing. The next step is to find the relevant peaks in the spectrum. First, the raw data are smoothed using a matched Gaussian filter.27 The local maxima are selected as candidate peak positions. Local minima are taken as peak limits, and the peaks are integrated between these limits using the trapezoid rule. In the case of line spectra, this step is omitted and each data point is considered as a peak.
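The background and peak-integration steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LIMSA code: the function names, the number of segments, and the median window are hypothetical, and the matched Gaussian smoothing step is omitted for brevity, so on noisy data `integrate_peaks` would report many spurious maxima unless the spectrum is smoothed first.

```python
from statistics import median


def median_filter(values, window=5):
    """Running median; the window is truncated at the segment edges."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def estimate_background(mz, intensity, n_segments=10, window=5):
    """Steps (i)-(iv): segment, median-filter, take minima, fit a line."""
    seg_len = max(1, len(mz) // n_segments)
    low_mz, low_int = [], []
    for start in range(0, len(mz), seg_len):
        seg = median_filter(intensity[start:start + seg_len], window)
        k = min(range(len(seg)), key=seg.__getitem__)
        low_mz.append(mz[start + k])
        low_int.append(seg[k])
    slope, intercept = linear_fit(low_mz, low_int)
    return [slope * x + intercept for x in mz]


def integrate_peaks(mz, y):
    """Local maxima are peak positions; neighbouring local minima are the limits."""
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i] > y[i - 1] and y[i] >= y[i + 1]:
            left, right = i, i
            while left > 0 and y[left - 1] < y[left]:
                left -= 1
            while right < len(y) - 1 and y[right + 1] < y[right]:
                right += 1
            # Trapezoid rule between the two limits.
            area = sum((mz[k + 1] - mz[k]) * (y[k] + y[k + 1]) / 2
                       for k in range(left, right))
            peaks.append((mz[i], area))
    return peaks
```

In use, the estimated background would be subtracted point by point from the raw intensities before `integrate_peaks` is called on the corrected spectrum.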
Next, the user-supplied compound list and the peak list derived from the mass spectrum are compared to search for matches, i.e., lipid species present in the spectrum. This is done by searching the peak list for peaks at m/z values corresponding to the most abundant isotopic peak of each of the expected compounds. Any unmatched compounds are then removed from the compound list. The remaining ones, i.e., the found lipids, are grouped into regions of overlapping isotopic patterns, which are then deconvoluted (see below) to obtain the corrected peak areas of the individual lipid species for further processing. We compared three different algorithms of peak pattern deconvolution, i.e., a subtraction algorithm, a linear fit algorithm, and a Gaussian peak model fit algorithm. The subtraction algorithm subtracts the theoretical isotopic pattern (scaled according to the first isotopic peak) of the compound with lowest m/z from the peak list, thus eliminating the contribution of this compound. The procedure is repeated for the next compound with lowest m/z in the peak list until no further peaks remain. The linear fit algorithm models the integrated peak intensities (in a region of overlapping isotopic patterns) as a weighted sum of the theoretical isotopic patterns using a set of linear equations, which can be solved to yield the contributions of the individual compounds.28 The algorithm was modified by constraining the (27) Smith, S. W. The Scientist and Engineer’s Guide to Digital Signal Processing, 2nd ed.; California Technical Pub: San Diego, 1999; pp 307-308. (28) Meija, J.; Caruso, J. A. J. Am. Soc. Mass Spectrom. 2004, 15, 654-658.
Figure 1. Screenshot of the graphical user interface of the LIMSA add-in. The dropdown control (1) allows the user to choose the parameter set to be used in the analysis. Once this has been done, a list box (2) displays the lipid species included in the analysis, the internal standards used (PC) and their amounts (e.g., PC28:00, 10.47 nmol), and other parameters (3) are updated as well. Another list box (4) shows all lipid species available in the internal database and their masses. More species can be added to the database by pressing the “Compound Library. . .” button.
solutions to positive values in order to improve performance with noisy data. The peak model fit algorithm directly models the raw spectrum as a weighted sum of isotopic patterns with a Gaussian peak shape and variable width.28 This is a nonlinear fitting problem, which was solved using the Levenberg-Marquardt algorithm implemented in the GNU Scientific Library (www.gnu.org/software/gsl/). Depending on the sample, the spectrum may contain peaks due to lipids or other compounds not present in the compound list. Such peaks can compromise the analysis if their isotopic patterns overlap with those of lipid species to be analyzed. LIMSA will issue a warning in such a case. For optimal results, these compounds should be identified and included in the database.

RESULTS AND DISCUSSION
Lipid Mass Spectrum Analysis Software. LIMSA is a software tool designed for quantitative analysis of lipid MS spectra. It consists of a library of functions and a convenient Microsoft Excel add-in interface. As input, a mass spectrum is expected in the first two columns of an empty Microsoft Excel worksheet. Once the user has copied the spectrum there and invoked the LIMSA add-in, the display shown in Figure 1 appears. The user then chooses the list of lipids expected to be present from the database, fills in the quantities of the internal standards (if present), and describes the type of MS data to be processed (positive/negative mode; peak list/spectrum; scan type; and, optionally, other parameters like mass shift, etc.). All parameters can be saved as a named set for later use.
Figure 2. Screenshot of the LIMSA add-in output. (1) Original data with mass (m/z) in the first and the corresponding intensity in the second column. (2) List of found peaks with their mass, area, and background values; (3) Graph displaying part of the original spectrum, found peaks (marked with red diamonds) and the assigned lipid species (marked with blue triangles). (4) Graph displaying the residuals of the fit. (5) List of found lipids together with their total areas and concentrations along with other parameters. For the species not found but present in the list of expected species, a zero is displayed in the area, concentration, and response columns. (6) Calibration curve fit to the responses (area/ concentration) of the internal standards. The calibration curve (also) corrects for lipid mass-dependent variation of instrument response.18
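The calibration step in item (6) of the caption above can be illustrated with a small sketch. The text states only that a calibration curve is fitted to the responses (area/concentration) of the internal standards; the straight-line dependence on m/z used below, the function names, and the numerical values are assumptions made for illustration and are not taken from LIMSA.

```python
def fit_response(standards):
    """Fit response = slope * m/z + intercept to the internal standards.

    standards : list of (mz, area, amount); response is area / amount.
    The linear form is an assumption of this sketch.
    """
    xs = [mz for mz, _, _ in standards]
    ys = [area / amount for _, area, amount in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def quantify(mz, area, curve):
    """Convert a deconvoluted peak area into an amount via the fitted response."""
    slope, intercept = curve
    return area / (slope * mz + intercept)


# Illustrative numbers only: two standards with responses 100 and 80.
curve = fit_response([(600.0, 1000.0, 10.0), (800.0, 400.0, 5.0)])
print(quantify(700.0, 180.0, curve))
```

Because the response is interpolated at the m/z of each analyte, a species lying between two standards is quantified with a response intermediate between theirs, which is how a mass-dependent instrument response is compensated in this sketch.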
The results, i.e., the found compounds and their concentrations, are presented on the same sheet in graphical and tabulated form (Figure 2). Typically, the analysis of one spectrum takes only a few seconds. A set of similar sample spectra can be analyzed with the same parameters in a batch mode. A database of compounds and their sum formulas, which the user can easily extend and modify, is stored inside LIMSA. Another database containing the masses and isotopic abundances for all stable elements, needed for isotope pattern calculation, is also included. Because the isotope pattern calculation is very fast on modern computers, the isotopic patterns of compounds are not stored in LIMSA, but calculated “on the fly”. In this way, isotope pattern for parent ion or neutral loss scans, which are always different from the precursor pattern,29 can be calculated by specifying the precursor and the fragment (head group, fatty acid, etc.) sum formula for the acquisition mode. This avoids storing all possible
fragment ions separately in the database, which simplifies the maintenance of the compound database. LIMSA will carry out the following tasks unattended: (1) subtraction of the background from the MS spectrum, (2) finding relevant peaks in the spectrum, (3) assignment of those peaks to specific lipid species by using a (user-defined) compound list, (4) correction of isotopic overlap and determination of the total intensity (area) of the isotopic pattern of each lipid, and (5) quantification of each species based on a calibration curve derived from the concentrations and total peak areas of added internal standards.18,30 LIMSA can be used to analyze direct MS and MS/ MS data or LC-MS data sets preprocessed with another software tool (SECD) to be described below. The first three tasks listed above are straightforward (see Materials and Methods) and are not discussed here further. The fourth task is more challenging because the isotopic patterns
(29) Rockwood, A. L.; Kushnir, M. M.; Nelson G. J. J. Am. Soc. Mass Spectrom. 2003, 14, 311-322.
(30) Brugger, B.; Erben, G.; Sandhoff, R.; Wieland, F. T.; Lehmann, W. D. Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 2339-2344.
Figure 3. Representative spectra of lipid standard mixtures used to study the performance of deconvolution algorithms. The samples were prepared by mixing two or three different two-component stock solutions (A-C; see Materials and Methods) in varying ratios. Panel 1 shows a spectrum of a sample containing equimolar amounts of mixtures A-C (2 pmol/µL of each species). The dotted lines indicate the corresponding pairs. Panel 2 demonstrates the limit of reliable deconvolution when mixing solutions C (major signal, m/z 731) and B (minor signal, m/z 732). Panel 3 demonstrates the limit of reliable deconvolution when mixing solutions A, B (major signals, m/z 730 and 732), and C (minor signal, m/z 731). The dashed line in panels 2 and 3 indicates the minor component.
change and frequently overlap extensively as shown in Figure 5 and panel b of Figure 6. This is the major complication in analyzing lipidome mass spectra and has to be corrected for.8 We tested three different algorithms to correct for such overlap (see below and Materials and Methods for details). Two of these algorithms, i.e., the subtraction and linear fit algorithms, operate on an integrated peak list, while the peak model fit algorithm operates on an unprocessed continuous MS spectrum. The fifth task, i.e., quantitation, is based on fitting a calibration curve to the responses of internal standards and is described in more detail elsewhere.12,18

Comparison of the Deconvolution Algorithms. We first tested the performance of the three different deconvolution algorithms described above using lipid standard mixtures varying in the degree of isotopic pattern overlap. This is exemplified in Figure 3 (panel 1), which shows a spectrum of a sample containing equimolar amounts of mixtures A-C (2 pmol/µL for each species). As can be seen, the spectrum contains a region where the isotopic peak pattern of one lipid out of each pair overlaps with others (m/z 729-739), while the second compound of each pair resides in a non-overlapping region. The ratio of peak areas was determined for each pair (A-C) after deconvolution and used as an indicator of algorithm performance. If the deconvolution method works properly, the ratio of the intensities of these two lipids should stay constant independent of the concentration. The
Figure 4. Comparison of the performance of the linear fit and Gaussian peak model fit algorithms in deconvoluting poorly resolved data. Lower panels. An equimolar mixture of PC34:2, PC34:1 (indicated by an arrow), and PC34:0 (2 pmol/µL each) was analyzed with five different resolution settings. Upper panel. The peaks were deconvoluted using either the linear fit (continuous line) or the Gaussian peak fit model (dashed line) algorithm, and the deconvoluted peak area of PC34:1 is plotted vs resolution (data points correspond to the spectra below). Error bars represent the standard deviation (n = 8).
deconvolution was considered reliable when this ratio determined from eight individual scans showed a standard deviation of less than 25%. Panel 2 of Figure 3 illustrates a situation where the first isotopic peak of an overlapping pattern coincides with the monoisotopic peak of a minor lipid. The minor signal is not readily visible to the naked eye in the sum spectrum (solid line). In this case, the limit of reliable detection for the minor signal was less than 10% of the intensity of the major signal. When the monoisotopic peak of a minor lipid is overlapping with the second isotopic peak of a major lipid, the limit of detection was somewhat lower, as expected (not shown). Panel 3 of Figure 3 demonstrates a situation where the peak pattern of a minor lipid is embedded within patterns of lipids with one mass unit lower and higher masses, respectively. The limit of reliable detection of the minor signal was slightly higher in this case. To further test the performance of the different deconvolution algorithms, we analyzed lipid extracts from human HDL, whose composition is much more complex than that of the standard mixtures. PC and SM are the major phospholipid components of HDL and are typically analyzed by scanning for precursors of +184 m/z. Quantification of the SM species from the resulting spectra is challenging (without resorting to removal of PCs by alkaline hydrolysis), due to extensive overlap between the SM species and much more abundant PC species (Figure 5). We quantified SM molecular species from 45 HDL samples using each of the three algorithms and compared the results with those obtained for the parallel samples in which PCs had been removed by alkaline hydrolysis. The results are summarized in Table 1. The subtraction and the linear fit algorithms produced similar results, albeit the standard deviations were slightly smaller with the latter, as expected from theory. 
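The two peak-list algorithms compared here can be illustrated with a compact sketch. It is not the LIMSA library code: the data layout, the function names, and the isotopic fractions are invented for the example, and the non-negativity constraint of the linear fit is imposed by cyclic coordinate descent, which is one possible way to do so (the text does not state how the constraint was implemented).

```python
def subtraction_deconvolution(peaks, patterns):
    """Strip patterns off the peak list, starting from the lowest m/z.

    peaks    : {nominal m/z: integrated area}
    patterns : {name: (monoisotopic nominal m/z, [fraction at +0, +1, ...])}
    Returns {name: total pattern area}.
    """
    residual = dict(peaks)
    totals = {}
    for name, (mz0, fractions) in sorted(patterns.items(),
                                         key=lambda kv: kv[1][0]):
        # Scale the theoretical pattern to the first isotopic peak.
        total = max(0.0, residual.get(mz0, 0.0) / fractions[0])
        for k, frac in enumerate(fractions):
            if mz0 + k in residual:
                residual[mz0 + k] -= total * frac
        totals[name] = total
    return totals


def linear_fit_deconvolution(peaks, patterns, sweeps=200):
    """Non-negative least-squares fit of the peak areas as a sum of patterns."""
    mzs = sorted(peaks)
    names = list(patterns)
    # Column j holds compound j's pattern laid out on the m/z grid.
    cols = []
    for name in names:
        mz0, fractions = patterns[name]
        cols.append([fractions[m - mz0] if 0 <= m - mz0 < len(fractions)
                     else 0.0 for m in mzs])
    x = [0.0] * len(names)
    resid = [peaks[m] for m in mzs]
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue
            step = sum(c * r for c, r in zip(col, resid)) / norm
            new = max(0.0, x[j] + step)  # keep the contribution non-negative
            delta = new - x[j]
            resid = [r - delta * c for r, c in zip(resid, col)]
            x[j] = new
    return dict(zip(names, x))


# Illustrative patterns (fractions of total area), not real lipid data.
patterns = {"X": (730, [0.6, 0.3, 0.1]), "Y": (732, [0.6, 0.3, 0.1])}
peaks = {730: 60.0, 731: 30.0, 732: 16.0, 733: 3.0, 734: 1.0}
print(subtraction_deconvolution(peaks, patterns))
print(linear_fit_deconvolution(peaks, patterns))
```

On noise-free data both functions should recover the same totals. The practical difference is that the subtraction approach scales each pattern from a single peak, so an error in that peak propagates to every later compound, whereas the linear fit uses all peaks of the overlapping region at once.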
In line with this, analysis of simulated data with added random noise showed that the results obtained by
Figure 5. Representative spectra of human HDL lipid extract before and after alkaline hydrolysis. The precursor ion (+184 m/z) scan obtained for unhydrolyzed extract shows both PC and SM species (upper panel), while that for a hydrolyzed sample shows only SM species (lower panel). For clarity, only SM species are indicated.

Table 1. Comparison of Deconvolution Algorithms(a)

                                        relative deviation ± SD (%)
lipid     av concn     % of         Gaussian      linear       subtraction
          (pmol/µL)    total SM     peak fit      fit
SM14:0    0.58          2           -5 ± 3        -5 ± 5       -4 ± 6
SM16:1    1.18          4           -4 ± 6        -5 ± 5       -4 ± 5
SM16:0    5.85         18            0 ± 2         1 ± 2        2 ± 3
SM18:1    0.90          3           -4 ± 6        -4 ± 7       -5 ± 7
SM18:0    1.44          4           -4 ± 6        -8 ± 7       -7 ± 7
SM20:1    0.65          2         -100 ± 0       -99 ± 6      -99 ± 6
SM20:0    1.27          4           -9 ± 43      -94 ± 14     -94 ± 13
SM22:1    2.70          8          -22 ± 9       -68 ± 21     -67 ± 21
SM22:0    2.13          6           -5 ± 17      -44 ± 26     -44 ± 26
SM23:1    1.24          4           -7 ± 5        -4 ± 5       -4 ± 6
SM23:0    0.91          3           -2 ± 4        -1 ± 5       -2 ± 6
SM24:2    3.98         12           -4 ± 7        -2 ± 13      -2 ± 12
SM24:1    6.50         19           -4 ± 9         1 ± 10       2 ± 10
SM24:0    1.43          4            1 ± 6         8 ± 11       5 ± 12

(a) Relative deviation of the concentration of SM species present in human HDL samples (n = 45) obtained from the same sample before and after removal of PC by alkaline hydrolysis, analyzed using different deconvolution methods. The value obtained from the hydrolyzed sample was taken as the reference. Only SM species with abundance of >1% are listed. The average concentration of all hydrolyzed samples and the corresponding mol percent are displayed in the first two columns.
linear fit algorithm deviated ∼10% less from the nominal values as compared to the subtraction method (data not shown). The data obtained with the peak model fitting algorithm28 were significantly better than those obtained with the other two methods. The relative abundances of several minor SM species deconvoluted from the spectra of unhydrolyzed samples agreed
much better with those of hydrolyzed samples lacking PCs (Table 1). Furthermore, additional minor species could be detected and analyzed with this algorithm. Finally, we tested how the limits of reliable deconvolution depend on the resolution of the MS spectra when using either the linear or the Gaussian peak model fit algorithm. With high-resolution spectra, similar limits of deconvolution were obtained with both algorithms. However, with less resolved spectra, the peak model fitting was superior as shown in Figure 4. The difference became evident once the unit mass resolution was lost. The results obtained with the linear fit algorithm were unacceptable at low spectral resolution, while the peak fit algorithm was quite insensitive to poor resolution, within the limits tested. Even though the peak model fitting algorithm appears to provide the most accurate results, each algorithm has its merits. The subtraction algorithm, thus far most often used in lipidome data analysis,8,20,21 is easy to implement and fast to compute. The linear fit algorithm28,31 is also fast but seems slightly more accurate. The peak model fitting algorithm is more difficult to implement and is computationally more demanding than the other two, but provides more precise results with complex samples, particularly if the spectral resolution is compromised. While the analysis takes 10-100 times longer with this algorithm, typical analysis times are only ∼30 s, which is fast enough in nearly all cases; i.e., computing time is not rate limiting in the analysis. In contrast to the subtraction algorithm, the linear and the peak model fitting algorithms can theoretically deconvolute patterns that
(31) Zhang, Z. Q.; Guan, S. H.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1997, 8, 659-670.
Figure 6. SECD software tool for extraction of lipid class-specific spectra. Panel a. Screenshot of the SECD user interface. Colored trapezoids have been drawn set to include all the species of each major phospholipid class: Blue ) PE; green ) PC alkyl/acyl; red ) PC; dark green ) SM; beige/green ) PS; yellow ) PI. Panel b. Spectra for PC diacyl (red line) and PC alkyl/acyl (green line) species extracted using SECD trapezoids or time-range averaging (black line). Circles highlight some regions where SECD allows exclusion of unwanted signals compared to time range averaging. With SECD, isobaric PC alkyl and PC diacyl species can be readily distinguished (circle 1), or a non-PC lipid can be exluded (circle 2).
overlap fully (i.e., even the first isotope peak overlaps), provided that the patterns are different enough.28 This kind of situation can arise for example when a partially isotope-labeled standard overlaps a natural lipid. Software for Extraction of Mass Spectra from LC-MS Data (SECD). The analysis of complex lipidomes by LC-MS is usually complicated by limited chromatographic resolution between lipid classes, which can introduce significant overlap between different lipid species when time-range averaged spectra are used. On the other hand, the usage of single-ion chromatograms is tedious and time-consuming, since correction for isotopic overlap is often needed. To simplify the analysis of lipid LC-MS data, we constructed the SECD software, which (i) displays of MS chromatograms as two-dimensional “maps” for visual inspection and (ii) allows the user to extract mass spectra from arbitrary regions of these data sets by drawing one or more trapezoids covering the desired region (Figure 6). 8330
The SECD software has an intuitive interface, which allows the user to easily create, move, and modify trapezoids with the mouse. It also guides the user in optimizing the trapezoids by showing (in a separate pop-up window) a list of the compounds in a database whose masses correspond to the selected peak. The summed spectra of the trapezoids for each lipid class are exported to Microsoft Excel for further analysis with the LIMSA software as described above.
The use of trapezoids for extraction of spectra from LC-MS data has several advantages compared to conventional time-range spectral averaging. The retention times of lipid classes typically overlap significantly, and thus spectra averaged over a time range contain signals from two or more lipid classes (Figure 6), which can significantly compromise the analysis. SECD trapezoids can be readily set to cover a single lipid class only, thus greatly simplifying the analysis. Another advantage of trapezoids is that fewer “empty” regions are included in the spectra, which can
increase the signal-to-noise ratio, particularly when a lipid class elutes over an extended time range. Notably, the locations of the trapezoids can be saved, and thus similar samples can be analyzed without further adjustments, provided that the retention times are fairly reproducible between runs.
CONCLUSIONS
The software tools presented here make the analysis of mass spectrometric lipidome data much easier, faster, and more reliable than when carried out manually or with the tools provided with MS instruments. The LIMSA tool allows convenient and accurate quantification of the individual lipid species from MS data by efficiently correcting for the overlap of isotopic patterns typical for lipidome data. Integration of the necessary databases, isotopic pattern calculation, peak deconvolution, and quantitation in the software allows efficient analysis of large sample sets with LIMSA. This is further supported by the included batch processing option. The software is also directly applicable to metabolic studies using 2H-, 13C-, 15N-, and 18O-labeled compounds. Other labels or partially labeled species can also be analyzed after proper supplementation of the internal element database. Although the main focus of this work was the analysis of MS/MS spectra, LC-MS data sets can also be conveniently and accurately analyzed with the additional SECD tool. However, when large numbers of related samples are to be analyzed by LC-MS, another recently published method,12 in which peak picking and deconvolution are done on the chromatographic data directly, might be a better option.
In the future, it could be useful to obtain the names and structures of the lipids, now stored in the add-in, from a central database like the one being implemented as part of the LIPID MAPS project (http://www.lipidmaps.org/). This will be straightforward once
it becomes possible to directly access the name and sum formula for all entries in the external database. Finally, the software tools developed here could also be applied to the analysis of spectra obtained for other types of analytes because of the flexible algorithms used for pattern calculation, deconvolution, and spectrum extraction. Spectra originating from other kinds of instruments (e.g., MALDI) could be analyzed as well.
ABBREVIATIONS
ESI, electrospray ionization; HDL, high-density lipoprotein; LC-MS, liquid chromatography-mass spectrometry; MALDI, matrix-assisted laser desorption/ionization; PC, phosphatidylcholine; PC alkyl, alkyl/acyl-glycerophosphocholine; PE, phosphatidylethanolamine; PI, phosphatidylinositol; PS, phosphatidylserine; SM, sphingomyelin. Lipid molecular species are identified by the abbreviation of the lipid class followed by the total number of carbons in the fatty acid chain(s), separated by a colon from the total number of double bonds in the fatty acid chain(s). In the case of sphingomyelins, only the carbon number of the N-linked fatty acid is indicated, assuming an 18:1 base (sphingosine).
ACKNOWLEDGMENT
We are grateful to Drs. Tomi Mikkola and Matti Tikkanen for providing the HDL samples and to Tarja Grundström for excellent technical assistance. This work was supported by grants from the Academy of Finland to A.U. (106210) and P.S. (44236) and from the Sigrid Juselius Foundation to P.S.
Received July 28, 2006. Accepted September 28, 2006. AC061390W