Software Platform for High-Throughput Glycomics - Analytical

Anal. Chem. 2009, 81, 3252–3260

Software Platform for High-Throughput Glycomics

S. Y. Vakhrushev,* D. Dadimov, and J. Peter-Katalinić*

Institute for Medical Physics and Biophysics, Biomedical Analysis, University of Muenster, D-48149 Muenster, Germany

Mass spectrometry (MS) is a key tool for structural analysis of oligosaccharides because of its high accuracy, sensitivity, and speed on the one hand and because of its general and flexible protocols on the other. In glycomics projects the analysis of mass spectra is the speed-determining step because, unlike in proteomics, software platforms for high-throughput interpretation of glycan mass spectra are not fully automated and still depend on highly specialized knowledge. The publicly available software requires initial manual MS data preprocessing steps and mostly operates on glycan structures already stored in databases. In particular, monoisotopic peaks have to be manually determined or imported. In this contribution we describe our development of a platform for MS data evaluation in glycomics that demands only minimal human intervention. The proposed platform, named SysBioWare, is constructed to allow import of the raw MS data to the spectrum browser and to perform isotopic grouping of detected peaks after denoising and wavelet analysis. The monoisotopic m/z values link the peak list to the raw MS spectrum and allow compositional assignment according to the tuned building block library. This platform has been applied to the human urine glycome as a potent tool for rapid assignment of already known and/or novel structures.
Development of distinct bioinformatics tools is crucial for the elucidation of glycan structures, especially with the emerging revolution in glycomics where the organization and manipulation of large scale data derived from carbohydrate analysis is required.1 The most significant progress in bioinformatics in glycomics was observed in the field of the development of databases for prediction of N- and O-glycosylation sites, graphical representation and nomenclature, generation of three-dimensional structures of glycans, and web-portals focused on general information.2-5 Databases of this type are usually tailored to specific tasks. Because of the high sensitivity and speed, mass spectrometry

* To whom correspondence should be addressed. E-mail: vakhrush@uni-muenster.de (S.Y.V.), [email protected] (J.P.-K.). Phone: (+49)251-8355195 (S.Y.V.), (+49)251-83-52308 (J.P.-K.).
(1) Packer, N. H.; von der Lieth, C. W.; Aoki-Kinoshita, K. F.; Lebrilla, C. B.; Paulson, J. C.; Raman, R.; Rudd, P.; Sasisekharan, R.; Taniguchi, N.; York, W. S. Proteomics 2008, 8, 8–20.
(2) Raman, R.; Raguram, S.; Venkataraman, G.; Paulson, J. C.; Sasisekharan, R. Nat. Methods 2005, 2, 817–824.
(3) von der Lieth, C. W.; Bohne-Lang, A.; Lohmann, K. K.; Frank, M. Briefings Bioinf. 2004, 5, 164–178.
(4) Aoki-Kinoshita, K. F. PLoS Comput. Biol. 2008, 4.
(5) von der Lieth, C. W.; Lutteke, T.; Frank, M. Biochim. Biophys. Acta 2006, 1760, 568–577.


Analytical Chemistry, Vol. 81, No. 9, May 1, 2009

(MS) is a most popular and key tool for structural analysis of oligosaccharides. In particular, qualitative data interpretation of MS spectra in high-throughput projects is of primary importance for the rapid identification of biological routes.1 In this context, the availability of tools for analysis of mass spectra is the most relevant issue, since the existing software platforms for automated high-throughput glycan mass spectra interpretation are still highly dependent on human expertise. Critical comments on the computational tools for assignment of glycoconjugates from mass spectra, already introduced to glycomics, are given below. The “Glycomod” system was designed to calculate from experimentally determined molecular ion all possible compositions of a glycan structure, considering underivatized, methylated, or acetylated glycans, or those with derivatized reducing terminus. In addition, the composition of glycopeptides could be computed if the mass and/or the sequence of the peptide portion are known. Also, if the parent protein is already known the peptide data can be entered within the protein sequence, SWISS-PROT/TrEMBL ID or as a set of unmodified peptide masses.6,7 This program supports experimental mass entry only as neutral species or as singly charged ions, where multiply charged ions have to be convoluted to the neutral or singly charged forms. Additionally, filtering of biologically nonrelevant carbohydrate compositions is not provided; thus, the composition hit list obtained has to be further manually analyzed with respect to the presence of implausible structure proposals. “GlycoPep DB” was constructed as a web-based tool for glycopeptide analysis using a “smart search” concept, designed for N-glycopeptide compositional assignment. 
By comparing experimentally determined masses with all calculated glycopeptides from a carbohydrate database with N-linked glycans, only biologically relevant structures are considered.8 For that purpose the user has to introduce a peak list manually or to paste it from the file of a previously pre-processed mass spectrum, to specify query criteria (e.g., database, cysteine modification, charge state, charge carrier and mass tolerance in ppm), and to enter the peptide mass or sequence. Although the number of implausible glycan compositions in comparison to “Glycomod” is reduced, the restriction of this approach is defined by a number of structures present in the database. “Cartoonist”, developed for the automated annotation of N-glycan MALDI TOF mass spectra, is based on the labeling of MALDI peaks with cartoons representing the most plausible

(6) Cooper, C. A.; Gasteiger, E.; Packer, N. H. Proteomics 2001, 1, 340–349.
(7) Cooper, C. A.; Joshi, H. J.; Harrison, M. J.; Wilkins, M. R.; Packer, N. H. Nucleic Acids Res. 2003, 31, 511–513.
(8) Go, E. P.; Rebecchi, K. R.; Dalpathado, D. S.; Bandu, M. L.; Zhang, Y.; Desaire, H. Anal. Chem. 2007, 79, 1708–1713.

10.1021/ac802408f. © 2009 American Chemical Society. Published on Web 04/02/2009.

glycan assemblies synthesized by mammals using 300 manually determined archetypes.9 Although the numbers of implausible proposals for a certain precursor ion are greatly reduced because of the fixed library of created archetypes, its application area is limited by the size of the library itself. This algorithm has been recently extended by the program called “Peptoonist” for automated identification of N-glycopeptides using a combination of MS and MS/MS data.10 The web-based tool “Glyco-Peakfinder”, programmed for rapid assignment of glycan compositions, is intended to be entirely a de novo platform for compositional analysis that does not rely on prior information from glycan databases and precalculated known archetypes.11 It can be used for assignment of all types of fragment ions including monosaccharide cross-ring cleavage products and multiply charged ions and accepts computation of derivatized carbohydrate molecules by permethylation, peracetylation, perdeuteromethylation, and acetylation. However, in this software tool the human expertise with respect to the creation of the list of monoisotopic m/z values is still required. The automatic peak detection as one of the most challenging and time-consuming tasks, playing a crucial role in any high-throughput application, has not been covered so far. To date major efforts for automated spectra deconvolution and monoisotopic m/z recognition were focused to the field of proteomics. A number of systems for charge state deconvolution and automated reduction and interpretation of high resolution mass spectra of large molecules have been reported.12-17 The challenging tasks, however, remain the chemical noise filtering and monoisotopic m/z values determination. 
To improve peak detection several algorithms for noise reduction in electrospray spectra have been proposed,18-20 where wavelet analysis seems to represent currently a method of choice for thresholding and denoising.21,22 Although widely tested for various proteomics applications, such automated systems perform poorly in glycomics. For example, the glycome of human urine is extremely complex, characterized by a large number of different ionic species.23-27

(9) Goldberg, D.; Sutton-Smith, M.; Paulson, J.; Dell, A. Proteomics 2005, 5, 865–875.
(10) Goldberg, D.; Bern, M.; Parry, S.; Sutton-Smith, M.; Panico, M.; Morris, H. R.; Dell, A. J. Proteome Res. 2007, 6, 3995–4005.
(11) Maass, K.; Ranzinger, R.; Geyer, H.; von der Lieth, C. W.; Geyer, R. Proteomics 2007, 7, 4435–4444.
(12) Mayampurath, A. M.; Jaitly, N.; Purvine, S. O.; Monroe, M. E.; Auberry, K. J.; Adkins, J. N.; Smith, R. D. Bioinformatics 2008, 24, 1021–1023.
(13) Kaur, P.; O’Connor, P. B. J. Am. Soc. Mass Spectrom. 2006, 17, 459–468.
(14) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 2000, 11, 320–332.
(15) Zhang, Z. Q.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1998, 9, 225–233.
(16) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229–233.
(17) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 52–56.
(18) Du, P.; Kibbe, W. A.; Lin, S. M. Bioinformatics 2006, 22, 2059–2065.
(19) Kast, J.; Gentzel, M.; Wilm, M.; Richardson, K. J. Am. Soc. Mass Spectrom. 2003, 14, 766–776.
(20) Wehofsky, M.; Hoffmann, R. J. Mass Spectrom. 2002, 37, 223–229.
(21) Kwon, D.; Vannucci, M.; Song, J. J.; Jeong, J.; Pfeiffer, R. M. Proteomics 2008, 8, 3019–3029.
(22) Du, P. C.; Stolovitzky, G.; Horvatovich, P.; Bischoff, R.; Lim, J.; Suits, F. Bioinformatics 2008, 24, 1070–1077.
(23) Vakhrushev, S. Y.; Langridge, J.; Campuzano, I.; Hughes, C.; Peter-Katalinić, J. Anal. Chem. 2008, 80, 2506–2513.
(24) Vakhrushev, S. Y.; Langridge, J.; Campuzano, I.; Hughes, C.; Peter-Katalinić, J. J. Clin. Proteom. 2008, 4, 47–57.

Thus, for automated glycomics, especially in clinical applications, bioinformatics approaches that provide fully automated high-throughput MS data interpretation and demand minimal human interaction are needed. We present here a software platform named SysBioWare for carbohydrate assignment as a solution for glycomics in general and for clinical applications in particular. Our platform is designed to work directly from raw MS data and was constructed to be easily tuned to different specific applications. The following steps are incorporated according to the proposed design: (i) import of the raw MS data to the spectrum browser, (ii) baseline adjustment and denoising, (iii) peak detection based on shape matching, (iv) isotopic grouping of detected peaks to infer monoisotopic m/z values and charge states, (v) loading and creation of a biological filter for compositional analysis, (vi) automated compositional assignment. Raw data and analysis results can be stored in the local database and referenced from subsequent experiment records.

EXPERIMENTAL SECTION

Materials and Sample Preparation. Materials. Methanol was obtained from Merck (Darmstadt, Germany) and used without further purification. Graphitized carbon powder used for the graphitized carbon cartridge preparation was collected by dismantling commercially available Active Charcoal MicroTip Columns 25-100 µL (Hugo Sachs Elektronik/Harvard Apparatus Inc., March-Hugstetten, Germany). AG50 (H+) resin was purchased from Bio-Rad (Richmond, CA, U.S.A.). Distilled and deionized water (Milli-Q water system, Millipore, Bedford, MA, U.S.A.) was used for the preparation of the sample solutions. N-glycan standards: oligomannose (Man5, Catalogue No. M-00250S); asialo, agalacto, biantennary (NGA2, Catalogue No. C-0720); asialo, agalacto, biantennary with core fucose (NGA2F, Catalogue No. C-004301); asialo, biantennary (NA2, Catalogue No. C-0024300M) were purchased from Oxford GlycoSciences, Abingdon, U.K.
They were used as solutions in MeOH/H2O (1/1; v/v) at a concentration of 0.5 pmol/µL to be analyzed on an orthogonal hybrid quadrupole time-of-flight mass spectrometer (Waters/Micromass, Manchester, U.K.) in the Z-spray geometry.

CDG Urine Sample. The sample investigated in this study was a native glycoconjugate mixture from the urine of patient K.L., suffering from symptoms assigned to Congenital Disorders of Glycosylation (CDG). For isolation of components, the patient’s urine was filtered and submitted to a first gel filtration chromatography step on Biogel P2 followed by gel filtration chromatography performed on Fractogel TSK HW 50 and anion-exchange chromatography on MonoQ to deliver 6 fractions as described previously.25,27 The fraction M3 was used in the present study.

Mass Spectrometry. Mass spectrometry was performed on an orthogonal hybrid quadrupole time-of-flight mass spectrometer (Q-TOF Waters/Micromass, Manchester, U.K.) in the Z-spray geometry. Nitrogen was used as a desolvation gas, and the source block temperature was kept at 80 °C. Gas-phase ions were

(25) Vakhrushev, S. Y.; Mormann, M.; Peter-Katalinić, J. Proteomics 2006, 6, 983–992.
(26) Vakhrushev, S. Y.; Snel, M. F.; Langridge, J.; Peter-Katalinić, J. Carbohydr. Res. 2008, 343, 2172–2183.
(27) Vakhrushev, S. Y.; Zamfir, A.; Peter-Katalinić, J. J. Am. Soc. Mass Spectrom. 2004, 15, 1863–1868.


Figure 1. (a) Scheme of the High-Throughput Assignment Software Platform for Glycomics. (b) Hierarchical structure organization of the developed SysBioWare platform showing interactions between the key objects of the program: Peak Detection Wizard, BioFilter Wizard, Composition Module, and Analysis Wizard.

Figure 2. Main window of the SysBioWare 2.0.0 software with activated “Laboratory”, “Composition Blocks”, and “Molecule Classes” modules.

generated by nanoelectrospray ionization in both positive and negative ion mode. Omega glass capillaries used in nanoESI experiments were pulled using a vertical pipet puller (model 720, David Kopf Instruments, Tujunga, CA, U.S.A.). The voltage was applied to the solution via a stainless steel wire inside the capillary. The cone voltage values were in the range of 30-50 V. Instrument control and mass spectra acquisition were performed with MassLynx 3.1 software (Waters/Micromass, Manchester, U.K.).

Software Platform. The protocol for high-throughput glycoscreening of human urinome is schematically shown as a flowchart in Figure 1. The software platform presented here under the name SysBioWare includes the “Laboratory Module”, the “Composition Blocks” constructor, and the “Molecule Classes” library (Figure 2). The auxiliary Peak Detection Wizard is incorporated into the Laboratory Module. Calculation-intensive steps such as the wavelet transform and theoretical isotopic distribution calculations are implemented using Intel Performance Primitives libraries that provide access to hardware acceleration capabilities of Intel and compatible processors and parallel calculations on modern multicore processors. All these optimizations allow the user to complete a typical analysis in less than 1 min.

“Composition Blocks” Constructor. The “Composition Blocks” constructor (Figure 2, inset) allows users to create a library of potential building block components, adducts, and modifications, which can be used at further analysis steps. Component increment masses are calculated automatically from elemental composition formulae.

“Molecule Classes” Library. The library defines a list of molecule classes to be observed (Figure 2, inset). The Molecule Class is defined by:
• The list of all possible building blocks, which provides the foundation for compositional assignment.
• Biological feasibility rules (Biofilter). These rules, written as a line code, specify conditions or the ratio between building blocks. Rules for amino acid linked glycans and the dHex ratio in

Figure 3. Different stages of the working process of Peak Detection Wizard. (a) Baseline Correction stage. Raw spectrum is shown in gray. (b) Noise level determination. The raw spectrum after baseline subtraction and smoothing is shown in gray. Peaks detected above the noise level are shown with red circles. (c) Peak Shape Detection stage based on the continuous wavelet analysis. Signals matched by Gaussians are shown in red. (d) Isotopes grouping stage: grouping of different peaks into a single isotopic envelope. The monoisotopic m/z values recognition and charge state determination. The raw spectrum after baseline subtraction and smoothing is shown in gray. Signals matched by Gaussians are shown in black. The recognized monoisotopic peaks are shown in red. The inset shows the example of the correct recognition of the singly and doubly charged ions.
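Stages (b) and (c) of Figure 3 can be illustrated with a small sketch. It is not the SysBioWare implementation: the function names are hypothetical, the noise estimate is global rather than local, ψ is assumed to be a plain Gaussian, and the transform is evaluated as a brute-force Riemann sum instead of an FFT-based routine.

```python
import math
from statistics import NormalDist, median


def noise_threshold(intensities, percentile=0.99):
    """Global noise threshold: median + z * sigma, with sigma taken from the MAD.

    Assumption: one global estimate stands in for the local one in the text.
    """
    center = median(intensities)
    mad = median([abs(v - center) for v in intensities])
    sigma = 1.4826 * mad  # converts MAD to a standard deviation for normal noise
    return center + NormalDist().inv_cdf(percentile) * sigma


def gaussian(u):
    """Assumed peak shape psi(u)."""
    return math.exp(-0.5 * u * u)


def cwt(mz, signal, scales, psi=gaussian):
    """Riemann-sum approximation of C(a, b) on a uniform m/z grid.

    b runs over the grid points; returns a dict keyed by (a, b).
    """
    dx = mz[1] - mz[0]
    table = {}
    for a in scales:
        norm = dx / math.sqrt(a)
        for b in mz:
            table[(a, b)] = norm * sum(
                f * psi((x - b) / a) for x, f in zip(mz, signal)
            )
    return table


def strongest_peak(mz, signal, scales):
    """Location b and width a at the global maximum of C(a, b)."""
    table = cwt(mz, signal, scales)
    (a, b), _ = max(table.items(), key=lambda item: item[1])
    return b, a
```

For a Gaussian peak of width σ and the 1/√a normalization, the response at the peak centre is largest at a = σ, which is why a maximum in the (a, b) plane yields both the location and the width of the peak. A ridge-following search, as described in the text, would be needed for overlapping peaks.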

the composition are shown as an example in the “Bio-filter” window (Figure 2, inset).
• An average elemental composition model, which is used for more accurate isotopic peak grouping. At the current stage two types of model biomolecules are implemented: peptides and glycans. Since the major chemical elements present in oligosaccharides and peptide molecules are hydrogen, oxygen, carbon, and nitrogen, these elements have been taken to emulate the elemental composition of the corresponding model. Whereas the hypothetical amino acid “averagine” can be used for the peptide model molecule,16 no comparable estimate has been reported for oligosaccharides. Therefore, for the glycan model molecule the following estimation has been adopted: the four most common monosaccharides NeuAc, dHex, Hex, and HexNAc contribute equally to the hypothetical oligosaccharide molecule, resulting in mass fractions of carbon, oxygen, and nitrogen of 46%, 44%, and 3%, respectively (Figure 2, inset).

“Laboratory Module”. This module provides the user interface to the database of experiments. The user starts by importing a raw mass spectrum at the “Spectra” tab, where the Peak Detection Wizard generates an initial monoisotopic peak list. The “Components” tab requires users to provide the lists of building blocks, adducts, and modifications they expect to see, as well as the list of biofilter rules, which can be imported from the molecule classes library. The “Annotation” tab allows the user to describe various aspects of the experiment such as materials, methods, instrument settings, biological sources, and so on. These annotation elements can later be searched to retrieve past experiments and make cross-references. Finally, the “Analysis” tab is the place where compositional assignment is performed. This procedure is based on modeling the respective glycoconjugate ions using different combinations of potential building blocks defined by the user in the “Components” tab. The engine used for these calculations was tested previously on different complex glycoconjugate mixtures.23-28 Afterward, assignment results can be compared between different experiments and exported to Excel.

Peak Detection Wizard. Since the manual determination of the monoisotopic m/z peak list from the raw spectrum is a laborious task, especially in the case of highly complex mixtures, and significantly increases the analysis time, the Peak Detection Wizard was introduced to address this issue for high-throughput approaches. The wizard provides mass spectrometric data preprocessing: baseline correction and noise level determination, automated recognition of monoisotopic m/z values, and charge state determination (Figure 3). It consists of the following steps:
• Data Import and Resampling. The raw spectrum needs to be cleansed of duplicate m/z values (spectrum points with equal m/z values and unequal intensity values) and resampled to have uniform m/z values. This makes possible a number of

(28) Zamfir, A.; Vakhrushev, S.; Sterling, A.; Niebel, H. J.; Allen, M.; Peter-Katalinić, J. Anal. Chem. 2004, 76, 2046–2054.


optimizations, such as faster calculation of the wavelet analysis (Peak Detection, below).
• Baseline Correction. We have introduced two modes of baseline correction, either of which can be selected by the user. The first mode detects the electronic noise, which is assumed to follow a normal distribution, and then uses this noise as the lowest signal level (Figure 3a). The second mode is based on the removal of a specified proportion of the lowest measurements. Afterward, the baseline can be subtracted from the spectrum.
• Smoothing and Peak Detection. At this step, noise spikes and coarse quantization effects are removed from the spectrum. Quantization effects appear as discrete intensity levels, which are usual for a mass spectrum obtained from a small number of scans. Removing discrete levels significantly improves performance in the next step, noise level determination. For already smoothed spectra no further smoothing is usually required.
• Noise Level Determination. The intensity of noisy peaks is assumed to follow a normal distribution with a varying local standard deviation. The mean of the noise is estimated as the median of peak intensities, and the local standard deviation is estimated from the local median absolute deviation. The peaks whose intensity exceeds a specified percentile are considered to be useful signals (Figure 3b).
• Peak Detection. Peaks that have passed noise filtering are matched against one of the expected peak shapes (Figure 3c). Currently, Gaussian and Lorentzian shapes are used. Shape matching is implemented using the continuous wavelet transform:

$$C(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(x)\, \psi\!\left(\frac{x - b}{a}\right) dx \qquad (1)$$

where ψ(x) describes the shape of the peak. The transformed space C(a,b) is then searched for local maxima. The result normally looks like a series of ridges that start at the locations of the most intense peaks and then gradually branch into shorter ridges corresponding to the less intense peaks. The algorithm then searches for local maxima along those ridges. These local maxima determine the location and width of the peak fitted at the specific m/z location. The peak’s intensity is estimated using the area under the curve and its width. This method has proven to be more robust than peak shape fitting by least-squares methods. Furthermore, it can be implemented very efficiently using the Fast Fourier Transform and runs very fast on modern multicore processors.
• Isotope Grouping. After detecting potential peaks by shape matching, SysBioWare attempts to group them as isotopes and determines their charge state (Figure 3d). Starting with the maximum possible charge state, the program tries to find peaks at the designated locations that form an envelope of the same shape as the corresponding theoretical isotopic distribution. The likelihood of observing a particular grouping is evaluated, and if it is above the specified threshold, the first peak in the group is accepted as the monoisotopic one, and peaks matching other signals in its isotopic envelope are excluded from the further search. Isotopic distributions are probed by SysBioWare for different molecule classes, which can be selected by the user from the “Molecule Classes” library module, for example, glycans and peptides (Figure 2, inset). This is necessary because at higher m/z values the difference between their isotopic distribution shapes becomes detectable. Namely, it has been observed that in the same m/z range the second peak of the isotope envelope is higher for peptides than for glycans, which can be explained by the increased contribution of nitrogen. Since the major contributions to the isotopic intensity distribution are made by carbon and partially by oxygen and nitrogen, hydrogen has not been considered in the computation. Thus, isotope abundances for the model molecule have been calculated as the coefficients of the power terms obtained by polynomial multiplication of the isotope distributions of the corresponding elements.

RESULTS AND DISCUSSION

Isotopic Distribution Modeling. To show the accuracy of the approximation in oligosaccharide molecule modeling, a comparison between the isotopic distributions of the real composition and those computed from the model molecule has been performed (Figure 4). Isotopic distributions of the elemental compositions of sialylated O-glycans (Figure 4a,b), a monosialylated biantennary N-glycan with core fucose (Figure 4c), a trisialylated triantennary N-glycan (Figure 4d), and a pentasialylated pentaantennary N-glycan with four fucoses (Figure 4e) have been calculated and compared with those computed from the hypothetical approximation. For these classes of molecules very good consistency has been observed. In addition, the accuracy of the isotopic distribution simulation based on the oligosaccharide model molecule was tested with a trisialylated triantennary N-glycan carrying one (Figure 4f) or four sulfate groups (Figure 4g) and with a hexose 20-mer (Figure 4h). Here a divergence in the range from 5% to 10% was observed only for the trisialylated triantennary N-glycan with four sulfate groups (Figure 4g). Nevertheless, even if a 10% intensity deviation is taken into account, a correct identification of the monoisotopic peak and of the charge state is accomplished.
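The polynomial multiplication and the glycan model molecule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the isotope abundances and average atomic masses are standard reference values that do not appear in the text, and hydrogen is omitted as in the paper.

```python
# Natural isotope abundances indexed by nominal mass offset (+0, +1, +2).
# Assumption: standard reference abundances, not values from the paper.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}
AVERAGE_MASS = {"C": 12.011, "N": 14.007, "O": 15.999}
# Mass fractions of the glycan model molecule as quoted in the text.
GLYCAN_FRACTIONS = {"C": 0.46, "O": 0.44, "N": 0.03}


def poly_mul(a, b, n_peaks):
    """Multiply two abundance polynomials, keeping the first n_peaks terms."""
    out = [0.0] * min(len(a) + len(b) - 1, n_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(counts, n_peaks=8):
    """Abundances of M, M+1, M+2, ... for the given atom counts."""
    pattern = [1.0]
    for element, n in counts.items():
        for _ in range(n):
            pattern = poly_mul(pattern, ISOTOPES[element], n_peaks)
    return pattern


def model_counts(mass, fractions=GLYCAN_FRACTIONS):
    """Atom counts of the model molecule estimated from a mass value."""
    return {
        element: round(mass * fraction / AVERAGE_MASS[element])
        for element, fraction in fractions.items()
    }


# Real composition versus model for NeuAcHexHexNAc (C25 H42 N2 O19, H ignored).
real = isotope_pattern({"C": 25, "N": 2, "O": 19})
model = isotope_pattern(model_counts(675.2))
```

Comparing `real` and `model` peak by peak gives the kind of deviation reported in Figure 4. The mass 675.2 passed to `model_counts` is only an approximate value chosen for the example.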
The simulation of the isotopic distribution for the ACTH18-38 peptide (Figure 4i) and for the oxidized insulin B chain (Figure 4k), compared with the calculations for the glycan and the “averagine” models, shows that the highest consistency is achieved by the “averagine” model. When the glycan calculation model is used for peptides of the same mass values, an intensity deviation higher than 20% is obtained for isotopic peaks in the envelope. This demonstrates the need for a class-specific model molecule at masses higher than 2500 Da. Moreover, preliminary information about the nature of the sample allows the user to tune the Peak Detection Wizard more precisely and to correct the oligosaccharide model molecule for any specific case.

Automated Peak Detection and Assignment of the N-glycan standard mixture. To test the validity of the Peak Detection Wizard, an equimolar mixture of N-glycan standards has been analyzed. To simulate high-throughput conditions, a low concentration and a short mass spectrum acquisition time have been chosen. Four neutral glycan standards, Man5, NGA2, NGA2F, and NA2, have each been dissolved in MeOH/H2O 1/1 (v/v) at a concentration of 0.5 pmol/µL and submitted to nanoESI Q-TOF MS analysis in the positive ion mode, simulating a high-throughput procedure (Figure 5). At two scans, which correspond to approximately 4 s of acquisition time, the oligosaccharide standards have been detected as a group of singly and doubly charged ions at different signal abundances. There was no singly charged signal detected for the NA2 glycan standard at two scans, whereas Man5, NGA2, and NGA2F have been

Figure 4. Comparison between isotopic distributions theoretically simulated from the elemental composition as singly charged deprotonated ions of the real molecule (solid line) and its hypothetical analogue calculated from the mass values based on the glycan model (dashed line). “Averagine” peptide model is shown by dashed line with the index “P”. The following theoretical ions have been considered for simulations: sialylated O-glycans NeuAcHexHexNAc (a), NeuAcHex3HexNAc2 (b), sialylated biantennary N-glycan with a core fucose NeuAcdHexHex5HexNAc4 (c), trisialylated triantennary NeuAc3Hex6HexNAc5 (d), pentasialylated pentaantennary N-glycan with four fucoses NeuAc5dHex4Hex8HexNAc8 (e), trisialylated triantennary N-glycan with one sulfate NeuAc3Hex6HexNAc5(SO3) (f), trisialylated triantennary N-glycan with four sulfates NeuAc3Hex6HexNAc5(S4O12) (g), a hexose 20-mer Hex20 (h), ACTH18-38 (i), and oxidized insulin B chain (bovine) peptide (k).

detected in both singly and doubly charged states. Singly charged ions were detected as sodiated ionic forms, while the doubly charged ion population was represented by a mixture of [M+H+Na]2+, [M+H+K]2+, [M+2Na]2+, and [M+Na+K]2+ ionic species. The Peak Detection Wizard of the SysBioWare program has been applied to the raw spectrum analysis of glycan standards. Under the chosen parameters, the monoisotopic values for

[M+H+Na]2+, [M+H+K]2+, [M+2Na]2+, and [M+Na+K]2+ ionic species have been correctly determined for all four standards except for the [M+H+Na]2+ at m/z 629.24 corresponding to the Man5 glycan and [M+H+Na]2+ at m/z 743.34 corresponding to the NGA2F glycan, where a false positive peak at m/z 744.32 was selected. From the singly charged ions detected for Man5, NGA2, and NGA2F at m/z 1257.54, 1339.59,

Figure 5. Example of the automated monoisotopic m/z peak recognition. (a) Positive ion mode nanoESI Q-Tof MS of the equimolar mixture of neutral N-glycan standards Man5, NGA2, NGA2F, and NA2. (b) Expansion of the singly (left) and doubly charged (right) detection area of the Man5 N-glycan. (c) Expansion of the singly (left) and doubly charged (right) detection area of the NGA2 N-glycan. (d) Expansion of the singly (left) and doubly charged (right) detection area of the NGA2F N-glycan. (e) Expansion of the singly (left) and doubly charged (right) detection area of the NA2 N-glycan.

and 1485.72, respectively, all monoisotopic values of sodiated adducts have been properly selected and the charge state determined correctly.

System Validation: Application to the Complex Carbohydrate Mixture from the Urine of a CDG Patient. The validation of the SysBioWare platform is demonstrated on the example of the high-throughput analysis of the KLM3 urine fraction (Figure 6a). By manual evaluation of the spectrum only 15 monoisotopic m/z values in the range of relative intensity from 5 to 100% were recognized. However, the vast majority of ions manually recognized as monoisotopic m/z values have been observed in the range from 1 to 5% of relative abundance, contributing an additional 42 precursors to the total peak list. Because of the presence of chemical noise clusters, correct manual recognition was complicated in the intensity range below 1%. For the automated analysis, the program has been tuned to split all recognized peaks into two intensity groups: from 1 to 5% and from 5 to 100% (Figure 6b). The Peak Detection Wizard parameters were optimized to keep the number of false positive signals as low as possible. Thus, in the relative intensity range from 5 to 100% all 15 manually determined monoisotopic peaks have been recognized correctly without any false negative or false positive signals. Of the 42 peaks determined manually in the range of 1-5% relative abundance, 29 monoisotopic m/z values have been identified correctly by the Peak Detection Wizard, and no false positive signals have been selected. The other 13 peaks had been manually selected from the area of significant influence of the chemical noise clusters.

For the automated compositional assignment, the “Components” tab has been tuned to contain NeuAc, dHex, Hex, and HexNAc as potential building blocks. This choice is supported by the anion exchange chromatogram profile of the monosaccharide mixture obtained after total hydrolysis of the sample KLM3.29 Phosphate (P) and sulfate (S) have been chosen as potential modification groups. Since the preliminary glycome investigation of the urine from CDG patients revealed mostly the presence of up to two amino acids linked to the sugar moiety, the number of amino acids potentially attached to glycans in the current example has been restricted to a single Asn, Ser, or Thr, and a combination of Pro with Ser or Thr.23-27 This limitation allows the program, together with the applied biological filter, to minimize the number of nonrelevant compositions in the analysis. The BioFilter has been tuned to discard structures that contain amino acids without the HexNAc unit and those where the Hex and HexNAc building blocks cannot be organized as a full size or as a truncated pentasaccharide core in the presence of the Asn amino acid. In addition, polyfucosylation has been considered improbable for this sample. On the basis of the highest charge state of ionic species observed in the current negative ion mode spectrum, [M - H]- and [M - 2H]2- have been selected as potential molecular ion forms. After the activation of the “Recalc” function within the mass deviation window less than

(29) Vakhrushev, S. Y. PhD Thesis, University of Muenster, Muenster, Germany, 2006.

Figure 6. (a) Negative ion mode nanoESI Q-Tof MS of the sample KLM3. (b) An example of a high-throughput screening analysis: complex glycoconjugate mixture KLM3 from the urine of a CDG shown on Figure 6a.

30 ppm, the list of possible compositions related to the selected m/z values was computed during 10 s.
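The compositional search described above can be illustrated with a minimal Python sketch. It is not the SysBioWare implementation. It assumes standard monoisotopic residue masses, hypothetical upper bounds on each building block (`MAX_COUNT`), and a single BioFilter-style rule taken from the text (an amino acid requires a HexNAc unit). The Pro combinations, the phosphate and sulfate modifications, and the core rule for Asn are left out for brevity, and the names `search`, `passes_filter`, and `ppm` are illustrative only.

```python
from itertools import product

# Monoisotopic residue masses in Da (standard values, not quoted from the paper).
SUGARS = {"NeuAc": 291.09542, "dHex": 146.05791, "Hex": 162.05282, "HexNAc": 203.07937}
AMINO_ACIDS = {"": 0.0, "Asn": 114.04293, "Ser": 87.03203, "Thr": 101.04768}
WATER, PROTON = 18.01056, 1.00728

# Hypothetical upper bounds per building block; a real library would be tuned per sample.
MAX_COUNT = {"NeuAc": 2, "dHex": 2, "Hex": 8, "HexNAc": 5}


def ppm(mz_exp, mz_calc):
    """Absolute mass deviation in parts per million."""
    return abs(mz_exp - mz_calc) / mz_calc * 1e6


def passes_filter(counts, amino_acid):
    """One rule from the text: an amino acid is only allowed together with HexNAc."""
    return not amino_acid or counts["HexNAc"] > 0


def search(mz_exp, tol_ppm=30.0, charges=(1, 2)):
    """Return (composition, charge, ppm) hits for [M - zH]z- ions, best match first."""
    names = list(SUGARS)
    hits = []
    for combo in product(*(range(MAX_COUNT[n] + 1) for n in names)):
        if not any(combo):
            continue  # skip the empty composition
        counts = dict(zip(names, combo))
        sugar_mass = sum(SUGARS[n] * k for n, k in counts.items())
        for aa, aa_mass in AMINO_ACIDS.items():
            if not passes_filter(counts, aa):
                continue
            neutral = sugar_mass + aa_mass + WATER
            label = "".join(f"{n}{k if k > 1 else ''}" for n, k in counts.items() if k) + aa
            for z in charges:
                dev = ppm(mz_exp, (neutral - z * PROTON) / z)
                if dev <= tol_ppm:
                    hits.append((label, z, round(dev, 2)))
    return sorted(hits, key=lambda h: h[2])


for mz in (673.22, 760.27):
    print(mz, search(mz))
```

Under these assumptions the 30 ppm window retains NeuAcHexHexNAc for m/z 673.22 and NeuAcHexHexNAcSer for m/z 760.27 as singly charged candidates. Further hits may appear depending on the chosen bounds, which is why additional filter rules matter.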

Of the 15 ionic species in the automatically created peak list in the intensity range from 5 to 100%, 9 have been unambiguously assigned as follows: 8 to singly sialylated free oligosaccharides and 1, at m/z 809.24, to Hex5(-H2O). Two different compositional proposals have been computed for each of the molecular ions at m/z 760.27, 1362.46, and 1403.49, with the proposals of highest mass accuracy corresponding to NeuAcHexHexNAcSer, NeuAcHex4HexNAc2, and NeuAcHex3HexNAc3, respectively. Fragmentation analysis of the corresponding precursors (data not shown) confirmed these compositions, excluded the pseudoisobaric structures NeuAcdHexHex2(-H2O), NeuAcHexHexNAc4(S/P), and NeuAcHexNAc5(S/P), respectively, and allowed us to give preference to the confirmed compositions in the final compositional hit list (Table 1). No compositional assignment has been proposed for the abundant ionic species at m/z 787.28, 939.28, and 1249.55 detected in the relative intensity range from 5 to 100%. This suggests that these species do not belong to the class of free or amino acid-linked glycans. Of the 42 molecular ions detected within the relative intensity range from 1 to 5%, 12 have been uniquely assigned to free oligosaccharides and 1 to a Thr-linked glycan. For 7 ionic species, at m/z 1112.45, 1143.38, 1208.47, 1271.47, 1313.45, 1565.55, and 1727.58, two different compositions each were proposed by the platform. According to the biosynthetic rules for glycan assembly, all proposed compositions for the ions at m/z 1112.45 and 1208.47 were implausible. This points to the need for further development of the biofilter library with respect to nonrelevant glycan moieties. For the ionic species at m/z 1143.38, 1565.55, and 1727.58, the composition with the highest biological relevance was also the one with the higher mass accuracy. For the molecular ions at m/z 1271.47 and 1313.45, both proposed compositions could be accepted under the current instrumental conditions.
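To make the ambiguity concrete, the short Python sketch below ranks the two candidate compositions for m/z 1271.47 and 1313.45 by their mass deviation, using the two-decimal experimental and calculated m/z values listed in Table 1. The helper names `ppm` and `rank` are illustrative and not part of the platform. Deviations recomputed from rounded values differ slightly from the ∆m/z column of Table 1, which was presumably calculated from unrounded masses.

```python
def ppm(mz_exp, mz_calc):
    """Absolute mass deviation in parts per million."""
    return abs(mz_exp - mz_calc) / mz_calc * 1e6


# Experimental m/z -> {candidate composition: calculated m/z}, as printed in Table 1.
CANDIDATES = {
    1271.47: {"NeuAcdHexHex2HexNAc2Ser": 1271.45, "NeuAcdHex2Hex3HexNAc(-H2O)": 1271.44},
    1313.45: {"NeuAc2dHexHexHexNAc2": 1313.46, "Hex8": 1313.43},
}


def rank(mz_exp, tol_ppm=30.0):
    """Candidates inside the tolerance window, smallest deviation first."""
    scored = [(name, round(ppm(mz_exp, calc), 2))
              for name, calc in CANDIDATES[mz_exp].items()]
    return sorted((s for s in scored if s[1] <= tol_ppm), key=lambda s: s[1])


for mz in CANDIDATES:
    print(mz, rank(mz))
```

Both candidates for each ion fall inside the 30 ppm window, so mass accuracy alone can order them but cannot exclude either one at this resolution.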
For a more accurate compositional evaluation of the ions at m/z 1271.47 and 1313.45, high-resolution MS and/or fragmentation analysis is necessary. Thus, the analysis of the raw mass spectrum by the SysBioWare platform, together with the assignment of 30 glycoconjugate species, has been accomplished in less than 1 min (Table 1).

Table 1. Software-Assisted Assignment of Molecular [M-H]- Ions within the Relative Intensity 1 to 100% Range Obtained from the KLM3 CDG Urine Sample by the Negative Ion nanoESI IMS Q-TOF MS

No.   m/zexp    composition                m/zcalc    ∆m/z (ppm)
1     470.15    NeuAcHex                   470.15      2.13
2     544.19    Hex2HexNAc                 544.19      0.00
3     599.20    NeuAc2                     599.19     16.69
4     616.22    NeuAcdHexHex               616.21     24.34
5     632.21    NeuAcHex2                  632.20      7.91
6     655.23    NeuAcHexHexNAc             655.22a    18.31
      673.22    NeuAcHexHexNAc             673.23     19.31
7     714.25    NeuAcHexNAc2               714.26      9.80
8     760.27    NeuAcHexHexNAcSer          760.26     13.15
9     774.28    NeuAcHexHexNAcThr          774.28      1.29
10    778.26    NeuAcdHexHex2              778.26      3.85
11    809.24    Hex5                       809.26a    21.01
12    819.29    NeuAcdHexHexHexNAc         819.29      2.44
13    835.30    NeuAcHex2HexNAc            835.28     17.96
14    876.31    NeuAcHexHexNAc2            876.31      2.28
15    981.35    NeuAcdHexHex2HexNAc        981.34      7.13
16    997.33    NeuAcHex3HexNAc            997.34      3.01
17    1038.37   NeuAcHex2HexNAc2           1038.36     2.89
18    1079.40   NeuAcHexHexNAc3            1079.39     7.41
19    1143.38   NeuAcdHexHex3HexNAc        1143.39     8.75
20    1184.42   NeuAcdHexHex2HexNAc2       1184.42     4.22
21    1200.42   NeuAcHex3HexNAc2           1200.42     6.66
22    1241.46   NeuAcHex2HexNAc3           1241.44    11.28
23    1271.47   NeuAcdHexHex2HexNAc2Ser    1271.45    14.16
                NeuAcdHex2Hex3HexNAc       1271.44a   22.81
24    1313.45   NeuAc2dHexHexHexNAc2       1313.46    13.70
                Hex8                       1313.43    15.23
25    1362.46   NeuAcHex4HexNAc2           1362.47     7.34
26    1403.49   NeuAcHex3HexNAc3           1403.50     2.85
27    1549.59   NeuAcdHexHex3HexNAc3       1549.55    21.94
28    1565.55   NeuAcHex4HexNAc3           1565.55     0.64
29    1727.58   NeuAcHex5HexNAc3           1727.60     9.84
30    1768.64   NeuAcHex4HexNAc4           1768.63     6.78

a [M(-H2O)-H]-.

CONCLUSIONS

Algorithms for the recognition of monoisotopic m/z values, based on peak shape matching and isotopic grouping, have been developed. The peak detection algorithm has been optimized for glycomics applications and successfully tested on mass spectra of different levels of complexity. A tuned BioFilter toolbox, providing filtration of unreliable structures, has been incorporated into the Compositional Module. A novel software platform with integrated Peak Detection and Compositional Modules has been developed that provides the option for high-throughput glycan data analysis within a time scale of less than 1 min and for the organization of the assigned structures into a local database for further analysis and correlation. The newly developed SysBioWare platform has been tested for the analysis of complex glycoconjugate mixtures on the urine fraction from a CDG patient and validated as a potential tool for clinical applications.

Received for review November 13, 2008. Accepted February 27, 2009. AC802408F