Exploring Three-Dimensional Matrix-Assisted Laser Desorption

Article pubs.acs.org/ac

Exploring Three-Dimensional Matrix-Assisted Laser Desorption/Ionization Imaging Mass Spectrometry Data: Three-Dimensional Spatial Segmentation of Mouse Kidney

Dennis Trede,†,‡ Stefan Schiffler,†,‡ Michael Becker,§ Stefan Wirtz,∥ Klaus Steinhorst,† Jan Strehlow,∥ Michaela Aichler,⊥ Jan Hendrik Kobarg,‡ Janina Oetjen,¶ Andrey Dyatlov,†,‡ Stefan Heldmann,∥ Axel Walch,⊥ Herbert Thiele,†,‡,∥ Peter Maass,*,†,‡ and Theodore Alexandrov*,†,‡,¶

† Steinbeis Innovation Center for Scientific Computing in Life Sciences, Bremen, Germany
‡ Center for Industrial Mathematics, University of Bremen, Bremen, Germany
§ Bruker Daltonik GmbH, Bremen, Germany
∥ Fraunhofer MEVIS, Institute for Medical Image Computing, Bremen, Germany
⊥ Research Unit Analytical Pathology, Institute of Pathology, Helmholtz Center Munich, Munich, Germany
¶ MALDI Imaging Lab, University of Bremen, Bremen, Germany


ABSTRACT: Three-dimensional (3D) imaging has a significant impact on many challenges of life sciences. Three-dimensional matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI-IMS) is an emerging label-free bioanalytical technique that captures the spatial distribution of hundreds of molecular compounds in 3D by providing a MALDI mass spectrum for each spatial point of a 3D sample. Currently, 3D MALDI-IMS cannot tap its full potential due to the lack of efficient computational methods for constructing, processing, and visualizing large and complex 3D MALDI-IMS data. We present a new pipeline of efficient computational methods, which enables analysis and interpretation of a 3D MALDI-IMS data set. Construction of the 3D MALDI-IMS data set was done according to state-of-the-art protocols and involved sample preparation, spectra acquisition, spectra preprocessing, and registration of serial sections. For analysis and interpretation of 3D MALDI-IMS data, we applied the spatial segmentation approach, which is well accepted in the analysis of two-dimensional (2D) MALDI-IMS data. In line with 2D data analysis, we used edge-preserving 3D image denoising prior to segmentation to reduce strong and chaotic spectrum-to-spectrum variation. For segmentation, we used an efficient clustering method, called bisecting k-means, which is optimized for hierarchical clustering of a large 3D MALDI-IMS data set. Using the proposed pipeline, we analyzed a central part of a mouse kidney using 33 serial sections of 3.5 μm thickness after PAXgene tissue fixation and paraffin embedding. For each serial section, a 2D MALDI-IMS data set was acquired following the standard protocols with a high spatial resolution of 50 μm. Altogether, 512 495 mass spectra were acquired, which corresponds to approximately 50 gigabytes of data.
After registration of serial sections into a 3D data set, our computational pipeline allowed us to reveal the 3D anatomical structure of the kidney based on mass spectrometry data only. Finally, automated analysis discovered molecular masses colocalized with major anatomical regions. In the same way, the proposed pipeline can be used for analysis and interpretation of any 3D MALDI-IMS data set, in particular of pathological cases.

© 2012 American Chemical Society

Received: April 11, 2012 Accepted: June 20, 2012 Published: June 20, 2012

dx.doi.org/10.1021/ac300673y | Anal. Chem. 2012, 84, 6079−6087

Analytical Chemistry

Since biology is by and large a three-dimensional (3D) phenomenon, it is hardly surprising that 3D imaging has a significant impact on many challenges of life sciences. Current well-established 3D bioimaging techniques (CT, MRI, PET, SPECT, and ultrasound) are either targeted or labeled, i.e., they either recover anatomy or trace a specific compound in the body. They are highly capable of displaying body or organ anatomy by tracing water or other fluid molecules, or of localizing a particular labeled compound. Unfortunately, they are not useful in proteomics or metabolomics discovery studies aimed at finding new biomarkers, drugs, and disease pathways.

Three-dimensional matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI-IMS) is an emerging label-free 3D imaging technique with high potential in proteomics, lipidomics, and metabolomics.1−4 Three-dimensional MALDI-IMS is based on 2D MALDI-IMS, which in the past decade has proven its value in metabolomics, glycomics, lipidomics, peptidomics, and proteomics.5−10 MALDI-IMS reveals the spatial distribution of hundreds of molecular compounds in a single measurement by collecting mass spectra across a flat sample (e.g., a tissue section, plant tissue, or agar slice). Each mass spectrum is measured at a spatial pixel with an assigned pair of spatial coordinates x and y and represents the relative abundances of ionizable molecular compounds as a function of their mass-to-charge (m/z) ratio. In MALDI, an m/z value is usually interpreted as the molecular mass, since singly charged ions (charge +1) prevail. MALDI-IMS can find molecular masses of unknown compounds with a specific spatial localization or establish the spatial localization of known molecular compounds given their molecular masses. In proteomics, it serves as a discovery tool alongside conventional (dried-droplet) mass spectrometry and 2D gel electrophoresis, or as a method to image the spatial distribution of molecular compounds, thereby complementing immunohistochemistry and genetics-based methods such as in situ hybridization.11−13 In metabolomics, MALDI-IMS is used for finding new antibiotics14,15 and for imaging drugs and their metabolites.5,16

The state-of-the-art protocol for 3D MALDI-IMS includes serial sectioning of a sample, measuring each section using 2D MALDI-IMS, and merging the individual 2D data sets corresponding to the serial sections into one 3D data set.2 A 3D MALDI-IMS data set is a collection of MALDI mass spectra, each with an assigned triple of spatial coordinates (x, y, z). Three-dimensional MALDI-IMS inherits the advantages of 2D MALDI-IMS over other bioanalytical techniques. It is label-free, highly sensitive, and semiquantitative,16 can detect a wide range of biomolecules, and can be combined with MS/MS for subsequent identification of biomolecular species.17−19 Several studies have already demonstrated the feasibility of 3D MALDI-IMS and its advantages over 2D MALDI-IMS.1−4 Given the three-dimensional nature of biological specimens, 3D MALDI-IMS is expected to yield results that cannot be achieved with 2D MALDI-IMS. However, a MALDI-IMS data set containing up to a million mass spectra poses significant computational challenges even for the simplest operations, such as visualizing the spatial distribution of a specific m/z value (the so-called m/z image). Currently, 3D MALDI-IMS cannot tap its full potential due to the lack of efficient computational methods for constructing, processing, and visualizing large and complex 3D MALDI-IMS data.

We propose a new computational pipeline for the construction and analysis of 3D MALDI-IMS data. Construction of a 3D MALDI-IMS data set involves sample preparation, measurement, spectra preprocessing, and registration of serial sections. For mining large 3D MALDI-IMS data, we propose using the spatial segmentation approach, which is well accepted in 2D MALDI-IMS. In line with 2D data analysis, we used edge-preserving 3D image denoising prior to segmentation to reduce strong and chaotic spectrum-to-spectrum variation. For segmentation, we used an efficient clustering method, called bisecting k-means, which is optimized for hierarchical clustering of a large 3D MALDI-IMS data set. We applied the proposed pipeline to analyze a central part of a mouse kidney using 33 serial sections of 3.5 μm thickness after PAXgene tissue fixation and paraffin embedding. For each serial section, a 2D MALDI-IMS data set was acquired following the standard protocols with a high spatial resolution of 50 μm. The data occupy approximately 50 gigabytes. After merging the individual data sets of the serial sections into a 3D data set, we analyzed it using the proposed computational pipeline. The analysis revealed the 3D anatomical structure of the kidney based on mass spectrometry data only. Finally, automated analysis discovered molecular masses colocalized with major anatomical regions, which correspond to molecular compounds specific to these regions.
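To make the data model concrete, the following minimal Python/NumPy sketch stores a 3D MALDI-IMS data set as an array of spot coordinates (x, y, z) plus a matrix of spectra on a shared m/z axis, and extracts an m/z image from it. The array layout, the function name `mz_image`, and the choice to sum intensities within a ±tolerance window around the requested m/z value are illustrative assumptions, not the HDF5-based format or the software used in this work; the toy numbers are made up.

```python
import numpy as np


def mz_image(intensities: np.ndarray, mz_axis: np.ndarray,
             mz: float, tol: float) -> np.ndarray:
    """Return one value per spot: the summed intensity of all m/z bins in
    [mz - tol, mz + tol] (window summation is an assumption of this sketch)."""
    window = np.abs(mz_axis - mz) <= tol
    if not window.any():
        raise ValueError("no m/z bin within the requested tolerance")
    return intensities[:, window].sum(axis=1)


# Toy data set: 4 spots on two serial sections, 5 m/z bins (made-up values).
# coords: one (x, y, z) triple per spot, here in micrometres.
coords = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0],
                   [0.0, 0.0, 3.5], [50.0, 0.0, 3.5]])
# mz_axis: one m/z grid shared by all spectra.
mz_axis = np.array([2518.0, 4801.0, 4802.7, 6284.0, 8648.0])
# intensities: one spectrum per row, in the same order as coords.
intensities = np.array([[1.0, 10.0, 2.0, 0.0, 5.0],
                        [0.0, 1.0, 0.0, 7.0, 5.0],
                        [2.0, 9.0, 3.0, 0.0, 4.0],
                        [0.0, 0.0, 1.0, 8.0, 6.0]])

img = mz_image(intensities, mz_axis, mz=4801.0, tol=2.0)
for (x, y, z), value in zip(coords, img):
    print(f"x={x:g} um, y={y:g} um, z={z:g} um: {value:g}")
```

Each returned value pairs with one (x, y, z) spot, so the result can be rendered as a 3D point cloud or filtered by z to recover the 2D m/z image of a single section.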



METHODS

Tissue Sample. A freshly dissected mouse kidney was immediately fixed for 3 h in PAXgene tissue fixation reagent (PreAnalytiX GmbH, Germany) at room temperature and transferred into PAXgene tissue stabilization reagent for an additional 24 h at room temperature according to the manufacturer’s instructions. The tissue was dehydrated in an automated tissue processor and embedded in paraffin. For MALDI-IMS, the tissue was cut into 3.5 μm thick serial sections (33 sections altogether) and mounted onto ITO-coated conductive slides (Bruker Daltonik GmbH, Bremen, Germany). The section thickness was optimized to achieve the best 2D MALDI-IMS quality.

MALDI Imaging Mass Spectrometry of Individual Sections. Sections were stored at room temperature and deparaffinized on-slide shortly before the analysis by washing in xylene three times for 5 min each, followed by brief washes in 70% ethanol (twice for 1 min each) and absolute ethanol (1 min). After washing, sections were vacuum-dried in a desiccator for 30 min. MALDI matrix coating was conducted with the ImagePrep device (Bruker Daltonik) using the default protocols supplied by the manufacturer. In brief, sinapinic acid matrix (10 g/L in 60% acetonitrile, 0.2% trifluoroacetic acid) was deposited by gravitational deposition following a five-phase protocol of 26−132 individual spray cycles. The exact number of spray cycles as well as the spray-on and drying times for each cycle were automatically regulated by the ImagePrep’s integrated scattered-light sensor to achieve reproducible matrix coverage for each slide. MALDI-IMS data were acquired on an autoflex speed LRF MALDI-TOF mass spectrometer (Bruker Daltonik) equipped with a 1 kHz smartbeam II laser, operating in linear mode and positive polarity with the mass range set to 2−20 kDa. The acquisition method was externally calibrated before each run, using a mixture of peptides and proteins spanning a mass range of 1−17 kDa with sinapinic acid matrix spotted on the slide adapter.
Laser power and detector gain were manually fine-tuned before each acquisition to ensure optimal data quality. Data acquisition was controlled by the flexImaging 3.0 software (Bruker Daltonik) with a raster width (i.e., pixel size) of 50 μm. At each raster position, 300 laser shots were summed up to generate a spectrum using the predefined small focus setting (∼20 μm spot diameter) and the random walk option with 25 shots per position. All spectra were measured with the same settings, providing one common grid of 7677 m/z bins for all spectra (no binning was done after the acquisition). In total, 512 495 spectra were acquired. For the analysis, we imported the data into our custom-developed data format based on the hierarchical data format (HDF5), which provides direct and fast access to individual spectra without loading the whole data set into memory.

Spectra Preprocessing. Preprocessing of mass spectra included the following steps: total ion count (TIC) normalization of spectra, baseline correction, and data rescaling. For baseline correction, we developed the following efficient iterative method. At the first iteration, given a spectrum, we (1) calculated its smoothed version by applying a weighted moving average filter with Gaussian weights normalized to sum to one and (2) constructed the minimum spectrum by taking at each m/z value the minimum intensity of the spectrum and its smoothed version. The resulting spectrum represents the first-iteration baseline. At each subsequent iteration, we applied the same two-step procedure to the baseline produced at the previous iteration. Finally, after a specified number of iterations, the resulting baseline was subtracted from the original spectrum. The motivation of this procedure was to approximate the lower envelope of the spectrum, i.e., the piecewise linear curve going through the local minima of the spectrum. The procedure has two parameters. The first parameter is the number of iterations, which specifies how close the baseline gets to the lower envelope of the spectrum: as the number of iterations increases, the baseline approaches the lower envelope. The second parameter is the σ of the Gaussian filter used for the weighted moving average; the filter length was calculated as 4 times σ based on the so-called “2σ” rule. The parameter σ specifies the level of detail of the produced baseline: the larger σ, the smoother the baseline. We performed 10 iterations with σ equal to 20 m/z bins and the filter length equal to 80 m/z bins. Note that we specify the width of the smoothing filter in terms of m/z bins and not m/z values. With a time-of-flight (TOF) analyzer, spectrum intensities are acquired in regular TOF bins and recalculated using the TOF−m/z relationship function, which makes the width of an m/z bin increase with m/z. In our case, the width of an m/z bin was 1.69 m/z units at m/z 5000 and 2.36 m/z units at m/z 10000. Finally, all spectra were scaled so that the maximum intensity over all spectra is equal to 255. This is necessary for the subsequent image denoising, so that the denoising parameters can be selected independently of the particular intensity range of the data set. The preprocessing was performed in the commercially available SCiLS Lab software [Scientific Computing in Life Sciences (SCiLS), Bremen, Germany].

Registration of Serial Sections. Three-dimensional MALDI-IMS data collection is performed by acquiring 2D MALDI-IMS data for serial sections of a sample and merging all individual data sets into a 3D data set. The important computational step enabling this approach is the so-called registration of serial sections, i.e., the alignment of individual serial sections that reconstructs their spatial relations prior to sectioning. First, we aligned optical microscopy images (JPEG format, resolution of approximately 3200 × 3200 pixels) of the serial sections using rigid image registration. We transformed each image using a combination of shift and rotation so that it is optimally aligned to its serial neighbor according to a similarity measure and additional constraints. During this process, we optimized the translation and rotation parameters simultaneously, using the Armijo line search algorithm and a multiresolution approach.20 The steps of our rigid image registration procedure are automatic image segmentation, automatic prealignment using principal component analysis, automatic image registration, and a final easy-to-use manual correction. Once the optical images of all 33 serial sections were registered, we aligned the corresponding MALDI-IMS data sets using the transformation that overlays each optical image with its MALDI-IMS data set. Finally, we introduced a new coordinate z and assigned z-values to all serial sections so that all spectra of a serial section had equal z-values. As a result of this procedure, spatial (x, y, z) coordinates were assigned to all spectra. The registration was performed in the MeVisLab software (MeVis, Bremen, Germany) integrated with SCiLS Lab.

Figure 1. High-resolution H&E-stained images of the 1st, 8th, and 18th serial sections and exemplar m/z images for these sections. Each m/z image has its own intensity scale encoded with the blue (lowest values) to red (highest values) color gradient.

Peak Picking. For mass spectrometry peak picking, we used the efficient approach that we proposed earlier21 for 2D MALDI-IMS and later improved.22 First, we considered every 15th spectrum, and for each of them we selected 50 peaks as follows. Each spectrum was modeled as a sum of Gaussian-shaped peaks plus noise. This model assumes that all peaks (1) have Gaussian shape, (2) can have different heights, and (3) can overlap. Since each Gaussian peak can be represented as a convolution of a Dirac δ peak with the Gaussian kernel, peak picking is equivalent to the mathematical problem of deconvolution. To solve this problem, we used orthogonal matching pursuit,23 a simple, fast, widely accepted deconvolution method with an easily interpretable parameter, namely, the number of sought-for peaks. The σ of the best-fit Gaussian kernel was estimated empirically to be 5, which corresponds to a peak width of 20 according to the “2σ” rule. Second, after applying the peak picking method to individual spectra, in line with 2D analysis21 we kept only those peaks which were found in at least 1% of all considered spectra. Third, we aligned the found peaks to the peaks of the data set mean spectrum in order to counterbalance the effect of slight misalignment between the m/z values of different spectra caused by instrumental and experimental variation.22 The alignment merges the m/z values corresponding to one peak into a single m/z value corresponding to the maximum of that peak. The peak picking, as well as the rest of the analysis, was done in the SCiLS Lab software (SCiLS).

Spatial Three-Dimensional Denoising. Recently,21 we have shown that 2D MALDI-IMS data suffer from a strong spectrum-to-spectrum (equivalently, pixel-to-pixel) variation, which can be significantly suppressed by image denoising applied to m/z images. The quality of data analysis after advanced image denoising was shown to be superior to that of analysis without image denoising. Note that the spectrum-to-spectrum variation in a 3D MALDI-IMS data set is comparable to or higher than in the individual 2D data sets it is made of. Additional sources of variation in a 3D MALDI-IMS data set include variation between MALDI plates and sections, random effects of matrix application, between-day variation, protein degradation,24 and biochemical variation of the sample along the spatial coordinate z. In line with our approach to 2D analysis,21 we exploited image denoising for 3D data to reduce this variation computationally. However, denoising of 3D MALDI-IMS data poses an additional challenge. In contrast to 2D MALDI-IMS, where the spectra are acquired on a regular raster of pixels, after registration of serial sections the x−y grids of individual sections are not perfectly aligned to each other. Thus, the coordinates (x, y, z) of the 3D acquisition spots form not a regular 3D grid but a cloud of points in three-dimensional space. Denoising such data requires special image processing methods adapted to 3D data with arbitrary spot coordinates. The general idea of our proposed 3D image denoising method is as follows. Any image denoising applied to a conventional pixelated image is based on some averaging of the intensities of a pixel and its spatial neighbors. Although, as discussed, 3D MALDI-IMS acquisition spots do not form a 3D raster, we can still compute the spatial nearest neighbors of each spot and apply a distance-based 3D image denoising that respects such neighborhoods. For each selected m/z value, we denoised its 3D image in two steps. First, we applied a median filter by calculating for each spot the median of the intensities of its six spatial neighbors. Then, we applied the total variation (TV)-minimizing25 method of edge-preserving image denoising. As discussed previously,21 TV is, informally speaking, the sum of absolute differences of intensities between neighboring spots, and noise increases TV significantly. Given an image, a TV-minimization algorithm searches for a least-squares approximation of the image while simultaneously minimizing its TV. This results in a denoised approximation of the original image in which edges between contrasting regions are preserved. We adapted the Chambolle algorithm,26 formulated for conventional gray-scale 2D raster images, to 3D cloud-of-spots data. Like the original Chambolle algorithm, our algorithm has two parameters: the level of denoising λ and the number of iterations. The larger λ or the number of iterations, the stronger the denoising. We used λ equal to 1 and 10 iterations.

Three-Dimensional Spatial Segmentation. Spatial segmentation of 3D MALDI-IMS data is done by clustering spectra into distinct groups with a clustering algorithm, so that each spectrum is assigned to exactly one group. In principle, clustering can be applied to spectra directly after preprocessing (prior to peak picking and image denoising). However, peak picking is essential, as it normally reduces the size of the data set considerably (10−100-fold). Additionally, it omits m/z values corresponding to nonspecific signals (baseline, noise), thus making the subsequent clustering less affected by noise. Image denoising, as we have demonstrated for 2D MALDI-IMS data,21 suppresses the noise and improves the segmentation map significantly. In this paper, we show that edge-preserving denoising is essential in 3D analysis as well. We used an efficient iterative clustering algorithm called bisecting k-means.27 At the first iteration, the method splits the full data set into two clusters using the k-means algorithm. At the next iteration, each of the clusters is split into two subclusters using k-means. The iterations continue until each cluster contains only one spectrum. This method produces a binary hierarchical tree similar to that of conventional hierarchical clustering, which is common in biological research and is practical for interactive data analysis: in the context of segmentation, it is often required to split a specific cluster into subclusters for a more detailed analysis. The advantage of bisecting k-means over conventional hierarchical clustering is its computational efficiency and reduced memory needs. A hierarchical clustering algorithm computes and stores a matrix of pairwise distances. The size of the distance matrix (about N × N/2 entries, where N is the number of spectra) significantly exceeds the size of the data matrix after peak picking (N × P, where P is the number of picked peaks), since normally P is smaller than N. For k-means clustering, we used the correlation distance. Other clustering algorithms considered were bisecting k-means using the city block and Euclidean distances, as well as agglomerative hierarchical clustering with average linkage and the correlation distance, as recommended earlier for 2D MALDI-IMS.28
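The iterative baseline correction described under Spectra Preprocessing, together with TIC normalization and rescaling to a maximum of 255, can be sketched in Python/NumPy as below. This is an illustrative reimplementation, not the SCiLS Lab code. The edge padding at the spectrum ends, dividing each spectrum by its TIC, and the helper names are assumptions. The Gaussian kernel spans ±2σ bins (a length of about 4σ, following the “2σ” rule), and the defaults match the reported σ = 20 bins and 10 iterations.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Gaussian weights over +/- 2*sigma bins (the "2 sigma" rule, i.e. a
    filter length of about 4*sigma), normalized to sum to one."""
    half = int(round(2 * sigma))
    offsets = np.arange(-half, half + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return weights / weights.sum()


def smooth(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Weighted moving average; the spectrum ends are padded with their edge
    values (an assumption: the text does not say how the ends are handled)."""
    half = len(kernel) // 2
    return np.convolve(np.pad(x, half, mode="edge"), kernel, mode="valid")


def iterative_baseline(spectrum: np.ndarray, sigma: float = 20.0,
                       n_iter: int = 10) -> np.ndarray:
    """Each iteration smooths the current baseline and keeps, bin by bin,
    the minimum of the current baseline and its smoothed version."""
    kernel = gaussian_kernel(sigma)
    baseline = np.asarray(spectrum, dtype=float)
    for _ in range(n_iter):
        baseline = np.minimum(baseline, smooth(baseline, kernel))
    return baseline


def preprocess(spectra: np.ndarray, sigma: float = 20.0,
               n_iter: int = 10) -> np.ndarray:
    """TIC normalization, baseline subtraction, rescaling to a maximum of 255."""
    spectra = np.asarray(spectra, dtype=float)
    tic = spectra.sum(axis=1, keepdims=True)
    spectra = spectra / np.where(tic > 0, tic, 1.0)
    corrected = np.vstack([s - iterative_baseline(s, sigma, n_iter)
                           for s in spectra])
    top = corrected.max()
    return corrected * (255.0 / top) if top > 0 else corrected


# Synthetic example: decaying baseline + one Gaussian peak + uniform noise.
bins = np.arange(1000)
rng = np.random.default_rng(0)
baseline_true = 50.0 * np.exp(-bins / 400.0)
peak = 100.0 * np.exp(-0.5 * ((bins - 500) / 5.0) ** 2)
raw = np.vstack([baseline_true + peak + rng.uniform(0.0, 1.0, bins.size)
                 for _ in range(3)])
processed = preprocess(raw)
print(processed.shape, processed.max(), processed.argmax(axis=1))
```

Because every iteration takes an element-wise minimum, the baseline never exceeds the spectrum, so the corrected intensities stay nonnegative. Increasing `n_iter` pulls the baseline further down toward the lower envelope, while increasing `sigma` makes it smoother.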

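The first two peak picking steps could look roughly like this: orthogonal matching pursuit with a Gaussian kernel of σ = 5 bins, followed by keeping only the m/z bins found in at least 1% of the considered spectra. It is a sketch under simplifying assumptions. The dictionary holds one unit-norm Gaussian per m/z bin. Only positive correlations are considered, because MALDI peaks are positive. The alignment to the mean spectrum is omitted, and all names are hypothetical. A dense n × n dictionary is only practical for short toy spectra; for 7677 bins one would exploit its convolution structure instead.

```python
import numpy as np


def gaussian_dictionary(n_bins: int, sigma: float) -> np.ndarray:
    """Column j is a unit-norm Gaussian of width sigma centred at bin j."""
    grid = np.arange(n_bins)
    atoms = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / sigma) ** 2)
    return atoms / np.linalg.norm(atoms, axis=0)


def omp_peaks(spectrum: np.ndarray, dictionary: np.ndarray, n_peaks: int):
    """Orthogonal matching pursuit: repeatedly pick the atom with the largest
    (positive) correlation with the residual, then refit all picked atoms
    jointly by least squares. Returns picked bins and fitted amplitudes."""
    y = np.asarray(spectrum, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_peaks):
        corr = dictionary.T @ residual
        if support:
            corr[support] = -np.inf          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        atoms = dictionary[:, support]
        coef, *_ = np.linalg.lstsq(atoms, y, rcond=None)
        residual = y - atoms @ coef
    return np.array(support), coef


def frequent_bins(peak_lists, n_bins: int, min_fraction: float = 0.01):
    """Keep m/z bins picked in at least min_fraction of the processed spectra."""
    counts = np.zeros(n_bins, dtype=int)
    for peaks in peak_lists:
        counts[np.unique(peaks)] += 1
    return np.flatnonzero(counts >= min_fraction * len(peak_lists))


# Synthetic spectra with three Gaussian peaks (sigma = 5 bins) plus noise.
n_bins, sigma = 200, 5.0
grid = np.arange(n_bins)
rng = np.random.default_rng(1)
true_peaks = [(50, 10.0), (120, 6.0), (160, 3.0)]   # (centre bin, height)
clean = sum(h * np.exp(-0.5 * ((grid - c) / sigma) ** 2) for c, h in true_peaks)
spectra = [clean + rng.normal(0.0, 0.05, n_bins) for _ in range(30)]

D = gaussian_dictionary(n_bins, sigma)
picked = [omp_peaks(s, D, n_peaks=3)[0] for s in spectra[::15]]  # every 15th
print(frequent_bins(picked, n_bins, min_fraction=0.01))
```

Only every 15th synthetic spectrum is processed here, mirroring the subsampling in the text; with real data, the requested number of peaks (50 in the text) usually exceeds the number of true peaks, and the frequency filter discards the sporadic picks.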

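The two-step 3D denoising on a cloud of spots (a median over the six nearest neighbors, followed by edge-preserving TV minimization) can be sketched as follows. For the TV step, this sketch minimizes 0.5·Σ_i (u_i − f_i)² + λ·Σ_(i,j)∈E |u_i − u_j| over the edges E of the nearest-neighbor graph, using projected gradient steps on a dual variable in the spirit of Chambolle's method. This anisotropic graph formulation, the step size, and the brute-force neighbor search are assumptions for illustration, not the authors' adapted algorithm, so the λ and iteration count used in the text (λ = 1, 10 iterations) do not carry over directly. The coordinates are assumed to be rescaled so that in-plane and between-section spacings are comparable; with raw coordinates (50 μm pixels, 3.5 μm sections) the six nearest neighbors of a spot would all lie in neighboring sections. For real data sizes, a k-d tree (for example `scipy.spatial.cKDTree`) would replace the brute-force search.

```python
import numpy as np


def knn_indices(coords: np.ndarray, k: int = 6) -> np.ndarray:
    """Indices of the k nearest spots of every spot, the spot itself excluded.
    Brute force with O(N^2) memory: fine for a toy volume only."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]


def median_filter(values: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """Step 1: median of the intensities of each spot's neighbours."""
    return np.median(values[neighbors], axis=1)


def knn_edges(neighbors: np.ndarray) -> np.ndarray:
    """Undirected edges (i, j), i < j, of the nearest-neighbour graph."""
    n, k = neighbors.shape
    pairs = np.stack([np.repeat(np.arange(n), k), neighbors.ravel()], axis=1)
    return np.unique(np.sort(pairs, axis=1), axis=0)


def tv_denoise_graph(f, edges, lam=1.0, n_iter=10):
    """Step 2: approximately minimise
        0.5 * sum_i (u_i - f_i)**2 + lam * sum_{(i, j) in edges} |u_i - u_j|
    with projected gradient steps on a dual variable p (one per edge,
    |p_e| <= 1); the current estimate is u = f - lam * D^T p."""
    f = np.asarray(f, dtype=float)
    i, j = edges[:, 0], edges[:, 1]
    degree = np.bincount(np.concatenate([i, j]), minlength=f.size)
    step = 1.0 / (2.0 * degree.max())     # 2 * max degree bounds ||D||^2
    p = np.zeros(len(edges))
    u = f.copy()
    for _ in range(n_iter):
        p = np.clip(p + (step / lam) * (u[i] - u[j]), -1.0, 1.0)
        div = np.zeros(f.size)
        np.add.at(div, i, p)               # D^T p adds +p_e at spot i ...
        np.add.at(div, j, -p)              # ... and -p_e at spot j
        u = f - lam * div
    return u


# Toy volume: 10 x 10 spots per section, 6 sections, unit spacing (coordinates
# assumed already rescaled), with a small random x-y shift per section.
rng = np.random.default_rng(0)
gx, gy, gz = np.meshgrid(np.arange(10), np.arange(10), np.arange(6), indexing="ij")
coords = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1).astype(float)
coords[:, :2] += rng.normal(0.0, 0.1, size=(6, 2))[gz.ravel()]
clean = np.where(gx.ravel() < 5, 0.0, 100.0)        # two regions, sharp edge
noisy = clean + rng.normal(0.0, 15.0, clean.size)

nbrs = knn_indices(coords, k=6)
denoised = tv_denoise_graph(median_filter(noisy, nbrs), knn_edges(nbrs),
                            lam=5.0, n_iter=100)
for name, img in (("noisy", noisy), ("denoised", denoised)):
    print(name, "MSE:", round(float(np.mean((img - clean) ** 2)), 1))
```

On this toy volume (two regions with intensities 0 and 100 plus Gaussian noise, and per-section x−y shifts standing in for imperfect registration), the median filter removes most of the speckle and the TV step is intended to smooth within regions while keeping the boundary between them.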

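To put the memory argument in numbers for this data set (N = 512 495 spectra and, after peak picking, P = 173 m/z values): a pairwise distance matrix holds about N²/2 ≈ 1.3 × 10¹¹ entries, whereas the peak-picked data matrix holds N × P ≈ 8.9 × 10⁷, roughly N/(2P) ≈ 1500 times fewer. The sketch below shows level-wise bisecting k-means with the correlation distance in Python/NumPy. Its assumptions: rows are centered and scaled to unit norm, so that maximizing the Pearson correlation to a centroid is equivalent to minimizing the correlation distance; centroids are re-standardized after each update; the initialization is a deterministic farthest-point choice; and the tree is built for a fixed number of levels instead of down to single spectra. All names are hypothetical.

```python
import numpy as np


def standardize_rows(X: np.ndarray) -> np.ndarray:
    """Centre each row and scale it to unit norm. For such rows the dot
    product equals the Pearson correlation r, so the correlation distance
    1 - r is minimised by maximising the dot product."""
    Xc = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xc / np.where(norms > 0, norms, 1.0)


def two_means(Z: np.ndarray, max_iter: int = 100) -> np.ndarray:
    """Split standardized rows into two clusters with k-means (k = 2) under
    the correlation distance, using a farthest-point initialisation."""
    a = int(np.argmax(((Z - Z[0]) ** 2).sum(axis=1)))
    b = int(np.argmax(((Z - Z[a]) ** 2).sum(axis=1)))
    centroids = Z[[a, b]].copy()
    labels = np.full(len(Z), -1)
    for _ in range(max_iter):
        new_labels = np.argmax(Z @ centroids.T, axis=1)  # highest r wins
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in (0, 1):
            members = Z[labels == c]
            if len(members):
                centroids[c] = standardize_rows(members.mean(axis=0, keepdims=True))[0]
    return labels


def bisecting_kmeans(X: np.ndarray, n_levels: int) -> list:
    """Level-wise bisecting k-means: at every level each cluster with at
    least two spectra is split in two. Returns one label vector per level."""
    Z = standardize_rows(np.asarray(X, dtype=float))
    labels = np.zeros(len(Z), dtype=int)
    levels = []
    for _ in range(n_levels):
        new = np.empty_like(labels)
        next_id = 0
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            sub = two_means(Z[idx]) if len(idx) > 1 else np.zeros(1, dtype=int)
            for s in np.unique(sub):
                new[idx[sub == s]] = next_id
                next_id += 1
        labels = new
        levels.append(labels.copy())
    return levels


# Synthetic peak-picked spectra: groups A and B share some m/z values, C does not.
rng = np.random.default_rng(2)
profiles = np.zeros((3, 20))
profiles[0, 0:8] = 1.0     # group A
profiles[1, 4:12] = 1.0    # group B (overlaps A)
profiles[2, 14:20] = 1.0   # group C
truth = np.repeat([0, 1, 2], 40)
X = profiles[truth] + rng.normal(0.0, 0.05, (truth.size, 20))
levels = bisecting_kmeans(X, n_levels=2)
print([np.bincount(lv).tolist() for lv in levels])
```

Each entry of the returned list is a flat segmentation; the parent of a cluster at one level can be read off the previous entry, which is what makes it possible to inspect one specific cluster in more detail.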
RESULTS

Two-Dimensional MALDI-IMS. Before performing 3D analysis, it is important to ensure the reproducibility of measurements of individual sections. Currently, there are no accepted quantitative methods to evaluate the reproducibility of MALDI-IMS data. Thus, we performed visual analysis comparing several individual sections. First, we compared histologically stained (H&E) high-resolution images to evaluate the quality of tissue preparation and serial sectioning, see Figure 1. Second, we evaluated the reproducibility of mass spectral data across the individual sections. Figure 1 shows m/z images (spatial distribution of intensities of m/z values) across three exemplary serial sections for m/z values 4801 (images for preprocessed data and the same images after edge-preserving denoising), 2518, and 8648 m/z values (denoised images). Note the strong and multiplicative pixel-to-pixel variation visible in the raw m/z image 4801. The edge-preserving image denoising using the 2D Chambolle algorithm26 reduced this variation and improved the visualization. The m/z images were overlaid with the corresponding optical images to evaluate the quality of 2D MALDI-IMS data. Visually, the images confirm high reproducibility of both histological and chemical spatial structure. Additionally, the spatial distribution of m/z images correlates with the anatomic structure of the kidney, thus indicating the ability of 2D MALDI-IMS to capture biochemical signals reflecting kidney anatomy. The m/z value 4801 is colocalized with the renal medulla, m/z value 2518 is colocalized with the surrounding of renal pelvis, and m/z value 8648 is colocalized with the renal cortex. Registration of Serial Sections. Registration of optical images of serial sections enables the merge of individual 2D MALDI-IMS data sets into a 3D data set. Figure 2 visualizes the result of registration of optical images of individual sections. 
Visual examination confirms that the applied approach of rigid registration allowed us to achieve acceptable quality of registration. The anatomical regions (renal pelvis, renal cortex, and renal medulla) can be seen through all sections. Slight misalignment of sections seen at the boundaries can be explained by nonlinear deformations of sections during the 6082

dx.doi.org/10.1021/ac300673y | Anal. Chem. 2012, 84, 6079−6087

Analytical Chemistry

Article

assignments. Figure 3 shows the 3D segmentation maps for two, three, and four clusters. When visualizing 3D models (both m/z images and segmentation maps), the z-coordinate (thickness of a slice) is stretched in 15 times for better visualization making it visually similar to the size of a pixel (50 × 50 μm2). One can see that the segmentation maps reproduce a simplified anatomical structure of the mouse kidney. In the segmentation map with four clusters (Figure 3C), the blue cluster represents the renal cortex, the yellow cluster represents the renal pelvis, the green one represents the surroundings of renal pelvis, and the red one represents the renal medulla. The virtual slices through the 3D segmentation map of three clusters (Figure 5) demonstrate the consistent assignments of pixels through the volume of the considered part of the kidney. A 3D segmentation map provides a unique way to select prominent regions of interest in 3D because a manual selection of a region of interest in 3D is not feasible as opposite to 2D data analysis. Selection of a region of interest is important because it provides a way to interpret a 3D MALDI-IMS data set by calculating m/z values colocalized with prominent regions of interest. In order to demonstrate the importance of image denoising applied to selected m/z channels prior to clustering, we computed the segmentation maps with the same procedure but without the prior-to-clustering image denoising, see Supplementary Figure 2 in the Supporting Information. In comparison with segmentation maps in Figure 3, these maps suffer from spectrum-to-spectrum variation significantly. Additionally, the segmentation map with two clusters (Supplementary Figure 2A in the Supporting Information) shows a few serial sections in red color probably due to the section-to-section variation. For a region of interest, a colocalized m/z value is defined as one having high intensities in this region and low intensities in the rest of pixels. 
Colocalized m/z values can be found by calculating the Pearson correlation between the spatial mask given by the region and the intensities of an m/z image and by selecting statistically significant correlations with p-values smaller than 0.05. Following this approach, we interpreted

Figure 2. Stack of optical images of 33 serial sections aligned using image registration. Only parts of the stack are shown for evaluating the quality of the alignment.

slicing and mounting process, which cannot be compensated by a rigid registration. Peak Picking Results. Selecting 50 peaks per each 15th spectrum led to 5386 candidate m/z values. After selecting only those of them which were found in at least 1% of all considered spectra, we left only 1018 m/z values. Performing alignment of these m/z values to the peaks of the mean spectrum resulted in 173 data set-relevant m/z values, see Supplementary Figure 1 in the Supporting Information. As one can see, all major peaks were detected. Although some peaks visible in the mean spectrum (Supplementary Figure 1 in the Supporting Information) were not selected, this can be explained by the fact that the peak picking searches for peaks which are the most prominent in a large portion of all spectra. In the next steps, the spatial segmentation is performed based on selected 173 m/z values only. Note that when interpreting the segmentation map, the search for colocalized m/z values is done over all original m/z values and not only 173 selected ones. Three-Dimensional Spatial Segmentation and Colocalized m/z Values. After peak picking and 3D edgepreserving image denoising applied to selected m/z images, we clustered the resulted (reduced and processed) spectra. The clustering results were visualized as a 3D segmentation map, where all pixels are color-coded according to their cluster

Figure 3. Three-dimensional spatial segmentation analysis. Three-dimensional segmentation maps for two (A), three (B), and four (C) clusters. The hierarchy relations between clusters are shown in the diagram. 6083

dx.doi.org/10.1021/ac300673y | Anal. Chem. 2012, 84, 6079−6087

Analytical Chemistry

Article

Figure 4. Interpretation of the 3D segmentation map with three clusters (Figure 3B) which are shown in blue, green, and red colors. For each cluster, 3D images of the two most colocalized m/z values are visualized with the color gradient. For each m/z image, its low-intensity spots are made transparent for better visualization. The threshold intensity level was adjusted manually.

the segmentation map with three clusters (Figure 3B) and visualized, for each cluster, the two most colocalized m/z images. The search for colocalized m/z values was performed over all m/z values, not only over those selected by the peak picking algorithm. This eliminates the risk of missing m/z values that are colocalized but were not picked, e.g., because the corresponding peaks have low intensity. In particular, m/z values 2518 and 2858, which are colocalized with the second cluster (Figure 4), were not selected by the peak picking. This can be explained by their low intensity: their mean intensity is approximately 5% of the data intensity range. For better visualization, each m/z image was manually adjusted by making low-intensity pixels transparent. The strong pixel-to-pixel variation observed in the 2D m/z images of exemplary serial sections (Figure 1) is also visible in the 3D m/z images. Note the similarity between the 3D m/z images and the clusters of the segmentation maps produced with image denoising prior to clustering (Figure 3). This similarity, together with the colocalization of the 3D m/z images with major anatomical regions, confirms the relevance and correctness of using image denoising. Comparing the segmentation maps produced without image denoising (Supplementary Figure 2 in the Supporting Information) with those produced with denoising (Figure 3), we conclude that denoising prior to clustering suppressed spectrum-to-spectrum and section-to-section variation and helped reconstruct biologically relevant regions of interest. Identifying the molecules that correspond to the colocalized m/z values is beyond the scope of this paper; for more details on this topic, see the recent review.12

Virtual Slices through Three-Dimensional MALDI-IMS Data. A 3D MALDI-IMS data set can be examined by taking virtual slices in any direction through the volume. Figure 5 shows virtual slices through the 3D segmentation map with three clusters (Figure 3B) and through the 3D image of m/z 6284, which is colocalized with the blue cluster. A virtual slice is generated by rotating the 3D model to the desired orientation of the slicing plane and then manually specifying the coordinates (x, y, z) of a point that the plane should pass through. In Figure 5, the red cross indicates the spatial point through which all three orthogonal virtual slices pass. The section-to-section variation observed earlier in the segmentation map produced without image denoising prior to clustering (Supplementary Figure 2 in the Supporting Information) is also visible in the 3D m/z image (X- and Y-planes, Figure 5B). Notably, the segmentation map produced with image denoising prior to clustering is only minimally affected by this variation.

Figure 5. Virtual slices through (A) the 3D segmentation map with three clusters (Figure 3B) and (B) the 3D image of m/z value 6284 colocalized with the blue cluster. The red cross indicates the spatial point through which all three virtual slices pass.
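The colocalization search and the virtual slicing described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a small data set held as a dense array `cube` of shape (X, Y, Z, M), one m/z channel per last-axis index, and a segmentation map `labels` of shape (X, Y, Z). It also assumes that colocalization is measured by the Pearson correlation between an m/z image and the binary indicator map of a cluster; the text does not specify the measure. The function names `colocalized_mz`, `orthogonal_slices`, and `oblique_slice` are hypothetical. The oblique slice replaces the interactive rotation of the 3D model by two in-plane direction vectors and uses nearest-neighbour sampling.

```python
import numpy as np


def colocalized_mz(cube, labels, cluster, top=2):
    """Rank m/z channels by Pearson correlation with a cluster's binary indicator map.

    cube   : array (X, Y, Z, M), intensity of each of M m/z channels per voxel (assumed layout)
    labels : int array (X, Y, Z), segmentation map with one cluster id per voxel
    Returns the indices of the `top` best-correlated channels and their r values.
    """
    indicator = (labels == cluster).ravel().astype(float)    # 1 inside the cluster, 0 outside
    images = cube.reshape(-1, cube.shape[-1]).astype(float)  # (voxels, M), same voxel order
    ind_c = indicator - indicator.mean()
    img_c = images - images.mean(axis=0)
    denom = np.linalg.norm(img_c, axis=0) * np.linalg.norm(ind_c)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (img_c.T @ ind_c) / denom                         # Pearson r for every channel
    r = np.where(np.isnan(r), -np.inf, r)                     # constant channels rank last
    order = np.argsort(r)[::-1][:top]
    return order, r[order]


def orthogonal_slices(volume, point):
    """Three axis-aligned virtual slices through voxel (x, y, z) of a 3D array."""
    x, y, z = point
    return volume[x, :, :], volume[:, y, :], volume[:, :, z]


def oblique_slice(volume, point, e1, e2, half_size):
    """Nearest-neighbour sample of the plane {point + u*e1 + v*e2}, |u|, |v| <= half_size.

    e1 and e2 span the slicing plane (a stand-in for rotating the 3D model).
    Samples that fall outside the volume are returned as NaN.
    """
    e1 = np.array(e1, dtype=float)
    e1 /= np.linalg.norm(e1)
    e2 = np.array(e2, dtype=float)
    e2 -= e2.dot(e1) * e1                    # Gram-Schmidt: make e2 orthogonal to e1
    e2 /= np.linalg.norm(e2)
    steps = np.arange(-half_size, half_size + 1)
    u, v = np.meshgrid(steps, steps, indexing="ij")
    coords = (np.asarray(point, dtype=float)[:, None, None]
              + u * e1[:, None, None] + v * e2[:, None, None])  # shape (3, H, W)
    idx = np.rint(coords).astype(int)        # nearest voxel for every plane sample
    bounds = np.array(volume.shape)[:, None, None]
    inside = np.all((idx >= 0) & (idx < bounds), axis=0)
    out = np.full(u.shape, np.nan)
    out[inside] = volume[idx[0][inside], idx[1][inside], idx[2][inside]]
    return out
```

For example, `orthogonal_slices(labels, (x, y, z))` returns the three orthogonal planes through a chosen point, in the spirit of the red cross described above. For a data set of the size reported here, the dense `cube` would have to be replaced by out-of-core access to the spectra. Interpolating samples (rather than nearest-neighbour lookup) would give smoother oblique slices.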

dx.doi.org/10.1021/ac300673y | Anal. Chem. 2012, 84, 6079−6087

Analytical Chemistry



Article

DISCUSSION

Role of Computational Methods. Computational methods are of critical importance for 3D imaging, especially for an emerging technology like 3D MALDI-IMS, where the development of computational methods lags behind technological progress. Recent improvements in the spatial and mass resolution, as well as in the acquisition speed, of MALDI imaging mass spectrometers have increased the size of MALDI-IMS data; a gigabyte-sized data set can be produced in a few hours. Mining such large amounts of data efficiently requires dedicated computational methods. In addition to the increased data size, working with 3D data poses specific challenges, such as the need for 3D visualization and the impracticability of drawing regions of interest manually. This paper aims to contribute to the new field of computational 3D imaging mass spectrometry but cannot bridge the existing gap completely. We hypothesize that, in the near future, the following computational problems will be of high importance in 3D MALDI-IMS: registration of serial sections, the need for interactive analysis, large data size, reduction of data variation, the combination of unsupervised and supervised methods, and the combination with other 2D and 3D modalities.

Role of PAXgene Fixation. Cryopreserved samples are so far the most commonly used material for MALDI-IMS experiments. However, cryosectioning produces pronounced deformation of the tissue sections. Tissue fixation and paraffin embedding help to overcome some limitations of the MALDI-IMS workflow, in particular for 3D MALDI-IMS. Sectioning of paraffin-embedded tissue is much more accurate and reduces section deformation. Furthermore, histomorphological details are not preserved accurately in cryosections; consequently, reliable histological interpretation, as is possible with paraffin-embedded material, is impaired. Of course, the histology of a healthy kidney is highly ordered and easy to interpret. However, in diseased tissues the histological situation is more complicated, and its interpretation can be significantly hampered by cryosectioning. The PAXgene tissue system is a commercially available two-step approach (PreAnalytiX GmbH, Hilden, Germany) that allows paraffin embedding after fixation. The PAXgene system overcomes the limitations of cryosectioning discussed above while preserving intact proteins29 and being usable for MALDI-IMS.30 PAXgene-fixed tissue yields high-quality spectra, comparable to those of cryopreserved tissue sections, together with an excellent preservation of morphology, which is essential for serial-sectioning-based 3D MALDI-IMS. As a drawback, there is sometimes a slight reduction in signal intensity, which may hinder the detection of low-abundance analytes.

Advantages and Disadvantages of the Proposed Pipeline. The proposed analysis pipeline was developed taking into account the following problems of MALDI-IMS data processing: strong pixel-to-pixel variation, a large number of spectra, a large number of m/z values per spectrum, and the limited memory and CPU capacities of a processing workstation. This led us to avoid the following algorithms, which are advantageous but memory-intensive. Global peak alignment31 would improve the peak picking results, but it requires loading all spectra into memory, which is impossible for 3D MALDI-IMS (our data set occupies approximately 50 gigabytes). Distance-based or graph-based clustering, e.g., hierarchical or spectral clustering, would allow us to consider non-Euclidean distances between spectra, but it requires loading into memory the distance matrix, which, in 3D MALDI-TOF-IMS, is larger than the matrix of spectra itself.

The core of our pipeline is the spatial segmentation of MALDI-IMS data, which provides a concise representation of the full data set in just one image, a spatial segmentation map. A disadvantage of this representation is that each pixel is assigned to exactly one cluster. When a pixel represents a mixture of spectra and could be assigned to several clusters, its assignment depends on the clustering algorithm. In this case, a so-called soft or fuzzy clustering algorithm, which provides for each pixel the probabilities of belonging to all clusters, would be advantageous. On the other hand, visualizing the probability maps provided by a soft clustering algorithm is complicated. Moreover, soft clustering does not provide regions of interest explicitly. A similar comparison can be made between the proposed approach and component analysis methods, such as principal component analysis, probabilistic latent semantic analysis, and non-negative matrix factorization, which were recently proposed in the context of 2D MALDI-IMS.32−34 The use of efficient algorithms, a special implementation, and a custom-developed data format allowed us to apply the proposed pipeline to any number of spectra in a reasonable time (a few hours for the data set reported in this paper) on a workstation, while providing smooth, detailed, and informative segmentation maps for 3D MALDI-IMS data. The custom-developed data format allows us to read spectra from storage only when they are needed, loading them into memory on demand. This removes any hard requirement on the memory size, although large data sets are processed more slowly than data sets that can be fully loaded into memory.

Expected Technological Improvements. The following technological improvements are expected to increase the robustness and accessibility of the 3D MALDI-IMS technique. First, simple, working, and published protocols for reproducible sample preparation are necessary. Here, recent solutions for reproducible tissue preparation35 and a MALDI-compatible tape transfer system4 can be useful. Additionally, the reproducibility and optimization of matrix application are still understudied. Second, MALDI acquisition needs to be accelerated, and a recent solution with a 5 kHz laser36 can lead to a significant improvement once it is available on the market.

Applications. We hope that, with our methods, the 3D MALDI-IMS technique will become a useful and accessible tool in various areas of biomedical analysis, in particular once efficient and user-friendly software performing the construction, analysis, and interpretation of 3D MALDI-IMS data becomes available. One field where understanding the spatial molecular composition is of utmost importance is histopathology, in particular the analysis of the development and spread of tumor cells. Another prominent application is 3D imaging of whole-body or whole-organ specimens, as the natural extension of the highly sensitive and label-free approach for imaging drugs and metabolites directly in tissues (in situ) using 2D whole-body MALDI-IMS.37,38 Untargeted whole-body 3D imaging of drugs and their metabolites in a small animal model would provide opportunities not available so far, especially when MALDI-IMS signals are correlated with anatomy or with information from MRI or PET. Moreover, assessing the tissue proteome by MALDI-IMS before or during drug administration, in combination with drug distributions (pharmacoproteomics), opens the possibility of a comprehensive understanding of the effects of therapeutics.
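The memory argument made under Advantages and Disadvantages of the Proposed Pipeline, that a pairwise distance matrix over all spectra would be larger than the matrix of spectra itself, can be checked with simple arithmetic. The sketch below is an illustration under assumptions not stated in the paper: values are stored as float32 (4 bytes each), the reported "approximately 50 gigabytes" is taken literally as 50 × 10^9 bytes, and only the n(n − 1)/2 unique entries of the symmetric distance matrix are counted. The helper names are hypothetical.

```python
# Back-of-the-envelope memory check (assumptions: float32 values, 1 GB = 1e9 bytes).

def condensed_distance_entries(n_spectra: int) -> int:
    """Unique pairwise distances in a symmetric matrix with zero diagonal: n(n-1)/2."""
    return n_spectra * (n_spectra - 1) // 2


def footprint_gb(n_values: float, bytes_per_value: int = 4) -> float:
    """Storage in decimal gigabytes; 4 bytes per value corresponds to float32."""
    return n_values * bytes_per_value / 1e9


n = 512_495          # number of spectra reported for the mouse kidney data set
spectra_gb = 50.0    # "approximately 50 gigabytes", taken literally as 50e9 bytes

# Implied number of stored values (m/z bins) per spectrum, if stored as float32.
bins_per_spectrum = spectra_gb * 1e9 / (n * 4)

pairs = condensed_distance_entries(n)
dist_gb = footprint_gb(pairs)

print(f"implied m/z bins per spectrum: {bins_per_spectrum:,.0f}")
print(f"unique pairwise distances:     {pairs:,}")
print(f"distance matrix (float32):     {dist_gb:,.0f} GB vs. {spectra_gb:.0f} GB of spectra")
# The distance matrix is the larger object whenever (n - 1) / 2 > bins per spectrum.
print("distance matrix larger:", (n - 1) / 2 > bins_per_spectrum)
```

Under these assumptions, the 512 495 spectra imply roughly 24 000 m/z bins per spectrum, whereas even the condensed distance matrix needs about 525 GB, roughly ten times the size of the spectra. The comparison reduces to (n − 1)/2 ≈ 256 000 versus about 24 000 values per spectrum, which is why the gap widens quadratically as more serial sections are added.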


In another study, in collaboration with the Dorrestein Lab, University of California San Diego, we applied 3D MALDI-IMS to imaging of bacterial natural products in agar plates. In this application, 3D MALDI-IMS revealed signals that are absent from the top layer of the bacterial colony but localized deeper in the agar, and that were overlooked by 2D analysis (unpublished results). Imaging of bacterial products is an ideal application for 3D MALDI-IMS, since high spatial resolution is not necessary, which reduces the acquisition time of a 3D MALDI-IMS data set to a few hours.

CONCLUSIONS

We have addressed the as-yet unsolved problem of automatic analysis and interpretation of 3D MALDI-IMS data. We have presented efficient computational methods for registration of serial sections, baseline correction, advanced 3D image denoising, clustering, and visualization, which were optimized for large, complex, and noisy 3D MALDI-IMS data. On the basis of these methods, we have proposed a new computational pipeline for the analysis and interpretation of 3D MALDI-IMS data, which allows one to establish prominent spatial regions and to find m/z values colocalized with these regions. Using this pipeline, we analyzed a large 3D MALDI-IMS data set of the central part of a mouse kidney and detected m/z values colocalized with major anatomical regions. The proposed pipeline can be used for the analysis and interpretation of any 3D MALDI-IMS data set, in particular in 3D molecular histology and biology.

ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected] (P.M.); [email protected] (T.A.). Phone: +49 (0)421 21863820. Fax: +49 (0) 421 218 98 63820.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

D.T., S.S., K.S., and A.D. acknowledge financial support from the WFB Bremen Economic Development (Grant FUE0485B). D.T., J.H.K., and A.D. acknowledge support from the European Union Seventh Framework Programme (Grant 255931). J.O. and H.T. acknowledge the support of the Bundesministerium für Bildung und Forschung (Grant 01IB10004C). A.W. acknowledges the support of the Deutsche Forschungsgemeinschaft (SFB824 TP Z2) and the Bundesministerium für Bildung und Forschung (Grants 0315508A and 01IB100004E). We thank Ingrid Bayer (Helmholtz Center Munich) for excellent technical assistance.

REFERENCES

(1) Crecelius, A. C.; Cornett, D. S.; Caprioli, R. M.; Williams, B.; Dawant, B. M.; Bodenheimer, B. J. Am. Soc. Mass Spectrom. 2005, 16 (7), 1093−1099.
(2) Andersson, M.; Groseclose, M. R.; Deutch, A. Y.; Caprioli, R. M. Nat. Methods 2008, 5 (1), 101−108.
(3) Sinha, T. K.; Khatib-Shahidi, S.; Yankeelov, T. E.; Mapara, K.; Ehtesham, M.; Cornett, D. S.; Dawant, B. M.; Caprioli, R. M.; Gore, J. C. Nat. Methods 2008, 5 (1), 57−59.
(4) Seeley, E. H.; Caprioli, R. M. Anal. Chem. 2012, 84 (5), 2105−2110.
(5) Castellino, S.; Groseclose, M. R.; Wagner, D. Bioanalysis 2011, 3 (21), 2427−2441.
(6) Watrous, J. D.; Alexandrov, T.; Dorrestein, P. C. J. Mass Spectrom. 2011, 46 (2), 209−222.
(7) Watrous, J. D.; Dorrestein, P. C. Nat. Rev. Microbiol. 2011, 9 (9), 683−694.
(8) Kaspar, S.; Peukert, M.; Svatos, A.; Matros, A.; Mock, H. P. Proteomics 2011, 11 (9), 1840−1850.
(9) Chughtai, K.; Heeren, R. M. Chem. Rev. 2010, 110 (5), 3237−3277.
(10) McDonnell, L. A.; Corthals, G. L.; Willems, S. M.; van Remoortere, A.; van Zeijl, R. J.; Deelder, A. M. J. Proteomics 2010, 73 (10), 1921−1944.
(11) Rauser, S.; Deininger, S.-O.; Suckau, D.; Höfler, H.; Walch, A. Expert Rev. Proteomics 2010, 7 (6), 927−941.
(12) Walch, A.; Rauser, S.; Deininger, S. O.; Hofler, H. Histochem. Cell Biol. 2008, 130 (3), 421−434.
(13) Franck, J.; Arafah, K.; Elayed, M.; Bonnel, D.; Vergara, D.; Jacquet, A.; Vinatier, D.; Wisztorski, M.; Day, R.; Fournier, I.; Salzet, M. Mol. Cell. Proteomics 2009, 8 (9), 2023−2033.
(14) Kroiss, J.; Kaltenpoth, M.; Schneider, B.; Schwinger, M. G.; Hertweck, C.; Maddula, R. K.; Strohm, E.; Svatos, A. Nat. Chem. Biol. 2010, 6 (4), 261−263.
(15) Yang, Y. L.; Xu, Y.; Straight, P.; Dorrestein, P. C. Nat. Chem. Biol. 2009, 5 (12), 885−887.
(16) Rubakhin, S. S.; Jurchen, J. C.; Monroe, E. B.; Sweedler, J. V. Drug Discovery Today 2005, 10 (12), 823−837.
(17) Cazares, L. H.; Troyer, D.; Mendrinos, S.; Lance, R. A.; Nyalwidhe, J. O.; Beydoun, H. A.; Clements, M. A.; Drake, R. R.; Semmes, O. J. Clin. Cancer Res. 2009, 15 (17), 5541−5551.
(18) Rauser, S.; Marquardt, C.; Balluff, B.; Deininger, S. O.; Albers, C.; Belau, E.; Hartmer, R.; Suckau, D.; Specht, K.; Ebert, M. P.; Schmitt, M.; Aubele, M.; Hofler, H.; Walch, A. J. Proteome Res. 2010, 9 (4), 1854−1863.
(19) Lagarrigue, M.; Becker, M.; Lavigne, R.; Deininger, S. O.; Walch, A.; Aubry, F.; Suckau, D.; Pineau, C. Mol. Cell. Proteomics 2011, 10 (3), M110.005991.
(20) Boehler, T.; van Straaten, D.; Wirtz, S.; Peitgen, H. O. Comput. Biol. Med. 2011, 41 (6), 340−349.
(21) Alexandrov, T.; Becker, M.; Deininger, S. O.; Ernst, G.; Wehder, L.; Grasmair, M.; von Eggeling, F.; Thiele, H.; Maass, P. J. Proteome Res. 2010, 9 (12), 6535−6546.
(22) Alexandrov, T.; Kobarg, J. H. Bioinformatics 2011, 27 (13), i230−i238.
(23) Denis, L.; Lorenz, D. A.; Trede, D. Inverse Probl. 2009, 25 (11), 115017.
(24) Goodwin, R. J.; Lang, A. M.; Allingham, H.; Boren, M.; Pitt, A. R. Proteomics 2010, 10 (9), 1751−1761.
(25) Rudin, L.; Osher, S.; Fatemi, E. Proceedings of the 11th Annual International Conference of the Center for Nonlinear Studies on Experimental Mathematics: Computational Issues in Nonlinear Science 1992, 60 (1−4), 259−268.
(26) Chambolle, A.; Kokaram, A. J. Math. Imaging Vision 2004, 20 (1), 89−97.
(27) Wu, X.; Kumar, V.; Ross Quinlan, J.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.; Ng, A.; Liu, B.; Yu, P.; Zhou, Z.-H.; Steinbach, M.; Hand, D.; Steinberg, D. Knowl. Inf. Syst. 2008, 14 (1), 1−37.
(28) Deininger, S. O.; Ebert, M. P.; Futterer, A.; Gerhard, M.; Rocken, C. J. Proteome Res. 2008, 7 (12), 5230−5236.
(29) Kap, M.; Smedts, F.; Oosterhuis, W.; Winther, R.; Christensen, N.; Reischauer, B.; Viertler, C.; Groelz, D.; Becker, K. F.; Zatloukal, K.; Langer, R.; Slotta-Huspenina, J.; Bodo, K.; de Jong, B.; Oelmuller, U.; Riegman, P. PLoS One 2011, 6 (11), e27704.


(30) Ergin, B.; Meding, S.; Langer, R.; Kap, M.; Viertler, C.; Schott, C.; Ferch, U.; Riegman, P.; Zatloukal, K.; Walch, A.; Becker, K. F. J. Proteome Res. 2010, 9 (10), 5188−5196.
(31) Liu, J.; Yu, W.; Wu, B.; Zhao, H. Cancer Inf. 2008, 6, 217−241.
(32) Klerk, L. A.; Broersen, A.; Fletcher, I. W.; van Liere, R.; Heeren, R. M. A. Int. J. Mass Spectrom. 2007, 260 (2−3), 222−236.
(33) Hanselmann, M.; Kirchner, M.; Renard, B. Y.; Amstalden, E. R.; Glunde, K.; Heeren, R. M. A.; Hamprecht, F. A. Anal. Chem. 2008, 80 (24), 9649−9658.
(34) Jones, E. A.; van Remoortere, A.; van Zeijl, R. J.; Hogendoorn, P. C.; Bovee, J. V.; Deelder, A. M.; McDonnell, L. A. PLoS One 2011, 6 (9), e24913.
(35) Goodwin, R. J.; Pennington, S. R.; Pitt, A. R. Proteomics 2008, 8 (18), 3785−3800.
(36) Spraggins, J. M.; Caprioli, R. M. J. Am. Soc. Mass Spectrom. 2011, 22 (6), 1022−1031.
(37) Stoeckli, M.; Staab, D.; Schweitzer, A. Int. J. Mass Spectrom. 2007, 260 (2−3), 195−202.
(38) Khatib-Shahidi, S.; Andersson, M.; Herman, J. L.; Gillespie, T. A.; Caprioli, R. M. Anal. Chem. 2006, 78 (18), 6448−6456.
