An Analytical Pipeline for MALDI In-Source Decay Mass Spectrometry

ARTICLE pubs.acs.org/ac

An Analytical Pipeline for MALDI In-Source Decay Mass Spectrometry Imaging

Tyler A. Zimmerman,† Delphine Debois,† Gabriel Mazzucchelli,† Virginie Bertrand,†,‡ Marie-Claire De Pauw-Gillet,‡ and Edwin De Pauw†,*

†Mass Spectrometry Laboratory, ‡Histology-Cytology Laboratory, GIGA-R Systems Biology and Chemical Biology, University of Liege, B-4000 Liege (Sart-Tilman), Belgium

Supporting Information

ABSTRACT: In-source decay (ISD) fragmentation as combined with matrix-assisted laser desorption/ionization (MALDI) mass spectrometry allows protein sequencing directly from mass spectra. Acquisition of MALDI-ISD mass spectra from tissue samples is achieved using an appropriate MALDI matrix, such as 1,5-diaminonaphthalene (DAN). Recent efforts have focused on combining MALDI-ISD with mass spectrometry imaging (MSI) to provide simultaneous sequencing and localization of proteins over a thin tissue surface. Successfully coupling these approaches requires the development of new data analysis tools, but first, investigating the properties of MALDI-ISD as applied to mixtures of protein standards reveals a high sensitivity to the relative protein ionization efficiency. This finding translates to the protein mixtures found in tissues and is used to inform the development of an analytical pipeline for data analysis in MALDI-ISD MS imaging, including software to identify the most pertinent spectra, to sequence protein mixtures, and to generate ion images for comparison with tissue morphology. The ability to simultaneously identify and localize proteins is demonstrated by using the analytical pipeline on three tissue sections from porcine eye lens, resulting in localizations for crystallins and cytochrome c. The variety of protein identifications provided by MALDI-ISD-MSI between tissue sections creates a discovery tool, and the analytical pipeline makes this process more efficient.

Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) when combined with in-source decay (ISD) fragmentation reveals a series of mass peaks corresponding to a protein sequence through cleaving amino acid bonds along the peptide chain.1 The mass differences between individual peaks are consecutively read from the mass spectrum to provide “top-down” sequencing leading to protein identification.2 This process resembles Edman degradation to determine amino acid sequence,3 but it is comparatively easier, faster, and cheaper to perform. The use of an appropriate MALDI matrix for ISD such as 1,5-diaminonaphthalene (DAN)4 when combined with analyte is sufficient to cause protein fragmentation, and this process has been used to provide sequence information for a variety of carbohydrates,5,6 peptides,7 and proteins,4,8 including the presence of posttranslational modifications.1,9,10 Recent efforts have focused on combining MALDI-ISD with mass spectrometry imaging (MSI) techniques so that both localization and identification of proteins can be performed from tissue sections.11 MSI combines chemical information from mass spectrometry with the spatial information of imaging12 and is a powerful modality to map the localizations of peptides,13 proteins,14 lipids,15–17 and metabolites18,19 within tissue samples. MSI as combined with MALDI works by rastering a UV laser in a grid of positions over the tissue surface, and software is used to reconstruct ion images that show protein localization. MSI has advantages over other imaging techniques in that prior radio- or

fluorescent labeling of a single analyte of interest is not necessary, and instead, information from an entire mass range of analytes is detected in each mass spectrum. Often, new biomolecules are discovered in this fashion from tissue sections using MSI as a discovery tool for cancer biomarkers,20 protein profiles,21 and medical diagnostics.22

Can MALDI-ISD and MSI be combined to provide a wealth of protein information from thin tissue sections? Our laboratory recently published a paper11 that attempts this wherein image acquisition was automated, but the entire data analysis process was performed manually. Indeed, the new data style generated by MALDI-ISD-MSI wherein a protein sequence is present in each spectrum opens opportunities for development of new data analysis protocols. In this paper, the properties of MALDI-ISD are first explored from mixtures of protein standards, from which it is discovered that MALDI-ISD is highly sensitive to the relative ionization efficiencies of proteins. This finding translates to the protein mixtures found in tissue sections and is used to inform the subsequent development of software for the analysis of MALDI-ISD-MSI data sets. In fact, through the creation of an analytical pipeline, we create automated ways to identify correlated peaks that are likely to be part of the same protein sequence, to find the most pertinent spectra for automated de novo sequencing and protein identification, and to generate ion images of protein localizations for final comparison to tissue morphology.

The test samples used here for the development of the analytical pipeline are thin slices of porcine eye lens. Recent studies on mammalian eye lens samples have focused on characterizing the properties, including localization, of proteins from the crystallin family that are responsible for maintaining eye lens transparency, and their degradation is implicated in cataract formation.23,24 Using the developed analytical pipeline, we identify and localize two types of crystallins along with other proteins in thin tissue sections of porcine eye lens. Our current finding that MALDI-ISD is highly sensitive to the relative ionization efficiencies of proteins in mixtures means that the types of proteins seen from the imaging data will change depending upon the relative protein composition between tissue sections. This feature is demonstrated by applying the analytical pipeline to three similar tissue sections of porcine eye lens that result in an advantageous variety of protein identifications and localization information. The ideal application of these newly developed MALDI-ISD-MSI methods, then, is to image several tissue sections to gain the maximum variety of information available, and this process is made more efficient by the developed analytical pipeline.

Received: May 13, 2011. Accepted: June 21, 2011. Published: June 21, 2011. dx.doi.org/10.1021/ac201221h | Anal. Chem. 2011, 83, 6090–6097

■ EXPERIMENTAL SECTION

MALDI-ISD of Protein Standard Mixtures. Solutions containing mixtures of protein standards in various ratios were created using 100 pmol/μL of total protein and combined with a MALDI matrix solution of 20 mg/mL of 1,5-diaminonaphthalene (DAN) (Acros Organics, Geel, Belgium) in 70:30 vol/vol of acetonitrile/0.1% formic acid. The MALDI matrix was combined with the protein solutions in a 2:1 vol/vol ratio of matrix/analyte on a Bruker MTP 384 solid metal target plate and analyzed with an UltraFlex II MALDI TOF/TOF mass spectrometer (Bruker Daltonics, Bremen, Germany) in reflectron mode. Through manual profiling, 1 × 10^4 shots were accumulated at 100 Hz from each sample to collect the in-source decay fragments. The first sample set consists of bovine serum albumin (BSA) combined with horse heart cytochrome c (CytC) in 1:1, 1:2, and 1:3 concentration ratios of BSA/CytC. The second set consists of BSA, CytC, and myoglobin (Myo) from horse heart combined in 1:2:1, 1:2:2, and 2:4:1 ratios of BSA/CytC/Myo (protein standards were obtained from Sigma-Aldrich, Steinheim, Germany). Reference MALDI-ISD spectra were also taken of the individual pure protein standards in the same fashion as for protein mixtures. Traditional MALDI-MS to detect intact proteins was performed using non-ISD matrixes (2,5-dihydroxybenzoic acid and sinapinic acid) on the ProteoMass calibration kit protein mixture MSCAL3 from Sigma-Aldrich.

ISD Sequencing Software. Novel software was created in the Java programming language (www.sun.com, SDK version 1.6.0_20) that automates amino acid sequencing from MALDI-ISD mass spectra containing ion series from a single protein or those from a protein mixture. Software was also created for an analytical pipeline that performs data analysis from MALDI-ISD MS tissue imaging data, including ranking of mass spectra,


protein identification, and generation of ion images of the identified proteins. For MALDI-ISD spectra clearly containing an ion series from a single protein standard, a spectrum is first baseline-subtracted using the TopHat algorithm followed by Savitzky–Golay smoothing with a window of 0.5 m/z and 2 cycles in FlexAnalysis software (Bruker Daltonics). Peak picking parameters were adjusted depending on the data quality, but the thresholds used were 1.5 × S/N, a minimum intensity of 500 counts, and a maximum of 5000 peaks, with the centroid peak detection algorithm and defaults for the remaining parameters. The resulting mass list is exported by FlexAnalysis to Microsoft Excel, sorted in descending order, and saved as a text file. The in-house written Java code for ISD sequencing reads both this peak list file and a file containing a list of amino acids, including combinations of up to three amino acids and posttranslational modifications, with all of their associated monoisotopic masses. Mass differences are calculated between successive masses in the peak list, and these are directly compared with the known masses of amino acids, starting at the higher end of the peak list and moving toward lower masses, which for the z ion series results in amino acid sequencing from the C-terminal direction toward the N-terminus, with the resulting tag usually being found closer to the N-terminus. If redundant ion series are present for a single protein, this interference can lead to false sequencing by the Java software. The z + 2 and c ion series that are commonly seen in MALDI-ISD are therefore kept, but peaks corresponding to the sometimes present a [c − 45.0215], b [c − 17.0266], b-H2O [c − 35.0371], and y [z + 2 + 14.0076] ion series are calculated from their known mass shifts relative to the z + 2 and c ion series and automatically removed by the Java software.
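The two steps just described, removal of redundant ion series by their known mass shifts followed by reading residues from successive mass differences, can be sketched as follows. This is a minimal illustration under stated assumptions and not the published code: the class and method names, the 0.02 Da tolerance, the reduced residue table (no posttranslational modifications or multi-residue combinations), and the choice to let any peak act as the c or z + 2 reference are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of redundant-series removal and mass-difference sequencing.
final class IsdSequencingSketch {

    // Monoisotopic residue masses in Da (reduced table for illustration).
    // Leucine and isoleucine are isobaric and are reported together as "L.I".
    static final Map<String, Double> RESIDUES = new LinkedHashMap<>();
    static {
        RESIDUES.put("G", 57.02146);
        RESIDUES.put("A", 71.03711);
        RESIDUES.put("S", 87.03203);
        RESIDUES.put("P", 97.05276);
        RESIDUES.put("V", 99.06841);
        RESIDUES.put("T", 101.04768);
        RESIDUES.put("L.I", 113.08406);
        RESIDUES.put("D", 115.02694);
        RESIDUES.put("Q", 128.05858);
        RESIDUES.put("K", 128.09496);
        RESIDUES.put("E", 129.04259);
        RESIDUES.put("F", 147.06841);
        RESIDUES.put("Y", 163.06333);
    }

    // Shifts (Da) quoted in the text: a = c - 45.0215, b = c - 17.0266,
    // b-H2O = c - 35.0371, y = (z + 2) + 14.0076.
    static final double[] REDUNDANT_SHIFTS = {-45.0215, -17.0266, -35.0371, 14.0076};

    // Drops every peak lying at a redundant shift from another peak in the list.
    // Assumption: the list does not label ion types, so any peak may be the reference.
    static double[] removeRedundant(double[] peaks, double tol) {
        List<Double> kept = new ArrayList<>();
        for (double p : peaks) {
            boolean redundant = false;
            for (double ref : peaks) {
                for (double shift : REDUNDANT_SHIFTS) {
                    if (Math.abs(p - (ref + shift)) <= tol) {
                        redundant = true;
                    }
                }
            }
            if (!redundant) {
                kept.add(p);
            }
        }
        double[] out = new double[kept.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = kept.get(i);
        }
        return out;
    }

    // Returns the residue with the smallest mass error, or "?" if none is within tol.
    static String closestResidue(double diff, double tol) {
        String best = "?";
        double bestError = tol;
        for (Map.Entry<String, Double> e : RESIDUES.entrySet()) {
            double error = Math.abs(diff - e.getValue());
            if (error <= bestError) {
                best = e.getKey();
                bestError = error;
            }
        }
        return best;
    }

    // Walks the peak list from the highest mass downward and assigns one
    // residue (or "?") to each successive mass difference.
    static List<String> readTag(double[] peaks, double tol) {
        double[] sorted = peaks.clone();
        Arrays.sort(sorted); // ascending; traversed from the end
        List<String> tag = new ArrayList<>();
        for (int i = sorted.length - 1; i > 0; i--) {
            tag.add(closestResidue(sorted[i] - sorted[i - 1], tol));
        }
        return tag;
    }

    public static void main(String[] args) {
        // Synthetic c-type peaks spaced by G, A, and L/I, plus one b-type and one
        // a-type peak placed at the quoted shifts from two of them.
        double[] peaks = {1000.0, 1057.02146, 1111.03197, 1128.05857, 1196.12113, 1241.14263};
        double[] cleaned = removeRedundant(peaks, 0.02);
        System.out.println(readTag(cleaned, 0.02)); // prints [L.I, A, G]
    }
}
```

Choosing the residue with the smallest mass error mirrors the error-minimization rule stated in the text; a fuller implementation would also need the multi-residue and modification masses from the amino acid list file.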
The values of these constant and known mass shifts were calculated using ProteinProspector version 5.7.3 (http://prospector.ucsf.edu/prospector/mshome.html) on an example protein sequence. Even after removal of redundant ion series, false identifications can occur because of other peaks that cannot be attributed to a redundant ion series, because FlexAnalysis often reports mass lists with peaks that are less than 2 Da apart, or because noise peaks are present. To address these issues, if the difference between the current peak and the immediately following peak does not correspond to anything in the amino acid list, the immediately following peak is skipped and the mass differences between the current peak and the following six masses in the peak list are calculated to find the best amino acid assignment. If several assignments are possible by matching these values to the list of known amino acid masses, the final assignment is the one with the smallest mass error. Once an amino acid assignment for the sequence is made, a variable named CurrentExtremeMass in the Java code is set to the value of the lowest currently assigned mass (when sequencing is performed from higher to lower masses) so that false assignments owing to interference from the higher masses are avoided. Near or exact amino acid mass degeneracies can lead to false identifications, such as glutamine (128.0586), which has the same mass as the combination of glycine (57.0215) and alanine (71.0371) and is also near lysine (128.0950). Of course, with good spectral quality, it is possible to discriminate AG from Q if a peak corresponding to loss of glycine [M − 57]+ or alanine [M − 71]+ is present, but because of small mass errors in the automated peak picking, it is sometimes difficult to discriminate between K and Q. The Java code attempts to make the correct


identification by making an assignment with the smallest mass error, but it also outputs near and exact amino acid matches concurrently. It is known that proline residues are skipped by in-source decay fragmentation, so the MALDI-ISD spectrum will contain a gap at [M − 97]+ where a peak would correspond to loss of proline.11 Instead, the mass difference to the immediately following peak is a total of two amino acids, proline and X. The Java code additionally accounts for this “proline gap”, first by subtracting the mass of proline (97.0528) from the mass difference and then comparing the remaining mass difference with the masses of known amino acids. For ISD spectra that contain the ion series of two or more proteins, the above Java software was altered either to remove user-inputted known masses or to automatically identify one protein sequence in the manner described above and remove it. After removal of the masses corresponding to the first identified sequence, the sequencing algorithm as described above is applied a second time to the remaining masses to identify a second protein sequence, and this process can be iterated if more proteins are present. All of this software is freely available online at http://www.mslab.ulg.ac.be/pages/software.html.

ISD Tissue Imaging. All details previously described11 for procedures on obtaining animals, tissue dissection, tissue sectioning, matrix application parameters with the ImagePrep (Bruker Daltonics), and use of FlexImaging software (Bruker Daltonics) to automate image acquisition with the UltraFlex II mass spectrometer are the same experimental details used here. However, images were acquired only from porcine eye lens samples and with exclusive use of the DAN MALDI matrix. In addition, images from three separate imaging experiments are presented here from three slices of porcine eye lens.
Total imaged positions in the three experiments were 2343, 2251, and 4239, and spatial resolution in all cases was 100 μm. Spectra from each imaging run were baseline-subtracted (TopHat) and smoothed (0.5 m/z Savitzky–Golay window, 2 cycles), and peaks were chosen (thresholds of 1.5 × S/N, a maximum of 5000 peaks, peak width 1.0 Da) by a batch processing macro written in FlexAnalysis software (Bruker Daltonics), which automatically wrote the peak list for each spectrum into its spectrum folder in XML format.

Analytical Pipeline Software for ISD Imaging. The Java software works in a batch process to retrieve peak list masses and relative intensity values from the individual XML peak lists, using the document object model (DOM) parser that is a preexisting part of the Java SDK and can be adapted for handling XML files. In addition, the X–Y coordinates that were used by FlexImaging to automate spectral acquisition with the Bruker UltraFlex II instrument are read by the Java code in a batch fashion from the spectral folder names, and these coordinates are ultimately used to plot ion images. The mass values in all the peak lists that are above the mass region of matrix clusters (>1000 Da) are binned into 1.0 Da bins, and their frequencies are tallied to find the most common peaks throughout the entire imaging data set. A correlation matrix is then constructed between the 20 most common mass bins in terms of the number of times peaks pair together in the individual spectra of the imaging run, and this correlation matrix is outputted as the first step of the Java software, after which the software pauses. The user consults this matrix to create a mass list of ∼10 highly correlated masses that are likely to result from the same protein sequence. The default setting of the software is to use the first ∼10 of the most common


mass bins. The software then ranks the spectra/position combinations that are the most relevant to these highly correlated peaks, and the results are written to a text file that is consulted to manually look up and display in FlexAnalysis the most relevant spectra for a protein sequence. The peak list for a selected spectrum is then exported from FlexAnalysis and used with the Java code described above for performing automated sequencing, followed by protein identification by a BLASTP search against the nonredundant protein database (Basic Local Alignment Search Tool for Proteins, http://www.expasy.ch/tools/blast/). The analytical pipeline Java software also outputs a ranked list of high quality spectra that contain a set of peaks that is orthogonal to the original set. The highest scoring spectra will contain the fewest of the highly correlated peaks that were used to generate the first ion image. A spectral quality threshold is also applied so that the ranked spectra must also contain a certain number of peaks above the region of matrix cluster peaks (i.e., greater than ∼1000 Da). This orthogonal set of peaks is likely to correspond to a second protein sequence or to the C-terminal sequence of the same protein. As with the ranked spectra list for the first protein sequence, the second ranked list is used to select spectra from the imaging data set on which to perform sequencing followed by protein identification. Finally, the software plots a series of ion images for the different proteins that have been sequenced/identified, and these images are compared with optical images of the original tissue morphology. This process is iterated for more proteins. The analytical pipeline software, both as source code and as a graphical user interface (GUI), is freely available online at http://www.mslab.ulg.ac.be/pages/software.html.
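The binning and co-occurrence counting described in this section can be illustrated with a short sketch. It is not the published code: the class and method names are hypothetical, a bin is assumed to be the integer floor of the mass, and each bin is assumed to be counted at most once per spectrum, since the text does not specify either detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: 1.0 Da binning and co-occurrence counting across spectra.
final class PeakCooccurrence {

    // Distinct 1.0 Da bins (floor of the mass) above minMass in one spectrum.
    static Set<Integer> binsOf(double[] masses, double minMass) {
        Set<Integer> bins = new TreeSet<>();
        for (double m : masses) {
            if (m > minMass) {
                bins.add((int) Math.floor(m));
            }
        }
        return bins;
    }

    // For every bin, the number of spectra that contain a peak in it.
    static Map<Integer, Integer> binFrequencies(List<double[]> spectra, double minMass) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (double[] s : spectra) {
            for (int bin : binsOf(s, minMass)) {
                freq.merge(bin, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Up to n bins ordered by descending frequency (ties: lower mass first).
    static List<Integer> mostCommonBins(Map<Integer, Integer> freq, int n) {
        List<Integer> bins = new ArrayList<>(freq.keySet());
        bins.sort((a, b) -> {
            int byCount = Integer.compare(freq.get(b), freq.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return new ArrayList<>(bins.subList(0, Math.min(n, bins.size())));
    }

    // matrix[i][j] = number of spectra containing peaks in both bin i and bin j.
    static int[][] cooccurrence(List<double[]> spectra, List<Integer> topBins, double minMass) {
        int n = topBins.size();
        int[][] matrix = new int[n][n];
        for (double[] s : spectra) {
            Set<Integer> present = binsOf(s, minMass);
            for (int i = 0; i < n; i++) {
                if (!present.contains(topBins.get(i))) {
                    continue;
                }
                for (int j = 0; j < n; j++) {
                    if (present.contains(topBins.get(j))) {
                        matrix[i][j]++;
                    }
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Three synthetic peak lists; the 900.0 peak falls in the matrix-cluster region.
        List<double[]> spectra = Arrays.asList(
                new double[] {1500.2, 1613.3, 900.0},
                new double[] {1500.4, 1613.1},
                new double[] {1500.3, 2000.7});
        List<Integer> top = mostCommonBins(binFrequencies(spectra, 1000.0), 2);
        System.out.println(top + " " + Arrays.deepToString(cooccurrence(spectra, top, 1000.0)));
    }
}
```

In this sketch the diagonal of the matrix holds the number of spectra containing each bin, which gives a natural scale against which the off-diagonal co-occurrence counts can be judged.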

■ RESULTS AND DISCUSSION

In-source decay protein sequencing using DAN as a MALDI matrix is combined with mass spectrometry imaging of tissues, but to understand the protein mixtures that occur in tissues, the first part of this study explores the properties of MALDI in-source decay as applied to mixtures of protein standards. For example, the MALDI-ISD spectra of two protein standards, CytC in Figure 1A and BSA in Figure 1B, show very different fragmentation characteristics. Although the spectrum of BSA has a clearly identifiable ion series that gives a clear amino acid sequence, for CytC there are extra peaks belonging either to other ion series or to other fragmentation pathways. High fragmentation is not surprising for CytC, given that the protein sequence contains comparatively more basic amino acid residues, lysine and arginine, than acidic ones and that this highly charged protein can lead to a variety of fragmentation pathways.25,26 Thus, less easily fragmented proteins such as BSA are likely to be more easily sequenced using MALDI-ISD. While taking these challenges into consideration, new software has been developed to automate amino acid sequencing from MALDI-ISD spectra. The MALDI-ISD sequencing software as developed here is able to handle interference peaks through a combination of optimized peak picking with a high intensity threshold, automated removal of peaks from redundant ion series, and error minimization. As explained above in the Experimental Section, several additional adjustments improve results, such as sorting all peak lists in descending order to allow automated sequencing to begin with a high-mass region peak that is more likely related to a protein sequence, rather than in the


Figure 1. The dependence of protein signal on relative protein concentration in protein mixtures in MALDI-ISD points to a dependence on the relative ionization efficiency of the proteins. Panels A and B show MALDI-ISD spectra of pure protein standards cytochrome c and bovine serum albumin, respectively, that are used here for comparison purposes. Panels C–E show a concentration gradient of increasing CytC relative to BSA, with only panel D showing an ideal concentration ratio for the detection of this protein mixture, whereas panel C is dominated by BSA and panel E shows only CytC peaks. In addition, the sequence annotation of CytC in panel A is known to be correct because it is a continuation of the more easily seen sequence from the higher mass region that is not displayed in this close-up view.

lower mass matrix cluster region. In addition, peak picking using the SNAP method in FlexAnalysis often does not correctly pick the monoisotopic peak unless the spectrum quality is nearly perfect with little-to-no noise, which is difficult to achieve with imaging data. Instead, smoothing the spectrum to eliminate the peak isotopic distribution and using the “Centroid” method to select the average mass values gives more reproducible mass differences between successive peaks for more accurate protein sequencing. In fact, even though the spectra in the MALDI-ISD imaging runs presented in this paper were collected in reflectron mode, the use of linear mode followed by selection of the peak average masses in future experiments ought to further improve the quality of the sequence data obtained by the software. The resulting MALDI-ISD sequencing software performs well in determining protein sequences from protein standards. For the MALDI-ISD spectrum of CytC in Figure 1A, the ISD sequencing software reports a nine amino acid sequence tag, including amino acid mass degeneracies, of [A[P], G, F.M(+16),


T, Y, T, D, A] where commas separate the positions and periods separate the mass degenerate amino acids. The symbol [P] represents a proline that is predicted by the software using the aforementioned “proline gap” prediction algorithm. One combination from this sequence tag is APGFTYTDA, which when searched with BLASTP gives matches with 100% query sequence coverage to cytochrome c from a variety of organisms, including the expected sequence from horse (Equus caballus). For the clearer spectrum produced by the BSA standard in Figure 1B, the ISD sequencing software produces a 13 amino acid sequence tag with both near and exact amino acid mass degeneracies of [Q.K, G, L.I, V, L.I, L.I, A, F.M(+16), S, Q.K, Y, L.I, Q.K], of which one combination is KGLVLIAFSQYLQ, for which BLASTP gives a top hit of bovine serum albumin with 100% query coverage. Comparing Figure 1A,B shows that there are more interference peaks for CytC, a highly charged protein that is more easily fragmented. Thus, as expected, a longer sequence tag was produced for the less easily fragmented BSA. From both protein standards, however, a sequence tag capable of providing the correct protein identification was generated automatically by the ISD sequencing software. For protein mixtures, the effect of the relative concentration of constituent proteins was tested along a gradient of protein/protein standards ratios. In Figure 1C, which shows a 1:1 ratio of CytC/BSA, the BSA signal is stronger than for CytC, possibly because of a higher ionization efficiency for BSA, and this allows only sequencing of BSA. Although a few CytC peaks appear in this spectrum, there are not enough to allow the generation of a sequence tag for CytC of more than 1–2 amino acids in length. Conversely, in Figure 1E, where the concentration of CytC is three times that of BSA, the signal owing to CytC is stronger, and BSA peaks are almost absent.
Only in Figure 1D, with an apparently optimal 1:2 concentration ratio of BSA/CytC, are the MALDI-ISD fragmentation series from both proteins sufficiently present to allow the generation of sequence tags from both proteins in a single spectrum. Of course, for sequencing, it is desirable to have a clean spectrum showing only a single protein with less possibility for interference, but this is not always available. In fact, for MALDI-ISD tissue imaging, individual spectra sometimes contain the MALDI-ISD patterns of two proteins.11 For these situations, the ISD sequencing software was adapted to be able to identify separate proteins from protein mixtures. Peak lists from reference spectra of pure protein standards are available, so the code first subtracted the known peaks of BSA from the peak list of the mixture spectrum of the optimal concentration ratio in Figure 1D, giving a sequence tag, including amino acid mass degeneracies, of [E, Q.K, G, G, Q.K, H, Q.K, T, G[P]], of which one combination is EKGGKHKTGP. When searched with BLASTP, this sequence matches cytochrome c sequences from a variety of organisms with 100% query sequence coverage, including the expected sequence from horse (E. caballus). After automatically removing the peaks of this sequence tag from the peak list, the second sequence tag produced by the software, including mass degeneracies, is [L.I, G, E, E, H, F.M(+16), Q.K, AV.GL, V, L.I, L.I, A, F.M(+16), S, Q.K], of which one combination is LGEEHFKGLVLIAFSQ, which when searched with BLASTP gives a top hit of bovine serum albumin with 100% query coverage. The annotated spectra showing the above sequence tags are presented in Figure S-1 in the Supporting Information. Two proteins can appear in a single MALDI-ISD spectrum; even the presence of three proteins


Figure 2. Analytical pipeline workflow for MALDI-ISD imaging data sets. In the first step, spectral information is read into the Java heap space for subsequent computation. The masses are then binned throughout all peak lists over all positions in the imaging data. The most common bins/peaks are put into a correlation matrix, where the values correspond to the number of co-occurrences of peaks in the spectra. It was found after analyzing several imaging data sets that the first 7–10 most common peaks are usually correlated with each other and, thus, likely to be a part of the same protein sequence. All spectra are then ranked according to the presence of these correlated peaks, allowing selection of an appropriate spectrum for de novo sequencing. Ion images are then generated, and an orthogonal ranking is performed to identify a second protein sequence and generate an ion image and so on for more proteins.

simultaneously is possible at an optimum concentration ratio (see Figure S-2 in the Supporting Information). The software developed here is therefore useful in extracting separate protein sequences from such complex mixture spectra. Other applications of this software could include sequencing of protein mixtures found in gel spots. Although the software for determining multiple protein sequences from a single MALDI-ISD spectrum is useful in some applications, the main conclusion from Figure 1 is that the presence of a protein in a MALDI-ISD protein mixture spectrum is highly dependent on the relative protein concentration ratio. This points to a high sensitivity of MALDI in-source decay to the effects of different ionization efficiencies between proteins in mixtures, as also seen for a traditional MALDI-MS analysis of intact proteins (Figure S-3 in the Supporting Information). A high sensitivity of MALDI-ISD to the relative protein ionization efficiency is not surprising, given the rarity of the simultaneous presence of multiple proteins in both the experiments with protein standards presented in the previous paragraph and from


previously reported MALDI-ISD spectra from tissue samples.11 In both cases, rarely were ion series of more than a single protein significantly present in an individual spectrum, and the presence of three or more proteins in an individual spectrum has thus far not been seen in our MALDI-ISD spectra from tissue samples. In tissue imaging experiments using MALDI-ISD, sensitivity to ionization efficiency means that the most efficiently ionized one or two proteins in a particular tissue region will be present and are likely to suppress the other proteins below the detection limit. This characteristic of MALDI-ISD reduces spectral complexity and offers several advantages for tissue imaging: because of the high sensitivity to the relative protein ionization efficiency, the relative signal of proteins changes drastically with changes in protein composition between differing tissue sections. Thus, although MALDI-ISD as applied to tissue imaging will provide reproducible results between similar sections, it also provides an advantageous variety of protein identifications that results from the nature of the proteins between differing sections. The final section of this paper demonstrates this diversity of protein identifications on three similar tissue sections, all taken from porcine eye lens. But first, the following section outlines an analytical pipeline for the analysis of MALDI-ISD imaging data that involves both newly created software tools and preexisting commercial software to enable the user to tightly control the entire data analysis process from signal processing to the final visualization of protein localizations. Both commercial and custom analytical pipelines for traditional MS imaging data sets already exist from Bruker and other sources,12,27–29 but the format of the data in MALDI-ISD imaging is different from traditional imaging in that spectra contain a protein sequence instead of several intact parent masses.
The characteristics of MALDI-ISD open new opportunities for computational software, such as for sequence determination30,31 from protein mixtures32 without the availability in this case of spectral libraries; the possibility of identifying PTMs;10 accounting for other particularities of the MALDI-ISD process, such as the proline gap;11 and removal of interference peaks and redundant ion series peaks. Furthermore, the ability to determine amino acid sequences combined with the spatial information of imaging opens the possibility for automated data analysis through new software that is tailored to MALDI-ISD imaging data sets acquired on the Bruker UltraFlex instrument series, and the resulting software is adaptable to other instrumental platforms. Figure 2 shows a workflow for the MALDI-ISD MS imaging analytical pipeline, which is a series of analytical tools requiring minimal decisions from the user at certain steps while the rest of the steps are fully automated by the Java software. In the first step of Figure 2, all of the necessary information is read into the Java heap space in a batch fashion. The in-house written code for this would also be useful if adapted to other traditional imaging experiments because there is currently no option in FlexAnalysis software for batch exporting of peak lists. In the second step of Figure 2, reading of the data is followed by binning the masses of all the peak lists throughout the entire imaging data set. The occurrence frequencies of peaks within these 1.0 Da mass bins are tallied, and the most common mass bins are put into a correlation matrix, where the values in the matrix correspond to the total number of times a peak pair co-occurs in individual spectra throughout the imaging data set. The use of such a correlation matrix is particularly suited to the analysis of MALDI-ISD imaging data because peaks that co-occur are likely to be part of the same


Figure 3. Ion images generated as an output of the analytical pipeline for MALDI-ISD MS imaging for three sections of porcine eye lens. Panels A, D, and G are optical images of tissue sections. Panel A shows imaging of half of a morphologically symmetrical tissue, and panel G was taken after DAN matrix application, showing a homogeneous matrix layer. All scale bars are 0.2 cm. Panel B shows an ion image for α-crystallin chain A and panel C shows that for cytochrome c. Panels E and F show β-crystallin and an unidentified protein from the noise, respectively. Panels H and I show β-crystallin and α-crystallin chain A, respectively.

protein sequence. At this point, the user is required to inspect the correlation matrix to identify a list of highly correlated peaks that will be used in the subsequent step. From inspection of several correlation matrixes generated from different imaging data sets, it was found that the first 7–10 most common peaks are usually highly correlated, as signified by the high values in the example correlation matrix in Figure 2; the full matrix with peak labels is shown in Figure S-4 (Supporting Information). In contrast, after 7–10 peaks, the correlation tends to fall sharply, as also demonstrated in the example correlation matrix. These later peaks co-occur less frequently and are unlikely to form part of the protein sequence corresponding to the more highly correlated peaks. Thus, the software’s default setting is to use the 10 most common peaks for the subsequent step, but this choice should be verified by manual inspection of the correlation matrix, which takes about 2 min. If the default setting is not appropriate, the user has the option to insert a corrected list of correlated peaks into the software. This is made easier by a GUI for the software, a screen shot of which is shown in Figure S-5 in the Supporting Information; the software is freely available at http://www.mslab.ulg.ac.be/pages/software.html.

The third step of Figure 2 uses the newly created list of correlated peaks to assign a score to each spectrum in the imaging data set, where the score simply depends on how many of the peaks from the correlated peak list are present. For this step, the software’s output is a ranked list of spectra, labeled by their X–Y positions on the tissue surface, where the top-ranked spectra are likely to contain a full protein sequence. The X–Y position labels allow the user to quickly open and inspect a given spectrum in the FlexAnalysis software.
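The binning, co-occurrence counting, and scoring steps described above can be sketched compactly. The following is a minimal Python illustration and not the authors' Java implementation: the data layout (a dictionary mapping X–Y positions to peak-mass lists), the function names, floor binning, the tie-breaking order, and the use of the matrix diagonal for single-bin occurrence counts are all assumptions made for the example, and the peak masses are invented.

```python
from collections import Counter
from itertools import combinations
import math

BIN_WIDTH = 1.0  # Da, the bin width stated in the text


def bin_spectrum(masses, bin_width=BIN_WIDTH):
    """Return the set of mass bins occupied by one peak list (floor binning is an assumption)."""
    return {math.floor(m / bin_width) for m in masses}


def most_common_bins(spectra, top_n=10):
    """Count how many spectra contain each bin; return the top_n bins, most frequent first."""
    counts = Counter()
    for masses in spectra.values():
        counts.update(bin_spectrum(masses))
    # Descending count, then ascending bin, so that ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [b for b, _ in ranked[:top_n]]


def cooccurrence_matrix(spectra, bins):
    """matrix[i][j] = number of spectra that contain both bins[i] and bins[j]."""
    index = {b: i for i, b in enumerate(bins)}
    matrix = [[0] * len(bins) for _ in bins]
    for masses in spectra.values():
        present = sorted(index[b] for b in bin_spectrum(masses) if b in index)
        for i in present:
            matrix[i][i] += 1  # assumption: diagonal holds the bin's own occurrence count
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


def rank_spectra(spectra, correlated_bins):
    """Score each position by the number of correlated bins present; highest score first."""
    wanted = set(correlated_bins)
    scores = {pos: len(wanted & bin_spectrum(m)) for pos, m in spectra.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data set: three pixel positions with invented peak masses.
spectra = {
    (0, 0): [1000.2, 1100.7, 1250.1],
    (0, 1): [1000.4, 1100.3],
    (1, 0): [1500.9],
}
bins = most_common_bins(spectra, top_n=2)
matrix = cooccurrence_matrix(spectra, bins)
ranking = rank_spectra(spectra, bins)
```

In this toy case the two most common bins co-occur in two spectra, so those two positions head the ranked list. In practice the inspection step described in the text would decide how many of the most common bins to keep before scoring.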
Because spectral quality in imaging experiments is often highly variable over the tissue surface, with many hot spots of good matrix crystallization alongside positions of low signal, the output of this software is useful for finding good-quality spectra that also contain protein sequences. Indeed, it was found that the ranked list of spectra consistently points to these desirable


spectra. Of these, of course, some are still better than others, and the user may try the first few spectra from the list to find the best one for generation of a de novo sequence tag. Sequence determination can be done using the ISD sequencing software that was used above to provide the sequences of protein standards, or annotation can be performed manually if the spectrum contains many redundant ion series or excess fragmentation products that tend to complicate automated sequence determination. For the final step of ion image generation, the software uses the list of spectra ranked for the presence of highly correlated peaks and plots the average ion intensity of the protein sequence peaks at the spectra positions on the tissue surface. The resulting ion image can then be compared with an optical image of the tissue slice morphology taken before matrix application, as shown, for example, in Figure 3A.

To find, sequence, and plot the localization of a second protein sequence, a ranking of spectra is performed that is orthogonal to the original ranking in that the highest scoring spectra contain the fewest of the highly correlated peaks that were used to generate the first ion image. In the same way as for the first protein, the highest ranked spectra are opened in FlexAnalysis, a sequence tag is generated either automatically or manually, and an ion image is created. These steps can be iterated to search for spectra containing the sequence of a third protein with a ranking that is orthogonal to both previous rankings, and so on for more proteins, each time followed by protein identification and the plotting of corresponding ion images, until iteration finally gives rankings of spectra containing only noise and the data set is completely characterized. Figure 3 shows an application of the analytical pipeline to identify and localize proteins in three similar tissue sections from porcine eye lens.
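The ion image generation and the orthogonal re-ranking described in the workflow can be illustrated as follows. This is a hedged Python sketch, not the published Java code: the input layout (each X–Y position maps to a dictionary from mass bin to intensity), the function names, and the treatment of absent peaks as zero intensity are assumptions, and the intensities are invented. The text does not specify how ties or near-empty spectra are handled, so the sketch breaks ties by position only.

```python
def ion_image(spectra, protein_bins, width, height):
    """Return a height x width grid holding the mean intensity of protein_bins at each pixel.

    Peaks missing from a spectrum are counted as zero intensity (an assumption).
    """
    if not protein_bins:
        raise ValueError("protein_bins must not be empty")
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in spectra.items():
        total = sum(peaks.get(b, 0.0) for b in protein_bins)
        image[y][x] = total / len(protein_bins)
    return image


def orthogonal_ranking(spectra, protein_bins):
    """Rank positions so that spectra with the FEWEST of protein_bins come first."""
    wanted = set(protein_bins)
    counts = {pos: len(wanted.intersection(peaks)) for pos, peaks in spectra.items()}
    # Ascending count of first-protein peaks; ties are broken by position only.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


# Toy data set: bin -> intensity for three pixels on a 2 x 2 grid.
spectra = {
    (0, 0): {1000: 8.0, 1100: 4.0},
    (1, 0): {1000: 2.0, 1700: 9.0},
    (0, 1): {1700: 5.0, 1800: 7.0},
}
first_protein = [1000, 1100]  # correlated bins assigned to the first protein
image = ion_image(spectra, first_protein, width=2, height=2)
ranking = orthogonal_ranking(spectra, first_protein)
```

Here the pixel at (0, 1) contains none of the first protein's peaks and therefore heads the orthogonal ranking, which makes it the first candidate to inspect for a second sequence tag. Repeating the call with the union of all bins assigned so far would correspond to the iteration over further proteins.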
Panel A shows an optical image of the half of a symmetrical eye lens tissue slice that was imaged, and panels B and C show the corresponding protein localizations of α-crystallin chain A and cytochrome c, respectively. Panel B shows α-crystallin chain A to be localized mainly toward the periphery of the tissue section; this localization matches previously published results.33 After the analytical pipeline produced a ranked list of spectra, a spectrum was selected, and through a combination of automated and manual de novo sequencing, a sequence tag of RALGPFYPSRLFDQF was generated. When searched with BLASTP against the nonredundant protein database, this tag gives an unambiguous identification, with 100% query coverage, of α-crystallin chain A from a variety of homologous sequences in different mammalian species. Of course, an identification can be made because it is known that the sample originates from pig (Sus scrofa). The identified α-crystallin chain A protein (mass of ∼20 kDa, monomer) is a heat-shock protein known to be present in porcine eye lens. Its functions are to maintain lens transparency and refractive index and to act as a chaperone protein34 that prevents unfolding/aggregation of various lens proteins, thereby preventing cataract formation.23,24 Applying the orthogonal spectral ranking to search for a second protein sequence gives the sequence tag VEKGGKHKTGPNLHGLFGR, which BLASTP unambiguously identifies as cytochrome c from a variety of organisms, including S. scrofa. Cytochrome c is likely to be the most efficiently ionized protein in the regions where it was detected in Figure 3C. Cytochrome c is also known to be present in mammalian eye lens35 and is important in mitochondrial


respiration but also in mediating cellular apoptosis. Indeed, naturally occurring oxidation of cytochrome c was previously implicated in inhibition of epithelial cell growth in eye lens samples.36 Selected spectra from the imaging runs are provided in Figure S-6 (Supporting Information) and are annotated with the protein sequence tags used above for protein identification.

Figure 3D is an optical image, and panel E shows β-crystallin S-like, identified unambiguously with BLASTP from the sequence tag FQGRHYDSDCDCT. β-Crystallin roughly colocalizes with α-crystallin chain A from Figure 3B and is also mostly localized to the periphery of the tissue section. β-Crystallin is part of a larger βγ-crystallin superfamily that is separate from α-crystallin and is known to be critical for lens clarity and refraction.37 For the same tissue section, Figure 3F shows the localization of an unidentified protein sequence with a tag of [A, S, F.(M + 16), Q.K, G, R, Q.K, H, T]. The peaks corresponding to this secondary sequence tag were close to noise level in several spectra that also contained β-crystallin peaks; the presence of β-crystallin may be the cause of this signal suppression. As concluded from the protein standards mixtures of Figure 1, when a highly ionizable protein such as β-crystallin is omnipresent over the tissue surface, it is likely to suppress ion series from other proteins toward the noise level. However, an advantage in such cases is that the initially found protein sequence of β-crystallin in Figure 3E is so prevalent that it is virtually uninhibited by signal suppression, thereby providing an accurate map of protein localization and removing one of the major barriers to accurate protein quantitation measurements.
Figure 3G is an optical image of a tissue slice much like the other tissue slices, but this optical image was taken after application of the DAN matrix and shows a homogeneous matrix crystallization layer over the tissue surface. Panel H shows the corresponding localization of β-crystallin S-like with a sequence tag of ASFQGRHYDSDCDCT, as identified with BLASTP, and panel I shows α-crystallin chain A with a tag of ALGPFYPSRLFDQF. β-Crystallin, as before in panel E, is again mostly confined to the periphery of the tissue and is the primary detected protein, whereas α-crystallin is secondary. The spatial resolution of 100 μm for the presented ion images is routinely obtained in MALDI-MSI of tissue sections, but because the DAN matrix forms small matrix crystals over the tissue surface, as seen in panel G, the spatial resolution could be further improved using a variety of preexisting instrumental38 and sample preparation39,40 methods for MALDI-MSI.

The analytical pipeline for MALDI-ISD MSI is efficient in aiding the user to find appropriate spectra to identify and map proteins in tissues while also allowing control of optional inputs. In fact, after spectral batch processing and using software defaults, the time needed to generate ion images and make protein identifications from an imaging data set is about 5 min, but optionally performing manual instead of automated de novo sequencing adds ∼1 h to the total analysis time. This efficiency is ideal when analyzing several imaging data sets to obtain the advantageous variety of information that can be found between tissue sections when using MALDI-ISD. Future work will test the effects of different ISD-suitable MALDI matrixes on signal suppression, adapt the imaging software toward the mapping of posttranslational modifications9,10 and protein isoforms8 in tissues, and test other applications involving protein mixtures.


ASSOCIATED CONTENT

Supporting Information. Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Address: Mass Spectrometry Laboratory, Department of Chemistry, University of Liege, Institut de Chimie, Bat. B6c B-4000, Liege (Sart-Tilman), Belgium. E-mail: [email protected].

ACKNOWLEDGMENT

The authors thank Dr. Loic Quinton and Dr. Nicolas Smargiasso for discussions about ISD. T.A.Z. acknowledges a postdoctoral fellowship from the University of Liege. The FRS-FNRS (Fonds National de la Recherche Scientifique, Belgium) is acknowledged for a postdoctoral fellowship to D.D., and the Lionel Project (Grant no. R.RWAL.1793) is acknowledged for a predoctoral fellowship to V.B.

REFERENCES

(1) Köcher, T.; Engström, Å.; Zubarev, R. A. Anal. Chem. 2005, 77, 172–177.
(2) Nakazawa, T.; Yamaguchi, M.; Okamura, T.-A.; Ando, E.; Nishimura, O.; Tsunasawa, S. Proteomics 2008, 8, 673–685.
(3) Liu, P.; Bakalarski, C.; Sandoval, W. J. Biomol. Tech. 2010, 21 (3 Suppl), S49.
(4) Demeure, K.; Quinton, L.; Gabelica, V.; Pauw, E. D. Anal. Chem. 2007, 79, 8678–8685.
(5) Smargiasso, N.; Pauw, E. D. Anal. Chem. 2010, 82, 9248–9253.
(6) Yamagaki, T.; Suzuki, H.; Tachibana, K. Anal. Chem. 2005, 77, 1701–1707.
(7) Delvolve, A.; Woods, A. S. Anal. Chem. 2009, 81, 9585–9589.
(8) Calligaris, D.; Villard, C.; Terras, L.; Braguer, D.; Verdier-Pinard, P.; Lafitte, D. Anal. Chem. 2010, 82, 6176–6184.
(9) Lennon, J. J.; Walsh, K. A. Protein Sci. 1999, 8, 2487–2493.
(10) Soltwisch, J.; Dreisewerd, K. Anal. Chem. 2010, 82, 5628–5635.
(11) Debois, D.; Bertrand, V.; Quinton, L.; Pauw-Gillet, M. C. D.; Pauw, E. D. Anal. Chem. 2010, 82, 4036–4045.
(12) Zimmerman, T. A.; Monroe, E. B.; Tucker, K. R.; Rubakhin, S. S.; Sweedler, J. V. Methods Cell Biol. 2008, 89, 361–390.
(13) Rubakhin, S. S.; Hatcher, N. G.; Monroe, E. B.; Heien, M. L.; Sweedler, J. V. Curr. Pharm. Des. 2007, 13, 3325–3334.
(14) Franck, J.; Longuespee, R.; Wisztorski, M.; Remoortere, A. V.; Zeijl, R. V.; Deelder, A.; Salzet, M.; McDonnell, L.; Fournier, I. Med. Sci. Monit. 2010, 16, BR293–BR299.
(15) Cerruti, C. D.; Touboul, D.; Guerineau, V.; Petit, V. W.; Laprevote, O.; Brunelle, A. Anal. Bioanal. Chem. 2011, 401, 75–87.
(16) Ostrowski, S. G.; Bell, C. T. V.; Winograd, N.; Ewing, A. G. Science 2004, 305, 71–73.
(17) Murphy, R. C.; Hankin, J. A.; Barkley, R. M. J. Lipid Res. 2009, No. Apr. Suppl, S317–S322.
(18) Porta, T.; Grivet, C.; Kraemer, T.; Varesio, E.; Hopfgartner, G. Anal. Chem. 2011, 83, 4266–4272.
(19) Liu, W.-T.; Yang, Y.-L.; Xu, Y.; Lamsa, A.; Haste, N. M.; Yang, J. Y.; Ng, J.; Gonzalez, D.; Ellermeier, C. D.; Straight, P. D.; Pevzner, P. A.; Pogliano, J.; Nizet, V.; Pogliano, K.; Dorrestein, P. C. Proc. Natl. Acad. Sci. 2010, 107, 16286–16290.
(20) Cazares, L. H.; Troyer, D.; Mendrinos, S.; Lance, R. A.; Nyalwidhe, J. O.; Beydoun, H. A.; Clements, M. A.; Drake, R. R.; Semmes, O. J. Clin. Cancer Res. 2009, 15, 5541–5551.
(21) Burnum, K. E.; Tranguch, S.; Mi, D.; Daikoku, T.; Dey, S. K.; Caprioli, R. M. Endocrinology 2008, 149, 3274–3278.


(22) Franck, J.; Arafah, K.; Elayed, M.; Bonnel, D.; Vergara, D.; Jacquet, A.; Vinatier, D.; Wisztorski, M.; Dy, R.; Fournier, I.; Salzet, M. Mol. Cell. Proteomics 2009, 8, 2023–2033.
(23) Horowitz, J. Exp. Eye Res. 2003, 76, 145–153.
(24) Sharma, K. K.; Santhoshkumar, P. Biochim. Biophys. Acta 2009, 1790, 1095–1108.
(25) Clipston, N. L.; Jai-nhuknan, J.; Cassady, C. J. Int. J. Mass Spectrom. 2003, 222, 363–381.
(26) Demeure, K.; Gabelica, V.; Pauw, E. A. D. J. Am. Soc. Mass Spectrom. 2010, 21, 1906–1917.
(27) Wisztorski, M.; Croix, D.; Macagno, E.; Fournier, I.; Salzet, M. Dev. Neurobiol. 2008, 68, 845–858.
(28) Bruand, J.; Sistla, S.; Meriaux, C.; Dorrestein, P. C.; Gaasterland, T.; Ghassemian, M.; Wisztorski, M.; Fournier, I.; Salzet, M.; Macagno, E.; Bafna, V. J. Proteome Res. 2011, 10, 1915–1928.
(29) Sniatynski, M. J.; Rogalski, J. C.; Hoffman, M. D.; Kast, J. Anal. Chem. 2006, 78, 2600–2607.
(30) Gao, J.; Tsugita, A.; Takayama, M.; Xu, L. Anal. Chem. 2002, 74, 1449–1457.
(31) Demine, R.; Walden, P. Rapid Commun. Mass Spectrom. 2004, 18, 907–913.
(32) Wang, J.; Perez-Santiago, J.; Katz, J. E.; Mallick, P.; Bandeira, N. Mol. Cell. Proteomics 2010, DOI: 10.1074/mcp.M000136-MCP201.
(33) Grey, A. C.; Schey, K. L. Invest. Ophthalmol. Vis. Sci. 2009, 50, 4319–4329.
(34) Derham, B. K.; Harding, J. J. Prog. Retinal Eye Res. 1999, 18, 463–509.
(35) Brennan, L. A.; Lee, W.; Kantorow, M. PLoS One 2010, 5, e15421.
(36) Brennan, L. A.; Lee, W.; Cowell, T.; Giblin, F.; Kantorow, M. Mol. Vision 2009, 15, 985–999.
(37) Hejtmancik, J. F.; Wingfield, P. T.; Sergeev, Y. V. Exp. Eye Res. 2004, 79, 377–383.
(38) Klerka, L. A.; Altelaar, A. F. M.; Froesch, M.; McDonnell, L. A.; Heeren, R. M. A. Int. J. Mass Spectrom. 2009, 285, 19–25.
(39) Tucker, K. R.; Serebryannyy, L. A.; Zimmerman, T. A.; Rubakhin, S. S.; Sweedler, J. V. Chem. Sci. 2011, 2, 785–795.
(40) Zimmerman, T. A.; Rubakhin, S. S.; Sweedler, J. V. Methods Mol. Biol. 2010, 656, 465–479.
