An Assessment of Software Solutions for the Analysis of Mass Spectrometry Based Quantitative Proteomics Data
Lukas N. Mueller,*,†,⊥ Mi-Youn Brusniak,‡ D. R. Mani,§ and Ruedi Aebersold†,‡,⊥
Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland, Institute for Systems Biology, Seattle, Washington 98103, The Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, Faculty of Science, University of Zurich, Switzerland, and Competence Center for Systems Physiology and Metabolic Disease, ETH Zurich, Zurich, Switzerland
Received November 15, 2007
Over the past decade, a series of experimental strategies for mass spectrometry based quantitative proteomics and corresponding computational methodology for the processing of the resulting data have been generated. We provide here an overview of the main quantification principles and available software solutions for the analysis of data generated by liquid chromatography coupled to mass spectrometry (LC-MS). Three conceptually different methods to perform quantitative LC-MS experiments have been introduced. In the first, quantification is achieved by spectral counting, in the second via differential stable isotopic labeling, and in the third by using the ion current in label-free LC-MS measurements. We discuss here advantages and challenges of each quantification approach and assess available software solutions with respect to their instrument compatibility and processing functionality. This review therefore serves as a starting point for researchers to choose an appropriate software solution for quantitative proteomic experiments based on their experimental and analytical requirements. Keywords: LC-MS • quantitative proteomics • quantification strategies • software review • label-free • differential labeling • spectral counting
1. Introduction
Systems biology aims at the understanding of biological processes on the level of the interactions, dynamics, and complexity of multiple molecular elements (reviewed in Hood et al.1). Proteins are very important components of essentially any biological process, and the analysis of their molecular function and regulation is crucial to the understanding of biological systems. The introduction of mass spectrometry (MS) as a robust and sensitive technology for protein analysis had a major impact on the analysis of complex proteome samples. In particular, the measurement of peptides, obtained from protein digestion, by liquid chromatography coupled to online mass spectrometry analysis (LC-MS) has paved the road to study a large number of peptide elements of biological samples in an automated and high-throughput mode.2 Shotgun proteomic approaches, where peptide signals detected by the mass spectrometer are automatically analyzed by tandem mass spectrometry (MS/MS), determine the identity of the peptides that constitute the sample in the course of an LC-MS experiment.3
* Corresponding author: E-mail:
[email protected]. Phone number: +41 44 633 36 17. Fax number: +41 44 633 10 51. † Institute of Molecular Systems Biology, ETH Zurich. ⊥ Competence Center for Systems Physiology and Metabolic Disease, ETH Zurich. ‡ Institute for Systems Biology. § The Broad Institute of Harvard and Massachusetts Institute of Technology. University of Zurich. 10.1021/pr700758r CCC: $40.75
2008 American Chemical Society
Journal of Proteome Research 2008, 7, 51–61. Published on Web 01/04/2008
With the availability of mass spectrometry methods to analyze complex biological samples on a large scale arose the need for computational tools to analyze and statistically evaluate data generated from LC-MS experiments, thus catalyzing a new research direction in the field of bioinformatics. A sizable number of software tools are now available that support various functionalities such as the preprocessing of MS data (denoising, baseline correction, etc.), evaluation and assignment of MS/MS spectra to peptide sequences, comparison and quantification of multiple LC-MS experiments, and the integration of data generated by mass spectrometry with other available biological data resources (microarray, yeast 2-hybrid, etc.). This review focuses on available computational strategies for the analysis of quantitative proteomics data obtained from LC-MS measurements, which require the sequential application of modules carrying out specific tasks on suitable data sets. As is common to many research fields, software development is a dynamic process and proceeds in conjunction with technical advances of analytical instruments. LC-MS software tools are developed for specific generations or types of mass spectrometers and may produce high-quality results only with data generated by a limited number of MS platforms. The platforms used consequently define the theoretical limits of the computational LC-MS analysis (sensitivity and specificity).4 Therefore, it is often not trivial to choose an appropriate program suitable for the quantification of data generated by a specific instrument. In addition to the limited applicability of a program to different MS platforms, practical factors such as
reviews
file compatibility, computational requirements, user friendliness and data visualization, variations in the sample preparation protocols, etc., are critical software aspects that factor into the choice of a data analysis program.5 A major advancement in file compatibility of LC-MS data was made through the development of generic MS file formats for proteomics data,6 which offer the possibility to import, archive, and process LC-MS data from most mass spectrometer types into instrument-independent analysis platforms.7–9 Another important factor is the computation time required by the software to process the data. Proteomics studies often span a large number of LC-MS measurements and thus require considerable computational resources to complete data analysis within a reasonable amount of time. While software solutions exist for distributing computationally demanding tasks on multiple processors, it is desirable to work with analysis routines that can be performed on a single personal computer or are accessible as web-deployed applications. Finally, complementary to the automated processing of large data sets in the high-throughput mode, data visualization is important to assess the quality of the individual processing steps carried out by the software. Along with the evolution of MS instruments, different strategies have been undertaken to perform quantitative proteomics experiments via mass spectrometry.10 Rather than reviewing the large number of published reports on the subject, we group the computational tools for better transparency along the three major experimental strategies (Figure 1): (i) quantification based on spectral counting of fragment ion spectra assigned to a particular peptide, (ii) quantification via differential stable isotope labeling of peptides or proteins, and (iii) label-free quantification based on the precursor ion signal intensities. In the following sections, the three strategies will be generally outlined, and the corresponding computational methods will be discussed in detail. Finally, we will review existing platforms that integrate available software tools and outline future directions and requirements for the development of LC-MS quantification methodology.
Figure 1. Overview of quantification approaches in LC-MS based proteomic experiments. Different strategies to quantitatively analyze data generated from LC-MS experiments have been developed over the last two decades. Spectral counting computes abundance values from the number of times a peptide was successfully identified by tandem mass spectrometry (MS/MS) and compares these across experiments (green). In contrast, methods based on differential stable isotope labeling analyze peptides from two samples X and Y in the same LC-MS run where peptide A and its heavy isotope A* are detected by their characteristic mass difference ∆m/z (red). Label-free quantification extracts peptide signals by tracking isotopic patterns along their chromatographic elution profile. The corresponding signal in the LC-MS run 2 is then found by comparing the coordinates m/z, Tr, and z of the peptide (blue).
2. Spectral Counting of Peptide Assignments for Semiquantitative Analysis
The concept of semiquantitative analysis was introduced for shotgun proteomics, a method in which the instrument control system of the mass spectrometer autonomously selects a subset of peptide precursor ions detected in a survey scan (MS1 scan) for collision induced fragmentation (CID) following predetermined rules (typically, the 1–5 most intense precursor ions). The quantification strategy is based on the hypothesis that the MS/MS sampling rate of a particular peptide, i.e., the number of times a peptide precursor ion is selected for CID in a large data set, is directly related to the abundance of a peptide represented by its precursor ion in the sample mixture. This approach, also termed spectral counting,11 therefore transforms the frequency by which a peptide is identified into a measure for peptide abundance. Spectral counts of peptides associated with a protein are then averaged into a protein abundance index.11–14 Spectral counting approaches have most frequently been used for the analysis of low to moderate mass resolution LC-MS data and serve therefore as a convenient, fast, and intuitive quantification strategy for the analysis of data from, for example, QSTAR, LCQ Deca, or LTQ ion-trap instruments, specifically, since alternative approaches (see below) are not applicable to these data sets. Software for automated quantification via spectral counting has been developed in the form of modules in LIMS systems,15 programming scripts,14 and stand-alone software packages.12,16 A problem of spectral counting methods is their dependency on the quality of MS/MS peptide identifications, because errors in the assignment of peptides propagate directly into protein abundance indexes. Although methods have been developed to calculate the likelihood of MS/MS peptide assignments to be correct,17 to our knowledge, there is no such analogous strategy available for the estimation of error rates of protein abundance indexes. Allet et al. 
weighted spectral counts by the search score of the corresponding peptide assignment to penalize unreliable identifications and reduce the incorporation of false positive peptide assignments into the calculation of protein abundance indexes.15 While the assembly of spectral counts of peptides into a protein index is unproblematic for peptides whose sequences belong to only one protein, it is difficult to resolve the protein membership of peptides that map to multiple protein sequences (as, for example, for peptides from conserved protein regions). Statistical methods have been developed to determine the presence of proteins in LC-MS experiments,18 but spectral counting approaches currently lack a robust model to resolve such ambiguities. Another critical point in spectral counting is how spectral counts are computed if only a small number of peptide identifications are available. Reasons for a low number of observations of a specific peptide are manifold and include low abundance of the protein, short protein sequence length, specific physicochemical properties that affect peptide observability, erroneous peptide identification, complexity of the sample, etc. Ishihama et al. addressed this problem by computing protein abundance indexes from spectral counts where the spectral counts were not obtained in an absolute manner but relative to the peptides that are theoretically observable from a given protein.14 However, the challenge still remains for the quantification of low-abundance proteins since the selection of precursor masses for MS/MS analysis in shotgun experiments is skewed toward peptides of high abundance and the identification of low-abundance peptides is poorly reproducible between LC-MS experiments.19–21 Corresponding abundance indexes of such low-abundance proteins are therefore unreliable since they are obtained from spectral counts of only a small number of peptide identifications.13 Therefore, protein abundance indexes can be quantified very accurately for high-abundance proteins, for which they are based on a large number of peptide identifications, but become unreliable if only one to three peptides per protein are identified. A modified quantification strategy analyzes peptide abundance values from extracted ion chromatograms (XIC) of identified peptides and partially circumvents the MS/MS undersampling problem.16,22–24 This has the advantage that a peptide does not have to be successfully analyzed by tandem mass spectrometry in every single LC-MS measurement; it is in principle sufficient to identify a peptide at least once within all LC-MS experiments. The mass to charge (m/z) and retention time (TR) of the MS/MS precursor ion are then used as a starting coordinate to extract the XIC for this peptide in the remaining measurements. Old et al. introduced the user-friendly computer program Serac16 for the analysis of LCQ Deca data. The program recruits functionalities such as peak finding, extraction of XICs, etc.
from the Xcalibur development kit and implements both spectral counting and peptide quantification on the XIC level. Serac provides graphical user interfaces (GUIs) for data exploration and the optimization of processing parameters. However, due to changes in the underlying Xcalibur package, there is currently no stable software release of Serac available. The C++ program Quoil was developed for the quantification of high-resolution FT-LTQ data and includes additional downstream processing routines such as intensity normalization across runs and statistical analysis.22 A crucial factor for finding the correct XIC across all samples for a given peptide identification is the reproducibility of the chromatography (LC). Fluctuations in the LC systems lead to retention time shifts of peptides so that the TR value of the XIC coordinate varies from run to run. Higgs et al. developed a set of LC-MS processing functionalities implemented in Perl and R scripts running on Sun Grid Engine Environment, which correct chromatographic drifts between LC-MS runs.23 The software package extracts peptide XICs from .RAW files of LCQ MS data and is available through the commercial service facility Monarchlifesciences. In conclusion, quantification approaches that are fully (spectral counting) or partially (XICs) based on peptide assignments represent valuable strategies for the analysis of low to moderate mass resolution LC-MS data. However, the available methodology is still in development, and it is believed that these approaches will greatly benefit from statistical models for the estimation of error rates in the computed results. In addition, the quantification is challenged by the stochastic nature of the MS/MS sampling process, which is particularly apparent for the analysis of a low-abundance peptide whose MS/MS analysis is mostly overshadowed by peptides of higher abundance in complex mixtures.
Figure 2. Quantification by differential labeling. The quantification principles of isobaric and isotopic labeling are schematically illustrated. Isobaric labeling generates in the MS/MS spectra different reporter ions that are used to calculate peptide abundance values between different samples. Isotopic approaches differentially label peptides or proteins from two samples (green/blue) to produce isotopic pairs of characteristic mass shifts. Common to both is the downstream processing where peptide ratios are computed, assembled into protein ratios, and then statistically assessed to evaluate the significance of detected fold changes.
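The spectral counting strategy of section 2 can be illustrated with a minimal Python sketch. It counts identified MS/MS spectra per protein and divides the count by the number of theoretically observable peptides, loosely following the idea attributed to Ishihama et al. above. The input layout, the function name `abundance_index`, and the example values are illustrative assumptions and are not taken from any of the cited tools; spectra whose peptide maps to several proteins are simply skipped, since no robust model for such cases is described.

```python
from collections import defaultdict


def abundance_index(psms, observable_peptides):
    """Return {protein: spectral count / number of observable peptides}.

    psms: iterable of (peptide, proteins) tuples, one per identified
          MS/MS spectrum (hypothetical input layout).
    observable_peptides: {protein: number of theoretically observable peptides}.
    """
    counts = defaultdict(int)
    for _peptide, proteins in psms:
        # Assumption: ignore peptides shared between several proteins.
        if len(proteins) == 1:
            counts[proteins[0]] += 1
    return {
        protein: counts[protein] / n_observable
        for protein, n_observable in observable_peptides.items()
        if n_observable > 0
    }


# Made-up example: five identified spectra, two proteins.
psms = [
    ("PEPTIDEA", ["P1"]),
    ("PEPTIDEA", ["P1"]),
    ("PEPTIDEB", ["P1"]),
    ("SHAREDPEP", ["P1", "P2"]),
    ("PEPTIDEC", ["P2"]),
]
index = abundance_index(psms, {"P1": 6, "P2": 4})
```

With only one unambiguous spectrum, the index of P2 rests on a single observation, which is the situation the text describes as unreliable.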
3. Relative Quantification Based on Differential Stable Isotope Labeling
In recent years, differential stable isotope labeling in combination with mass spectrometry has become an extremely popular method for quantitative proteomics (for a review see Lill,25 Aebersold and Mann,26 and Yan et al.27). Besides absolute quantification of peptide concentrations, where peptide elements of a sample are compared against their spiked-in isotopically labeled synthetic analogues,28 differential isotopic labeling allows relative quantification of peptide intensity values between multiple biological samples within a single LC-MS measurement. A variety of MS compatible labeling techniques are now available, which can be grouped into isobaric and isotopic labeling strategies.26 Figure 2 illustrates the workflow of these two approaches. Isobaric labeling reagents, exemplified by the iTRAQ29 label, are peptide tags that produce specific fragment ions in the MS/MS spectrum, which are used for quantification. Differentially labeled samples are combined and concurrently analyzed by shotgun LC-MS, and peptide abundance values are compared via the reporter fragments in the MS/MS spectra. In contrast, isotopic labeling methods, exemplified by the ICAT30 reagent and the SILAC31 protein labeling, generate pairs of peptides with characteristic mass differences introduced by the applied label. Typically, the isotopic forms of an MS/MS identified peptide are detected by their mass shift and identical elution profiles and are used to compute a peptide ratio between the "heavy" labeled peptide from stimulated cells and the "light" peptide version from normally grown cells (Figure 2, blue and green coloring). Common to both labeling strategies is that after the detection of reporter fragments (isobaric) or isotopic pairs (isotopic), peptide ratios between the different labeling channels are computed and integrated into protein ratios. The obtained protein ratios are then statistically evaluated to quantify the significance of their fold changes. Stable differential labeling offers the unique possibility to measure multiple specimens in parallel. This excludes sources of noise introduced by separate measurements due to quantitative differences between LC-MS runs and allows the highly accurate quantification of multiple samples within a single measurement.32,33
3.1. Isobaric Labeling of Peptides. The novel 8-plex iTRAQ labeling allows the simultaneous quantification of eight biological samples.29 The isobaric reagent reacts with primary amino groups and produces in the MS/MS fragmentation spectrum eight unique reporter groups, one per reagent flavor, at 113, 114, 115, 116, 117, 118, 119, and 121 m/z. iTRAQ labeling does not increase the sample complexity because the reagent is isobaric, and it relies, in contrast to isotopic labeling, on a fully MS/MS-dependent workflow. Therefore, only peptides that were subjected to CID fragmentation and could be successfully assigned to a peptide sequence are quantified. Multi-Q,34 iTracker,35 and the "iTRAQ reporter ion counter"36 are freely available software programs, which import preprocessed MS/MS data from Sequest3 or Mascot37 in .dta or .mgf format. While the available information is limited for some of these tools, the console-based Perl program iTracker was developed for Windows platforms and generates spreadsheets from the calculated results. Libra9 is another freely available program for iTRAQ quantification and is integrated as a module in the Trans Proteomic Pipeline (TPP),9 which is an open source platform for the visual assessment and statistical evaluation of MS/MS peptide assignments.
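The reporter ion readout described for iTRAQ can be sketched in a few lines of Python. The snippet sums the intensity found near each of the eight nominal reporter m/z values listed above and expresses every channel relative to a reference channel. It is a minimal illustration under stated assumptions and not the algorithm of Multi-Q, iTracker, or Libra: the tolerance of 0.2 m/z, the choice of channel 113 as reference, the function names, and the example spectrum are all hypothetical.

```python
# Nominal reporter m/z values of the 8-plex reagent, as listed in the text.
REPORTER_MZ = [113, 114, 115, 116, 117, 118, 119, 121]


def reporter_intensities(peaks, tol=0.2):
    """Sum the intensity within +/- tol of each nominal reporter m/z.

    peaks: list of (mz, intensity) pairs from one MS/MS spectrum.
    tol:   assumed matching tolerance in m/z units.
    """
    return {
        channel: sum(inten for mz, inten in peaks if abs(mz - channel) <= tol)
        for channel in REPORTER_MZ
    }


def channel_ratios(intensities, reference=113):
    """Express each channel relative to the reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        return {}  # no usable reference signal in this spectrum
    return {channel: value / ref for channel, value in intensities.items()}


# Made-up MS/MS spectrum: four reporter peaks and one ordinary fragment ion.
spectrum = [(113.1, 200.0), (114.1, 400.0), (115.1, 100.0),
            (121.1, 50.0), (175.1, 900.0)]
intensities = reporter_intensities(spectrum)
ratios = channel_ratios(intensities)
```

Because the readout happens in the MS/MS spectrum, a peptide that was never selected for fragmentation yields no ratios at all, which mirrors the MS/MS dependence noted in the text.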
For more sophisticated software solutions, the commercial programs ProQuant and ProteinPilot distributed by Applied Biosystems offer a variety of processing functionalities ranging from the assignment of MS/MS spectra to the quantitative analysis and visualization of multiplex iTRAQ data.
3.2. Isotopic Labeling Approaches. There are three general categories of isotopic labeling techniques for relative quantification experiments in proteomics: chemical tagging of peptides and proteins, introduction of stable isotopes into peptides during enzymatic digestion, and incorporation of labels into living cells through their metabolic cycle. The choice of the labeling approach is highly dependent on the biological application. The following sections will first summarize the main principle of each isotopic strategy and then give an overview of available software tools and the labeling strategies with which they are compatible.
3.2.1. Chemical Tagging of Peptides and Proteins. The introduction of stable isotope signatures into peptides via chemical tagging of functional groups of amino acids, exemplified by the Isotope Coded Affinity Tag (ICAT),30 pioneered the field of mass spectrometry based quantitative proteomics. In the ICAT technique, two biological samples are labeled on the peptide or protein level using chemically identical but isotopically different reagents that specifically target cysteine groups. The labeled peptides differ in their molecular weight by 8 mass units, and in newer versions by 9 mass units. After the labeling step, samples are combined and subjected to mass spectrometric analysis. More recently, a number of variants of this concept have been developed in which sets of reagents differ in specificity, structure, mass difference, and number of isotopic forms.38–41 All these labeling techniques have in common that they generate a complex sample mixture, which consists of pairs of chemically identical peptides of different mass. The elements of these pairs are then detected in the precursor ion mass spectrum, and the signal intensity is used to compute the relative abundance of the respective analyte in either sample.
3.2.2. Incorporation of Stable Isotopes into the Amino Acid Sequence. Besides the chemical tagging of peptides or proteins, the replacement of atomic elements in the peptide primary structure by incorporating isotopic analogues via enzymatic reaction is a popular method in proteomic studies. 18O labeling is based on the incorporation of heavy oxygen into the C-termini of peptides during tryptic digestion of proteins.42,43 The major advantage of 18O labeling is that the method is not limited to a specific subpopulation of peptides as is the case for other labeling strategies where often the analysis of a subproteome is targeted (i.e., peptides containing a specific amino acid or post-translational modification). However, 18O incorporation is often not specific to the C-termini, so that other residues in the peptide sequence are also labeled, which may complicate the data analysis.
3.2.3. SILAC Quantification. In recent years, "stable isotope labeling with amino acids in cell culture" (SILAC)31 has enjoyed great popularity (see review by Ong et al.10). Specifically, cell cultures are grown in media containing either light 12C or heavy 13C labeled arginine and lysine to metabolically incorporate the modified amino acids into proteins through the metabolic cycle. The generated isotopic peptide pairs are then detected by mass shifts of multiples of 6 mass units. Since the label is added at a very early stage of the experiment, this technology circumvents the introduction of additional error sources through extra experimental sample processing steps.
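The detection of isotopic pairs by their mass shift can be illustrated with a short Python sketch. It assumes a label in which six carbon atoms per labeled residue are replaced by 13C, so that each label adds 6 × 1.0033548 Da and the observed spacing in m/z is this shift divided by the charge state. The peak layout, the 10 ppm tolerance, the upper limit of three labels per peptide, and the function name are assumptions made for illustration; a real tool would additionally require matching elution profiles.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C in Da


def find_pairs(peaks, carbons_per_label=6, max_labels=3, ppm=10.0):
    """Find light/heavy pairs among peaks given as (mz, charge, intensity).

    Returns a list of (light, heavy, n_labels, heavy/light intensity ratio).
    """
    pairs = []
    for light in peaks:
        if light[2] <= 0:
            continue  # a ratio needs a non-zero light signal
        for heavy in peaks:
            if heavy[1] != light[1] or heavy[0] <= light[0]:
                continue
            for n in range(1, max_labels + 1):
                shift = n * carbons_per_label * C13_C12_DELTA / light[1]
                expected = light[0] + shift
                if abs(heavy[0] - expected) / expected * 1e6 <= ppm:
                    pairs.append((light, heavy, n, heavy[2] / light[2]))
                    break
    return pairs


# Made-up peaks: one doubly charged pair with a single label, one unrelated peak.
peaks = [
    (500.2500, 2, 1000.0),
    (503.2601, 2, 2500.0),
    (620.3000, 2, 300.0),
]
pairs = find_pairs(peaks)
```

Changing `carbons_per_label` (or replacing the per-label shift altogether) adapts the same search to other labels, which reflects the generic mass shift setting offered by several of the tools discussed in this section.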
However, SILAC labeling is largely limited to biological material that can be grown in culture and thus is not generally applicable to tissues, body fluids, or clinical applications, reports of SILAC labeling in multicellular organisms notwithstanding.44 Recently, metabolic conversion of the stable isotope labeled amino acids has also been reported, resulting in the label appearing in unexpected amino acids.45
3.2.4. Computational Tools for the Analysis of Data from Isotopic Labeling Experiments. The computational principles in the quantification of isotopic labeling experiments are identical for the different strategies; the major difference between the labels lies in the mass shift of the generated isotopic pairs of peptides. Specifically, the main computational workflow in the quantification process comprises the extraction of isotopic peptide pairs by their characteristic mass shift, mostly based on successful MS2 peptide assignments, the computation of ratios between ion chromatograms of the extracted isotopic pairs, and the statistical evaluation of the calculated results (see also Figure 2). Accordingly, most computational methodologies implement these processing functions in a generic fashion and allow customizing the isotopic mass shift with respect to the applied label to guarantee compatibility of their workflow with other labeling techniques. Historically, however, software tools developed for the quantification of peptides generated by a specific isotope labeling method were predominantly used in combination with their original labeling technique. Along with the pioneering ICAT labeling, the XPRESS46 software was developed for the analysis of ICAT data and is available through the commercial software package BioWorks (ThermoFinnigan). On the basis of MS/MS peptide identifications,
Table 1. Summary of Software Tools for Isobaric and Isotopic Labeling in Proteomics Experiments a

software | operating systems | tested data types | input format | label | compatible labels | availability | ref
-------- | ----------------- | ----------------- | ------------ | ----- | ----------------- | ------------ | ---
Isobaric Labeling
Multi-Q | Windows, Web version | Mascot/Sequest results | mzXML | iTRAQ | specific | http://ms.iis.sinica.edu.tw/Multi-Q | 34
iTracker | Linux, OSX, Windows | via TPP9 | .mgf, .dat | iTRAQ | specific | http://www.cranfield.ac.uk/health/researchareas/bioinformatics/page3201.jsp | 35
Libra | Linux, OSX, Windows | via TPP9 | mzXML | iTRAQ | specific | http://tools.proteomecenter.org/wiki/ | 9
ProQuant | Windows | QStar, Qtrap | raw file | iTRAQ | specific | http://www.appliedbiosystems.com | -
ProteinPilot | Windows | QStar, Qtrap, Maldi-Tof/Tof | raw file | iTRAQ | specific | http://www.appliedbiosystems.com | -
Isotopic Labeling
XPRESS | Linux, OSX, Windows | LTQ, OrbiTrap, Qtof, FT-LTQ | mzXML | ICAT | ICPL, SILAC | http://tools.proteomecenter.org/wiki/ | 46
ASAPRatio | Linux, OSX, Windows | LTQ, OrbiTrap, Qtof, FT-LTQ | mzXML | 2H | ICAT, SILAC, ICPL | http://tools.proteomecenter.org/wiki/ | 47
PeakPicker | Windows | Maldi-Tof/Tof | raw file | ICPL | specific | http://www.appliedbiosystems.com | -
WARP-LC | Windows | Maldi-Tof/Tof, Qtof | raw file | ICPL | generic | http://www.bdal.com/ | -
ZoomQuant | Linux, OSX, Windows | LTQ | raw file | 18O | specific | http://proteomics.mcw.edu/zoomquant/ | 53
STEM | Windows | Mascot results | .pkl file | 18O | generic | http://www.sci.metro-u.ac.jp/proteomicslab/STEMDLP-0.html | 54
MSQuant | Windows | QStar, Qtof, FT-LTQ | raw file | SILAC | ICAT, ICPL | http://msquant.sourceforge.net/ | 55

a The table summarizes software programs for the quantification of differential labeling experiments. Software compatibility with other labeling techniques is shown where a program is either limited to a certain label (specific) or applicable to different labeling strategies (generic). For some tools, the column "compatible labels" summarizes for which isotopic labels the program has already been tested.
XPRESS reconstructs XICs for both peptides of an isotopic pair, computes peptide ratios, and offers visual representation of the extracted XICs for manual data inspection. Compared to XPRESS, ASAPRatio47 represents a more advanced and highly automated quantification program that performs tasks similar to XPRESS but also includes additional downstream statistical methods such as baseline correction, data normalization, and significance assessment of computed protein ratios. Both programs have been tested on LCQ, LTQ, and FT-LTQ data and allow defining the mass shift of the applied isotopic label. ASAPRatio as well as XPRESS are available through the Trans Proteomic Pipeline, which can be installed on OSX, Linux, and Microsoft Windows platforms. With the availability of other labeling technologies,31,38,40–42 open source software tools have been adapted or newly developed to automatically extract and quantify peptide ratios from isotopic labeling and offer functionalities similar to those listed above.19,46–50 For example, a collection of methods has been published for the quantification of 18O labeling experiments,49,51,52 but only ZoomQuant53 and STEM54 represent freely available and user-friendly software tools. While ZoomQuant is specific for 18O labeling, the Microsoft Windows program STEM quantifies Qtof data and was designed as a generic labeling quantification tool where the mass differences of the utilized label can be defined by the experimenter. SILAC experiments are quantified in a similar fashion. The Microsoft Windows specific MSQuant software is the prototypical SILAC quantification program and is generic to multiple isotopic labeling techniques.55 MSQuant
analysis starts by importing MS/MS peptide identifications retrieved from Mascot37 MS/MS searches and outputs computed results in spreadsheet files. In addition, there is a series of other programs applicable to the analysis of SILAC experiments.47,50,56 Finally, software solutions such as the Bruker WARP-LC program are commercially available and provide a generic platform for the quantification and visual inspection of various isotopic labeling techniques. In summary, isotopic labeling strategies enable the highly accurate quantification of LC-MS experiments since analysis is performed on single LC-MS runs where peptide pairs can be very accurately detected by distinct mass shifts characteristic of the utilized label.33,57 Table 1 provides an overview of the discussed software solutions and summarizes the available strategies and the tools developed to analyze the corresponding data. The described labeling strategies have in common that the combination of multiple, differentially labeled samples drastically increases the complexity of the peptide elements, thus crowding the mass spectra. This is particularly problematic for the identification of peptide ions by MS/MS where only a limited number of peptide signals can be subjected to CID fragmentation in the course of an LC-MS experiment. The selection of precursor ions is biased toward high-intensity peptide signals and leads to a prominent CID undersampling of low-abundance peptides. This limits the range of isobaric and isotopic labeling techniques since the computation of peptide/protein ratios is based exclusively on peptides that were successfully identified by MS/MS. Furthermore, labeling strategies require additional sample processing steps in the
4. Label-Free Quantification of Multidimensional LC-MS Data
peptide identifications in parallel to the high mass precision measurement of peptides on the MS1 level. This raises the computational challenge for the processing and integration of these two sources of information and has led to the development of novel promising quantification strategies62–64 (discussed in section 4.3).
With the evolution of mass spectrometers toward high mass precision instruments, label-free quantification of LC-MS data has become a very appealing approach for the quantitative analysis of biological samples. Typically, peptide signals are detected at the MS1 level and distinguished from chemical noise through their characteristic isotopic pattern. These patterns are then tracked across the retention time dimension and used to reconstruct a chromatographic elution profile of the monoisotopic peptide mass. The total ion current of the peptide signal is then integrated and used as a quantitative measurement of the original peptide concentration (Figure 1, see label-free quantification). In principle, every peptide signal within the sensitivity range of the MS analyzer can be extracted and incorporated into the quantification process independent of MS/MS acquisition (reviewed in Listgarten and Emili58). This leads to an increased dynamic range of the peptide detection and largely reduces the undersampling problem common to the previously described MS/MS-based approaches. Label-free strategies were in most cases applied to data acquired on mass spectrometers equipped with the new generation of time-offlight (Tof), Fourier transform-ion cyclotron resonance (FTLTQ), or OrbiTrap mass analyzers. Measurements on these MS platforms reach very high resolution power and mass precision in the low parts per million mass unit range. This facilitates the extraction of peptide signals for specific analytes on the MS1 level and thus uncouples the quantification from the identification process (for a review, see Domon and Aebersold4). In contrast to differential labeling, every biological specimen needs to be measured separately in a label-free experiment (see Figure 1). 
The extracted peptide signals are then mapped across a few or many LC-MS measurements using their coordinates on the mass to charge and retention time dimension.59 The efficiency of the peptide tracking depends on the available mass resolution of the utilized mass spectrometer. Data from high mass precision instruments greatly facilitate this process and increase the certainty of matching correct peptide signals across runs.19 In addition to the m/z dimension, the retention time (TR) coordinate is used to map corresponding peptides between runs. Therefore, the consistency of the retention time values over different LC-MS runs is a crucial factor and has led to the development of various alignment methods to correct chromatographic fluctuations (reviewed in Hilario et al.58). Finally, as suggested for microarray data,60 sophisticated normalization methods are important for removing systematic artifacts in the peptide intensity values between LC-MS measurements. Applying a label-free quantification strategy in combination with downstream intensity normalization, Callister et al.61 and others demonstrated that the intensity of extracted peptide signals scales linearly with their molecular concentration across a dynamic range of 3 to 4 orders of magnitude. High mass accuracy MS instruments in combination with sophisticated computational methods therefore offer a platform for the automated label-free quantification of complex biological mixtures without the requirement of MS/MS information. Furthermore, newer hybrid mass spectrometers like the LTQ-FT and LTQ OrbiTrap offer the possibility to acquire MS/MS
4.1. Open Source Software for Label-Free Quantification. The software suite SpecArray20 is one of the pioneering open source programs for the label-free quantification of LC-MS measurements. It was developed for ESI-Qtof data and comprises functionalities for peptide feature extraction, data visualization,65 and retention time normalization. SpecArray is a shell-based program and is available for Linux operating systems. Importantly, it does not contain functionality to include or process MS/MS information and requires considerable computational resources for multiprocessor computing. The software msInspect57 is a user-friendly and operating system independent Java program. It imports pepXML9 formatted MS/MS data into the analysis workflow and contains visualization functions as well as postprocessing routines for intensity normalization, isotopic labeling, and statistical analysis. The program was initially developed for high-resolution ESI-Tof data but also works well with FT-LTQ, OrbiTrap, and Qtof measurements. While MSight66 is not distributed as an open source software package, the program can be downloaded free of charge for Microsoft Windows systems. MSight provides sophisticated visualization techniques for the exploration of unprocessed LC-MS runs, performs peak extraction and multiple alignment, and implements intensity normalization functions.
MSight is compatible with raw data files from various MS instrument types (Bruker, Waters, ThermoFinnigan, Applied Biosystems) and allows one to import MS/MS data directly from mzXML files or Phenyx search results.67 A variety of MS and MS/MS processing routines including file parsing, assignment of MS2 spectra, peak extraction, and multiple alignment for label-free quantification are integrated in the OpenMS Proteomics Pipeline (TOPP).68 Notably, OpenMS is designed as an open source C++ library to provide software developers with a collection of functionalities for the processing of proteomics data (file parsing, data processing, etc.). The proteomics module PEPPeR64 of the GenePattern69 software suite represents another open source program to extract and align peptide features across LC-MS measurements. GenePattern runs as a Java server application on OSX, Linux, and Microsoft Windows platforms, allows customizing analysis workflows, and provides, besides the proteomics module, functionalities for gene expression and single nucleotide polymorphism analysis. In addition, a series of other open source software tools are available to implement basic functionality (peak extraction, LC-MS alignment, etc.) for label-free quantification of LC-MS data.70–74 Even though many of these software tools provide visualization functions to examine the acquired and processed data, a challenge in high-throughput LC-MS analysis is to determine how to detect and handle LC-MS runs of poor quality (low signal response, imperfect chromatography, etc.) in large data sets. Without further quality assessment, such runs are automatically integrated into the computational analysis and can disrupt the multiple alignment, especially if they are incorporated at an early stage of the process.
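As an illustration of how runs of poor quality might be flagged before alignment, the following sketch scores every pair of runs by the overlap of their peptide features and reports runs whose average similarity is unusually low. It is only one possible approach under stated assumptions: features are assumed to be already matched and represented by shared identifiers, the Jaccard index stands in for a similarity score, and the cutoff of half the median similarity is arbitrary. None of these choices is taken from the reviewed tools.

```python
from itertools import combinations
from statistics import mean, median


def jaccard(a: set, b: set) -> float:
    """Fraction of features shared by two runs (assumed similarity score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_low_similarity_runs(runs: dict, ratio: float = 0.5) -> list:
    """Return names of runs whose mean pairwise similarity is below
    ratio * median of all mean similarities (hypothetical cutoff)."""
    names = sorted(runs)
    if len(names) < 3:
        return []  # too few runs to judge an outlier
    scores = {name: [] for name in names}
    for a, b in combinations(names, 2):
        s = jaccard(runs[a], runs[b])
        scores[a].append(s)
        scores[b].append(s)
    mean_scores = {name: mean(vals) for name, vals in scores.items()}
    cutoff = ratio * median(mean_scores.values())
    return [name for name in names if mean_scores[name] < cutoff]


# Hypothetical feature identifiers detected in four LC-MS runs.
runs = {
    "run_A": {1, 2, 3, 4},
    "run_B": {1, 2, 3, 5},
    "run_C": {1, 2, 3, 4, 5},
    "run_D": {9, 10},  # shares no feature with the other runs
}
flagged = flag_low_similarity_runs(runs)
print(flagged)
```

Flagged runs could then be inspected manually or excluded from the early stages of a multiple alignment.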
Common to most alignment routines is that a reference LC-MS run is chosen and retention time scales of the remaining runs are sequentially normalized with respect to this reference.20,57,68,70–73 The novel
experimental workflow, and the analysis of multiple samples becomes complicated if the sample size is bigger than 2 (8 in the case of iTRAQ).
Journal of Proteome Research • Vol. 7, No. 01, 2008
19
C++ program SuperHirn implements an alternative alignment strategy where similarities between all LC-MS runs are computed based on the common peptides of each LC-MS pair and their intensity correlation.75 The LC-MS similarities can then be used in two ways: to detect outliers displaying significantly lower similarity values and to construct a treelike alignment structure based on the computed similarities. This structure guides the multiple alignment process and avoids the early incorporation of low-quality runs into the alignment process. SuperHirn is compatible with mzXML formatted Qtof, FT-LTQ, and OrbiTrap data and imports MS2 information in pepXML format. The program is available for Linux and OSX platforms and additionally comprises a set of downstream functionalities for intensity normalization, isotopic labeling analysis, and peptide/protein profiling.62
4.2. Commercial LC-MS Analysis Frameworks. In addition to freely available open source projects, analytical software for similar purposes is also developed and distributed under commercial license. MassLynx was specifically developed for MS instruments of Waters and contains functionalities from sample tracking to the tracing of metabolic compounds in LC-MS data. The integrated QuanLynx module extracts peptide features in batch mode but does not provide, to our knowledge, any multiple LC-MS alignment routine. LC-MS data from ThermoFinnigan instruments (LTQ, FT-LTQ, OrbiTrap) can be processed by the software package SIEVE. SIEVE performs extraction and alignment of peptide signals across LC-MS measurements and provides GUI functionality to assess the alignment of peptides and statistical tools to perform discriminative analysis of peptide abundance changes. Rosetta Biosoftware sells the LC-MS analysis framework Elucidator, which is based on methodology for the handling and processing of microarray data.
The core of Elucidator builds on a robust data management system and offers statistical tools for data exploration supported by extensive GUI functionality. The implemented pipeline contains a complete MS/MS processing routine (MS/MS search engine, TPP, etc.), feature extraction, and LC-MS alignment modules. Clearly, Elucidator represents a professional program and with it comes a certain price tag. The software Expressionist distributed by Genedata is a platform for biomarker discovery. In particular, the Expressionist proteomics module unifies peak extraction and multiple LC-MS alignment in combination with interactive data visualization. A special feature of Expressionist is the integration of different omics data, which is essential for current approaches in the field of systems biology. Common to both commercial programs is that they provide tools to assemble processing routines into customized automated workflows for high-throughput data analysis. Compared to open source software, however, commercial products are closed packages without access to the source code, implementing fixed workflows and proprietary data formats. Once data have been imported into the commercial software, it can be problematic to access the processed data from an external analysis routine. Therefore, such extensive software solutions might rather be suitable for scientific groups that profit most from a prepackaged data processing solution and do not develop in-house processing algorithms.
4.3. Postprocessing of Extracted Peptide Signals. In label-free experiments, even when performed on a modern hybrid mass spectrometer, typically only a fraction of the detected and quantified peptide features are identified by an MS/MS peptide assignment.19 Therefore, it is desirable to assess
Figure 3. Annotation of peptide signals by the AMT approach. An aligned peak list (red map) is generated by peak detection and alignment of multiple LC-MS runs (gray maps). AMT tags of peptide signals (blue flags) with interesting intensity profiles are then integrated into a list of AMT tags for annotation. Identities of AMT tags are derived from a map of identified peptides (green flags) by comparing AMT coordinates (i.e., mass and retention time). Illustrated here are identification maps from LC-MS runs where peptides are identified by targeted MS/MS (inclusion list run) or from a large repository of collected MS/MS peptide identifications (MS/MS repository). The procedure is performed in an iterative manner, where successfully annotated AMT tags are fed back to the aligned peptide list after every cycle.
the identity of the remaining peptide signals, particularly those that show interesting quantitative patterns. Figure 3 illustrates schematically the general workflow for the annotation of peptide features. Peptide signals are extracted on the MS1 level (Figure 3, gray dots) from the acquired LC-MS runs and then tracked across multiple measurements to generate a list of aligned peptides (Figure 3, red dots), which serves as a framework for further annotation efforts.62–64,70,76,77 Each peptide signal is defined by its accurate mass and time tag (AMT)59 on the mass and retention time dimension (Figure 3, blue flag), and a subset of these peptide elements are selected according to a specific property of the peptide signal (for example, here the intensity profile62). These AMT tags are then subjected to postannotation where the identity of an AMT (i.e., protein name and peptide sequence) is derived by comparing its accurate mass and retention time coordinates to a map of AMT tags from successfully identified peptide signals (Figure 3, green flags). In a way, the workflow resembles the MALDI approach, but instead of spots on a plate, the coordinates consist of peptide landmarks on a two-dimensional peptide map. Where in the MALDI approach CID spectra are acquired by reanalyzing a selected peptide of a certain spot, AMT tags from peptide identifications are obtained from either targeted LC-MS runs (Figure 3, inclusion list run) or from a large repository of MS/MS peptide identifications compiled from multiple LC-MS experiments7,8,78 (Figure 3, MS/MS repository). The inclusion list approach is based on AMT tags where the identification of a peptide is specifically targeted by its accurate mass and time coordinate in a subsequent LC-MS measurement.62,77 Rinner and Mueller et al. applied inclusion lists as well as MS/MS information from different MS instrument types to identify peptide signals displaying characteristic intensity
Table 2. Overview of LC-MS Quantification Programs for Label-Free Quantification^a
software | operating systems | tested data types | input format | MS/MS | type/language | availability | ref
SpecArray | Linux | FT-LTQ, OrbiTrap, Qtof | mzXML | no | open source (C) | http://tools.proteomecenter.org/wiki/ | 20
msInspect | Linux, OSX, Windows | ESI-Tof, OrbiTrap, FT-LTQ, Qtof | mzXML | yes | open source (Java) | http://proteomics.fhcrc.org/CPL/home.html | 57,63
MSight | Windows | LTQ, FT-LTQ, Qtof etc. | mzXML, raw | yes | free of charge (C++) | http://www.expasy.org/MSight/ | 66
TOPP | Linux, OSX | LTQ, ESI-Tof | mzXML | yes | open source (C++) | http://open-ms.sourceforge.net | 68
PEPPeR | Linux, OSX, Windows | FT-LTQ, OrbiTrap | mzXML | yes | open source (Perl, R) | http://www.broad.mit.edu/cancer/software/genepattern/ | 64
SuperHirn | Linux, OSX | FT-LTQ, OrbiTrap, Qtof | mzXML | yes | open source (C++) | http://tools.proteomecenter.org/wiki/ | 19
QuanLynx | Windows | MS data from Waters | raw | yes | commercial | http://www.waters.com/ | -
SIEVE | Windows | MS data from ThermoFinnigan | raw | yes | commercial | http://www.thermo.com/ | -
Elucidator | Windows | FT-LTQ, OrbiTrap, Qtof | raw | yes | commercial | http://www.rosettabio.com/ | -
Expressionist | Windows | Thermo/Bruker instruments | mzXML | yes | commercial | http://www.genedata.com | -
^a Software features such as program portability and availability, data compatibility, and integration of MS/MS information (MS/MS) are summarized. MS/MS: if the software provides functionality for the integration of MS/MS information. Raw: software imports LC-MS data from instrument raw files.
profiles.62 Specifically, the application demonstrated that additional MS/MS information from inclusion lists identified more peptide elements of proteins and allowed us to clearly strengthen the confidence for the presence of specific proteins in the sample. Complementary to the inclusion list approach, the alignment processes of Q-MEND,70 SuperHirn, and PEPPeR automatically utilize MS/MS information acquired in one LC-MS run to annotate aligned peptides in the other runs. Therefore, MS/MS information acquired in an LC-MS run is automatically projected to all other runs and significantly increases the fraction of identified MS1 peptide signals in the aligned peptide list. For a more sophisticated AMT workflow, PEPPeR includes specially developed routines to statistically evaluate the annotation of MS1 signals using MS/MS information from inclusion list runs. The application of PEPPeR was demonstrated in a study of complex biological mixtures where differentially regulated MS1 peptide signals were assigned with MS/MS identifications and then validated using inclusion lists. An additional AMT approach is implemented in msInspect, where external MS/MS resources, for example, from MS/MS data management systems or other preprocessed MS/MS data, are also integrated into the assignment of peptide signals.63 Even though this strategy is very powerful because the list of aligned peptides can be annotated from a very large pool of already acquired MS/MS information, it is obviously more challenging due to inherent experimental and instrumental variations within the data and requires robust mass calibration and retention time normalization routines. Label-free quantification represents a key technology for the comprehensive analysis of complex biological samples due to its ability to exhaustively sample peptide signals detected directly on the MS1 level. Indispensable for this approach, however, is sophisticated computational methodology to process the acquired data.
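The core lookup of an AMT-style annotation, that is, assigning an identity to an MS1 signal by comparing its mass and retention time to a map of identified peptides, can be sketched as follows. This is a simplified illustration and not the procedure of any specific program: the class and function names, the tolerances of 10 ppm and 0.5 retention time units, and the peptide entries are all hypothetical, and retention times are assumed to be normalized already.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AmtTag:
    sequence: str  # peptide sequence of an identified signal (hypothetical)
    mass: float    # monoisotopic mass in Da
    rt: float      # normalized retention time (assumed already aligned)


def ppm_error(observed: float, reference: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed - reference) / reference * 1e6


def annotate(mass: float, rt: float, database: List[AmtTag],
             ppm_tol: float = 10.0, rt_tol: float = 0.5) -> Optional[AmtTag]:
    """Return the closest AMT tag within both tolerances, or None."""
    candidates = [
        tag for tag in database
        if abs(ppm_error(mass, tag.mass)) <= ppm_tol and abs(rt - tag.rt) <= rt_tol
    ]
    if not candidates:
        return None
    # Among the candidates, prefer the smallest absolute mass deviation.
    return min(candidates, key=lambda tag: abs(ppm_error(mass, tag.mass)))


# Hypothetical map of identified peptides (values are illustrative only).
database = [
    AmtTag("PEPTIDEA", 1000.5000, 30.0),
    AmtTag("PEPTIDEB", 1000.5300, 30.2),
    AmtTag("PEPTIDEC", 1500.7500, 45.0),
]
hit = annotate(1000.5040, 30.1, database)
miss = annotate(1200.0000, 10.0, database)
print(hit, miss)
```

In practice the confidence of such an assignment depends strongly on the mass calibration and retention time normalization discussed above, and a statistical evaluation of the matches would be added on top of this lookup.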
Table 2 summarizes all described software solutions and provides an overview with respect to their availability, file compatibility, and platform portability. Furthermore, we have introduced in this section novel concepts that have emerged from label-free quantification approaches, and we expect that, building on these strategies, a series of newly developed or further extended techniques will address challenges such as the MS/MS-independent identification of peptides, the assessment of the post-translational modification state of peptides, or the comparison of patterns of detected MS1 peptide signals across multiple measurements.
5. Perspectives for the Development of LC-MS Software
An important factor for LC-MS software development is the crosstalk between individual software suites. As mentioned throughout this review, every software solution is compatible with a limited number of MS platforms and comprises a characteristic set of functionalities for the processing of LC-MS data. The possibility to unify these functionalities in an intersoftware workflow would be very powerful. The main hurdle in this process is the availability of a common data format, which could serve as a communication platform between software tools. The problem is analogous to that of biological raw data, for which generic file standards have been developed for exchange and storage.6 An initiative toward the standardization of preprocessed LC-MS data is the Annotated Putative Peptide Markup Language (APML).79 APML stores peak picked and aligned LC-MS data in a generic XML format, and a set of software programs already provide APML compatibility through data import and export functions.19,20 This allows, for example, extraction of peptide features with a tool specially developed for a distinct MS instrument type and then the import of the preprocessed data via APML into the multiple LC-MS alignment routine of another program. While APML addresses the communicative challenge between software tools, the cross-linking of these routines into one workflow can be cumbersome due to their technical differences (different programming languages, program execution, etc.). The novel Corra79 framework overcomes these hurdles by integrating a set of published LC-MS quantification tools19,20 as well as backend statistical analysis routines for the processing of LC-MS measurements obtained from different instrument types. This allows us not only to process LC-MS data in
an intersoftware fashion but also to unify the information from biological experiments performed on different LC-MS platforms into one workflow. While databases such as PeptideAtlas8,80,81 and PhosphoPep78 collect MS/MS identified peptide sequences and phosphorylation sites, respectively, future perspectives for MS1 processing frameworks comprise the assembly of all MS1 peptide signals from various LC-MS measurements to generate a comprehensive peptide map for specific biological specimens (organisms, tissue types, cell states, etc.).
6. Conclusions
Throughout this text, we have discussed the application of computational methodologies for the quantitative analysis of LC-MS data. A summary of all introduced software tools can be found in Tables 1 and 2. This review was written from a user perspective to provide researchers with an entry point into current LC-MS quantification principles and the extensive collection of available software tools. Although available software packages can be prioritized from a purely technological point of view (e.g., OS requirement, scalability, etc.), we believe that the applicability of LC-MS quantification programs to their intended use is an important factor. Therefore, this review provides computational biologists with a guide for the application of existing software solutions to their data sets. This should also help to render the development of computational methodology and the design of software tools for the analysis of LC-MS data generic and useful for the proteomic research community.
Acknowledgment.
Markus Müller and Ruth Hüttenhain from the Institute of Molecular Systems Biology (Zurich, Switzerland) are acknowledged for proofreading of this article. Financial support was provided by ETH Zurich, Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, under contract No. N01-HV-28179, and by the Swiss National Science Foundation grant No. 3100A0-107679.
References
(1) Hood, L.; Heath, J. R.; Phelps, M. E.; Lin, B. Systems biology and new technologies enable predictive and preventative medicine. Science 2004, 306 (5696), 640–3.
(2) Hunt, D. F.; Henderson, R. A.; Shabanowitz, J.; Sakaguchi, K.; Michel, H.; Sevilir, N.; Cox, A. L.; Appella, E.; Engelhard, V. H. Characterization of peptides bound to the class I MHC molecule HLA-A2.1 by mass spectrometry. Science 1992, 255 (5049), 1261–3.
(3) Eng, J.; McCormack, A.; Yates, J. R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(4) Domon, B.; Aebersold, R. Mass spectrometry and protein analysis. Science 2006, 312 (5771), 212–7.
(5) Codrea, M. C.; Jimenez, C. R.; Heringa, J.; Marchiori, E. Tools for computational processing of LC-MS datasets: a user’s perspective. Comput. Methods Programs Biomed. 2007, 86 (3), 281–90.
(6) Pedrioli, P. G.; Eng, J. K.; Hubley, R.; Vogelzang, M.; Deutsch, E. W.; Raught, B.; Pratt, B.; Nilsson, E.; Angeletti, R. H.; Apweiler, R.; Cheung, K.; Costello, C. E.; Hermjakob, H.; Huang, S.; Julian, R. K.; Kapp, E.; McComb, M. E.; Oliver, S. G.; Omenn, G.; Paton, N. W.; Simpson, R.; Smith, R.; Taylor, C. F.; Zhu, W.; Aebersold, R. A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 2004, 22 (11), 1459–66.
(7) Rauch, A.; Bellew, M.; Eng, J.; Fitzgibbon, M.; Holzman, T.; Hussey, P.; Igra, M.; Maclean, B.; Lin, C. W.; Detter, A.; Fang, R.; Faca, V.; Gafken, P.; Zhang, H.; Whiteaker, J.; States, D.; Hanash, S.; Paulovich, A.; McIntosh, M. W. Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J. Proteome Res. 2006, 5 (1), 112–21.
(8) Deutsch, E. W.; Eng, J. K.; Zhang, H.; King, N. L.; Nesvizhskii, A. I.; Lin, B.; Lee, H.; Yi, E. C.; Ossola, R.; Aebersold, R. Human Plasma PeptideAtlas. Proteomics 2005, 5 (13), 3497–500.
(9) Keller, A.; Eng, J.; Zhang, N.; Li, X. J.; Aebersold, R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 2005, 1, 2005–0017.
(10) Ong, S. E.; Foster, L. J.; Mann, M. Mass spectrometric-based approaches in quantitative proteomics. Methods 2003, 29 (2), 124–30.
(11) Liu, H.; Sadygov, R. G.; Yates, J. R., III. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004, 76 (14), 4193–201.
(12) Gao, J.; Opiteck, G. J.; Friedrichs, M. S.; Dongre, A. R.; Hefta, S. A. Changes in the protein expression of yeast as a function of carbon source. J. Proteome Res. 2003, 2 (6), 643–9.
(13) Colinge, J.; Chiappe, D.; Lagache, S.; Moniatte, M.; Bougueleret, L. Differential proteomics via probabilistic peptide identification scores. Anal. Chem. 2005, 77 (2), 596–606.
(14) Ishihama, Y.; Oda, Y.; Tabata, T.; Sato, T.; Nagasu, T.; Rappsilber, J.; Mann, M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 2005, 4 (9), 1265–72.
(15) Allet, N.; Barrillat, N.; Baussant, T.; Boiteau, C.; Botti, P.; Bougueleret, L.; Budin, N.; Canet, D.; Carraud, S.; Chiappe, D.; Christmann, N.; Colinge, J.; Cusin, I.; Dafflon, N.; Depresle, B.; Fasso, I.; Frauchiger, P.; Gaertner, H.; Gleizes, A.; Gonzalez-Couto, E.; Jeandenans, C.; Karmime, A.; Kowall, T.; Lagache, S.; Mahe, E.; Masselot, A.; Mattou, H.; Moniatte, M.; Niknejad, A.; Paolini, M.; Perret, F.; Pinaud, N.; Ranno, F.; Raimondi, S.; Reffas, S.; Regamey, P. O.; Rey, P. A.; Rodriguez-Tome, P.; Rose, K.; Rossellat, G.; Saudrais, C.; Schmidt, C.; Villain, M.; Zwahlen, C. In vitro and in silico processes to identify differentially expressed proteins. Proteomics 2004, 4 (8), 2333–51.
(16) Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G. Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol. Cell. Proteomics 2005, 4 (10), 1487–502.
(17) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383–92.
(18) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75 (17), 4646–58.
(19) Mueller, L. N.; Rinner, O.; Schmidt, A.; Letarte, S.; Bodenmiller, B.; Brusniak, M. Y.; Vitek, O.; Aebersold, R.; Muller, M. SuperHirn - a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 2007, 7 (19), 3470–80.
(20) Li, X. J.; Yi, E. C.; Kemp, C. J.; Zhang, H.; Aebersold, R. A software suite for the generation and comparison of peptide arrays from sets of data collected by liquid chromatography-mass spectrometry. Mol. Cell. Proteomics 2005, 4 (9), 1328–40.
(21) Kuster, B.; Schirle, M.; Mallick, P.; Aebersold, R. Scoring proteomes with proteotypic peptide probes. Nat. Rev. Mol. Cell. Biol. 2005, 6 (7), 577–83.
(22) Wang, G.; Wu, W. W.; Zeng, W.; Chou, C. L.; Shen, R. F. Label-free protein quantification using LC-coupled ion trap or FT mass spectrometry: Reproducibility, linearity, and application with complex proteomes. J. Proteome Res. 2006, 5 (5), 1214–23.
(23) Higgs, R. E.; Knierman, M. D.; Gelfanova, V.; Butler, J. P.; Hale, J. E. Comprehensive label-free method for the relative quantification of proteins from biological samples. J. Proteome Res. 2005, 4 (4), 1442–50.
(24) Cutillas, P. R.; Vanhaesebroeck, B. Quantitative profile of five murine core proteomes using label-free functional proteomics. Mol. Cell. Proteomics 2007.
(25) Lill, J. Proteomic tools for quantitation by mass spectrometry. Mass Spectrom. Rev. 2003, 22 (3), 182–94.
(26) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422 (6928), 198–207.
(27) Yan, W.; Chen, S. Mass spectrometry-based quantitative proteomic profiling. Brief Funct. Genomic Proteomics 2005, 4 (1), 27–38.
(28) Kusmierz, J. J.; Sumrada, R.; Desiderio, D. M. Fast atom bombardment mass spectrometric quantitative analysis of methionine-enkephalin in human pituitary tissues. Anal. Chem. 1990, 62 (21), 2395–400.
(29) Choe, L.; D’Ascenzo, M.; Relkin, N. R.; Pappin, D.; Ross, P.; Williamson, B.; Guertin, S.; Pribil, P.; Lee, K. H. 8-Plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer’s disease. Proteomics 2007, 7 (20), 3651–60.
(30) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 1999, 17 (10), 994–9.
(31) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 2002, 1 (5), 376–86.
(32) Hendrickson, E. L.; Xia, Q.; Wang, T.; Leigh, J. A.; Hackett, M. Comparison of spectral counting and metabolic stable isotope labeling for use with quantitative microbial proteomics. Analyst 2006, 131 (12), 1335–41.
(33) Kim, Y. J.; Zhan, P.; Feild, B.; Ruben, S. M.; He, T. Reproducibility Assessment of Relative Quantitation Strategies for LC-MS Based Proteomics. Anal. Chem. 2007, 79 (15), 5651–8.
(34) Lin, W. T.; Hung, W. N.; Yian, Y. H.; Wu, K. P.; Han, C. L.; Chen, Y. R.; Chen, Y. J.; Sung, T. Y.; Hsu, W. L. Multi-Q: a fully automated tool for multiplexed protein quantitation. J. Proteome Res. 2006, 5 (9), 2328–38.
(35) Shadforth, I. P.; Dunkley, T. P.; Lilley, K. S.; Bessant, C. i-Tracker: for quantitative proteomics using iTRAQ. BMC Genomics 2005, 6, 145.
(36) Griffin, T. J.; Xie, H.; Bandhakavi, S.; Popko, J.; Mohan, A.; Carlis, J. V.; Higgins, L. iTRAQ Reagent-Based Quantitative Proteomic Analysis on a Linear Ion Trap Mass Spectrometer. J. Proteome Res. 2007, 6, 4200–4209.
(37) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551–67.
(38) Schmidt, A.; Kellermann, J.; Lottspeich, F. A novel strategy for quantitative proteomics using isotope-coded protein labels. Proteomics 2005, 5 (1), 4–15.
(39) Zhang, X.; Jin, Q. K.; Carr, S. A.; Annan, R. S. N-Terminal peptide labeling strategy for incorporation of isotopic tags: a method for the determination of site-specific absolute phosphorylation stoichiometry. Rapid Commun. Mass Spectrom. 2002, 16 (24), 2325–32.
(40) Wells, L.; Gao, Y.; Mahoney, J. A.; Vosseller, K.; Chen, C.; Rosen, A.; Hart, G. W. Dynamic O-glycosylation of nuclear and cytosolic proteins: further characterization of the nucleocytoplasmic beta-N-acetylglucosaminidase, O-GlcNAcase. J. Biol. Chem. 2002, 277 (3), 1755–61.
(41) Cagney, G.; Emili, A. De novo peptide sequencing and quantitative profiling of complex protein mixtures using mass-coded abundance tagging. Nat. Biotechnol. 2002, 20 (2), 163–70.
(42) Yao, X.; Freas, A.; Ramirez, J.; Demirev, P. A.; Fenselau, C. Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal. Chem. 2001, 73 (13), 2836–42.
(43) Stewart, I. I.; Thomson, T.; Figeys, D. 18O labeling: a tool for proteomics. Rapid Commun. Mass Spectrom. 2001, 15 (24), 2456–65.
(44) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Matthews, D. E.; Yates, J. R., III. Metabolic labeling of mammalian organisms with stable isotopes for quantitative proteomic analysis. Anal. Chem. 2004, 76 (17), 4951–9.
(45) Van Hoof, D.; Pinkse, M. W.; Oostwaard, D. W.; Mummery, C. L.; Heck, A. J.; Krijgsveld, J. An experimental correction for arginine-to-proline conversion artifacts in SILAC-based quantitative proteomics. Nat. Methods 2007, 4 (9), 677–8.
(46) Han, D. K.; Eng, J.; Zhou, H.; Aebersold, R. Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 2001, 19 (10), 946–51.
(47) Li, X. J.; Zhang, H.; Ranish, J. A.; Aebersold, R. Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry. Anal. Chem. 2003, 75 (23), 6648–57.
(48) MacCoss, M. J.; Wu, C. C.; Liu, H.; Sadygov, R.; Yates, J. R., III. A correlation algorithm for the automated quantitative analysis of shotgun proteomics data. Anal. Chem. 2003, 75 (24), 6912–21.
(49) Wang, G.; Wu, W. W.; Pisitkun, T.; Hoffert, J. D.; Knepper, M. A.; Shen, R. F. Automated quantification tool for high-throughput proteomics using stable isotope labeling and LC-MSn. Anal. Chem. 2006, 78 (16), 5752–61.
(50) Bouyssie, D.; Gonzalez de Peredo, A.; Mouton, E.; Albigot, R.; Roussel, L.; Ortega, N.; Cayrol, C.; Burlet-Schiltz, O.; Girard, J. P.; Monsarrat, B. MFPaQ, a new software to parse, validate, and quantify proteomic data generated by ICAT and SILAC mass spectrometric analyses: application to the proteomic study of membrane proteins from primary human endothelial cells. Mol. Cell. Proteomics 2007.
(51) Johnson, K. L.; Muddiman, D. C. A method for calculating 16O/18O peptide ion ratios for the relative quantification of proteomes. J. Am. Soc. Mass Spectrom. 2004, 15 (4), 437–45.
(52) Ramos-Fernandez, A.; Lopez-Ferrer, D.; Vazquez, J. Improved Method for Differential Expression Proteomics Using Trypsin-catalyzed 18O Labeling with a Correction for Labeling Efficiency. Mol. Cell. Proteomics 2007, 6 (7), 1274–86.
(53) Halligan, B. D.; Slyper, R. Y.; Twigger, S. N.; Hicks, W.; Olivier, M.; Greene, A. S. ZoomQuant: an application for the quantitation of stable isotope labeled peptides. J. Am. Soc. Mass Spectrom. 2005, 16 (3), 302–6.
(54) Shinkawa, T.; Taoka, M.; Yamauchi, Y.; Ichimura, T.; Kaji, H.; Takahashi, N.; Isobe, T. STEM: a software tool for large-scale proteomic data analyses. J. Proteome Res. 2005, 4 (5), 1826–31.
(55) Schulze, W. X.; Mann, M. A novel proteomic screen for peptide-protein interactions. J. Biol. Chem. 2004, 279 (11), 10756–64.
(56) Saito, A.; Nagasaki, M.; Oyama, M.; Kozuka-Hata, H.; Semba, K.; Sugano, S.; Yamamoto, T.; Miyano, S. AYUMS: an algorithm for completely automatic quantitation based on LC-MS/MS proteome data and its application to the analysis of signal transduction. BMC Bioinformatics 2007, 8, 15.
(57) Bellew, M.; Coram, M.; Fitzgibbon, M.; Igra, M.; Randolph, T.; Wang, P.; May, D.; Eng, J.; Fang, R.; Lin, C.; Chen, J.; Goodlett, D.; Whiteaker, J.; Paulovich, A.; McIntosh, M. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 2006, 22 (15), 1902–9.
(58) Listgarten, J.; Emili, A. Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 2005, 4 (4), 419–34.
(59) Zimmer, J. S.; Monroe, M. E.; Qian, W. J.; Smith, R. D. Advances in proteomics data analysis and display using an accurate mass and time tag approach. Mass Spectrom. Rev. 2006, 25 (3), 450–82.
(60) Quackenbush, J. Microarray data normalization and transformation. Nat. Genet. 2002, 32 Suppl, 496–501.
(61) Callister, S. J.; Barry, R. C.; Adkins, J. N.; Johnson, E. T.; Qian, W. J.; Webb-Robertson, B. J.; Smith, R. D.; Lipton, M. S. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J. Proteome Res. 2006, 5 (2), 277–86.
(62) Rinner, O.; Mueller, L. N.; Hubalek, M.; Muller, M.; Gstaiger, M.; Aebersold, R. An integrated mass spectrometric and computational framework for the analysis of protein interaction networks. Nat. Biotechnol. 2007, 25 (3), 345–52.
(63) May, D.; Fitzgibbon, M.; Liu, Y.; Holzman, T.; Eng, J.; Kemp, C. J.; Whiteaker, J.; Paulovich, A.; McIntosh, M. A platform for accurate mass and time analyses of mass spectrometry data. J. Proteome Res. 2007, 6 (7), 2685–94.
(64) Jaffe, J. D.; Mani, D. R.; Leptos, K. C.; Church, G. M.; Gillette, M. A.; Carr, S. A. PEPPeR, a platform for experimental proteomic pattern recognition. Mol. Cell. Proteomics 2006, 5 (10), 1927–41.
(65) Li, X. J.; Pedrioli, P. G.; Eng, J.; Martin, D.; Yi, E. C.; Lee, H.; Aebersold, R. A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry. Anal. Chem. 2004, 76 (13), 3856–60.
(66) Palagi, P. M.; Walther, D.; Quadroni, M.; Catherinet, S.; Burgess, J.; Zimmermann-Ivol, C. G.; Sanchez, J. C.; Binz, P. A.; Hochstrasser, D. F.; Appel, R. D. MSight: an image analysis software for liquid chromatography-mass spectrometry. Proteomics 2005, 5 (9), 2381–4.
(67) Heller, M.; Ye, M.; Michel, P. E.; Morier, P.; Stalder, D.; Junger, M. A.; Aebersold, R.; Reymond, F.; Rossier, J. S. Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J. Proteome Res. 2005, 4 (6), 2273–82.
(68) Kohlbacher, O.; Reinert, K.; Gropl, C.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Sturm, M. TOPP--the OpenMS proteomics pipeline. Bioinformatics 2007, 23 (2), e191–7.
(69) Reich, M.; Liefeld, T.; Gould, J.; Lerner, J.; Tamayo, P.; Mesirov, J. P. GenePattern 2.0. Nat. Genet. 2006, 38 (5), 500–1.
(70) Andreev, V. P.; Li, L.; Cao, L.; Gu, Y.; Rejtar, T.; Wu, S. L.; Karger, B. L. A new algorithm using cross-assignment for label-free quantitation with LC-LTQ-FT MS. J. Proteome Res. 2007, 6 (6), 2186–94.
(71) Hoopmann, M. R.; Finney, G. L.; Maccoss, M. J. High-Speed Data Reduction, Feature Detection, and MS/MS Spectrum Quality
reviews
Assessment of Software Solutions for Proteomics Data
(72)
(73)
(74) (75) (76)
(77)
Assessment of Shotgun Proteomics Data Sets Using High-Resolution Mass Spectrometry. Anal. Chem. 2007, 79 (15), 5620–32. Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 2006, 78 (3), 779–87. America, A. H.; Cordewener, J. H.; van Geffen, M. H.; Lommen, A.; Vissers, J. P.; Bino, R. J.; Hall, R. D. Alignment and statistical difference analysis of complex peptide data sets generated by multidimensional LC-MS. Proteomics 2006, 6 (2), 641–53. Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; Elmagarmid, A. K. Data pre-processing in liquid chromatography-mass spectrometrybased proteomics. Bioinformatics 2005, 21 (21), 4054–9. Bodenmiller, B.; Mueller, L. N.; Mueller, M.; Domon, B.; Aebersold, R. Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nat. Methods 2007, 4 (3), 231–7. Norbeck, A. D.; Monroe, M. E.; Adkins, J. N.; Anderson, K. K.; Daly, D. S.; Smith, R. D. The utility of accurate mass and LC elution time information in the analysis of complex proteomes. J. Am. Soc. Mass Spectrom. 2005, 16 (8), 1239–49. Schmidt, A.; Gehlenborg, N.; Bodenmiller, B.; Mueller, L.; Domon, B.; Aebersold, R., An integrated, directed mass spectrometric approach for in-depth characterization of complex peptide mixtures. submitted, 2007.
(78) Bodenmiller, B.; Malmstrom, J.; Gerrits, B.; Campbell, D.; Lam, H.; Schmidt, A.; Rinner, O.; Mueller, L. N.; Shannon, P. T.; Pedrioli, P. G.; Panse, C.; Lee, H. K.; Schlapbach, R.; Aebersold, R. PhosphoPep--a phosphoproteome resource for systems biology research in Drosophila Kc167 cells. Mol. Syst. Biol. 2007, 3, 139.
(79) Brusniak, M.; Bodenmiller, B.; Cooke, K.; Campbell, D.; Eddes, J. S.; Letarte, S.; Mueller, L. N.; Sharama, V.; Vitek, O.; Watts, J. D.; Aebersold, R. Corra: a LC-MS framework and computational tool for discovery and targeted mass spectrometry. in preparation, 2007.
(80) King, N. L.; Deutsch, E. W.; Ranish, J. A.; Nesvizhskii, A. I.; Eddes, J. S.; Mallick, P.; Eng, J.; Desiere, F.; Flory, M.; Martin, D. B.; Kim, B.; Lee, H.; Raught, B.; Aebersold, R. Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas. Genome Biol. 2006, 7 (11), R106.
(81) Brunner, E.; Ahrens, C. H.; Mohanty, S.; Baetschmann, H.; Loevenich, S.; Potthast, F.; Deutsch, E. W.; Panse, C.; de Lichtenberg, U.; Rinner, O.; Lee, H.; Pedrioli, P. G.; Malmstrom, J.; Koehler, K.; Schrimpf, S.; Krijgsveld, J.; Kregenow, F.; Heck, A. J.; Hafen, E.; Schlapbach, R.; Aebersold, R. A high-quality catalog of the Drosophila melanogaster proteome. Nat. Biotechnol. 2007, 25 (5), 576–83.
PR700758R
Journal of Proteome Research • Vol. 7, No. 01, 2008 61