STEM: A Software Tool for Large-Scale Proteomic Data Analyses

Takashi Shinkawa,† Masato Taoka,*,‡ Yoshio Yamauchi,† Tohru Ichimura,‡ Hiroyuki Kaji,‡ Nobuhiro Takahashi,†,§ and Toshiaki Isobe†,‡,|

Integrated Proteomics System Project, Pioneer Research on Genome the Frontier, MEXT, c/o Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University, 1-1 Minami-osawa, Hachioji-shi, Tokyo 192-0397, Japan, Laboratory of Biochemistry, Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University, 1-1 Minami-osawa, Hachioji-shi, Tokyo 192-0397, Japan, Department of Applied Biological Science, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu-shi, Tokyo 183-8509, Japan, and Division of Proteomics Research, The Institute of Medical Science, The University of Tokyo, Shiroganedai 4-6-1, Minato-ku, Tokyo 108-8639, Japan

Received June 6, 2005

We describe STEM (STrategic Extractor for Mascot’s results), a software tool that efficiently processes large-scale mass spectrometry-based proteomics data. V (View)-mode evaluates the Mascot peptide identification dataset, removes unreliable candidates and redundant assignments, and integrates the results with key information in the experiment. C (Comparison)-mode compares peptide coverage among multiple datasets, displays proteins found commonly or specifically therein, and processes data for quantitative studies that utilize conventional isotope tags or tags having a smaller mass difference. STEM significantly improves the throughput of proteomics studies.

Keywords: software • bioinformatics • data analysis • quantitation • liquid chromatography • mass spectrometry • proteomics

Journal of Proteome Research 2005, 4, 1826-1831

Published on Web 08/20/2005

* To whom correspondence should be addressed. Fax: +81-426-77-2525. E-mail: [email protected].
† Integrated Proteomics System Project, Pioneer Research on Genome the Frontier, MEXT, c/o Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University.
‡ Laboratory of Biochemistry, Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University.
§ Department of Applied Biological Science, Tokyo University of Agriculture and Technology.
| Division of Proteomics Research, The Institute of Medical Science, The University of Tokyo.

1. Introduction

Proteomics has become an essential tool for biological studies. Proteomics technologies are based on mass spectrometry and bioinformatics coupled with the ever-evolving information in gene/protein sequence databases. In particular, recent advances in integrated liquid chromatography (LC) mass spectrometry (MS)-based protein identification technology have significantly expanded proteomics studies to a much larger scale, with the potential to yield a global description of the proteome.1-3 Such analyses have been applied to catalog proteins and to find novel components in functional multiprotein complexes (e.g., cell signaling complexes), cellular organelles, and the constituents of a variety of normal and aberrant cells/tissues. We have investigated these biological systems by combining a fully automated microscale multidimensional LC4 or a direct nanoflow LC5 with a high-resolution Q-TOF hybrid mass spectrometer coupled with a data analysis system. We have used these systems for global analysis of proteins expressed in Escherichia coli,6 Caenorhabditis elegans,7 mouse embryonic stem cells,8 and for large-scale identification

of N-glycoproteins expressed in C. elegans9 as well as for snapshot analyses of ribosome biogenesis in mammalian cells.10-12

In most mass spectrometry-based “shotgun” proteomics, proteins are identified by searching the databases using Mascot13 and/or Sequest14 with several thousand to several tens of thousands of spectra as probes. Such analyses generate extremely large datasets that include peptide/protein assignments and variables that yield detailed information on protein structure and function. In addition, the resulting datasets generally need further evaluation and reorganization because they often include ambiguous peptide identity data and redundant peptide assignments for each protein; thus, data processing in proteomics studies is both labor-intensive and time-consuming.15 The recent development of quantitative proteomics, such as the ICAT approach,16,17 which uses different stable isotope tags, further increases the complexity of data processing. A number of software packages have been developed to expedite the data processing step. For instance, Autoquest,18 SEQUEST Summary,14 DTAselect19 and INTERACT20 are designed to organize and rearrange peptide/protein identification results, whereas RelEx,21 ASAPRatio,22 and XPRESS17 extract quantitative information. These software packages permit rapid data processing after identification of proteins by Sequest, thereby expediting proteomics studies. However, no comparable software that runs on personal computers is available for rearranging data from Mascot,23 an alternative search engine widely used in many proteomics studies. Moreover, the quantitative software packages for comparative proteomics experiments have been designed primarily for ICAT or metabolic labeling studies, which introduce a relatively large mass difference. There is no equivalent software for analyses that use isotope tags having a smaller mass difference, for instance, H₂¹⁶O/H₂¹⁸O differential labeling.24-26

Here, we present a software package, STEM (STrategic Extractor for Mascot’s results), which efficiently processes Mascot search data. STEM is a stand-alone computational tool that evaluates, integrates, and compares large datasets produced by Mascot. STEM is compatible with quantitative proteomics studies that utilize various stable isotope tags. Information on how to obtain the STEM programs can be found on our web site (http://www.sci.metro-u.ac.jp/proteomicslab/).

10.1021/pr050167x CCC: $30.25 © 2005 American Chemical Society

Figure 1. STEM data flowchart. Precise data flow is described in the Experimental Section. There are three main procedures that generate reports that summarize a single experiment (V-mode), compare results from multiple experiments (CF-mode), or quantify the relative abundance in a single experiment (CQ-mode). Each of the main procedures automatically produces multiple reports that emphasize specific aspects of the overall dataset.

2. Experimental Section

2.1. Outline of STEM. STEM software is designed to execute high-speed processing/organization of experimental data from a large-scale proteomics study. It consists of a V (view)-mode and a C (comparison)-mode (Figure 1). In V-mode, the information for rapid and automated identification of proteins is extracted from Mascot search results, and the identified proteins are organized and displayed together with key information from the experiment. In C-mode, the organized datasets are compared. C-mode has two submodes, comparison and find mode (CF-mode) and comparison and quantitation mode (CQ-mode), which use different algorithms. In CF-mode, a list of the comparative abundance of each protein in the respective mixtures is generated by processing datasets arranged in V-mode, allowing preliminary comparison of proteins in the samples. CQ-mode analyzes mass spectral data of two protein mixtures, each labeled with a different stable isotope tag, and the relative abundance of each protein pair present in the samples is quantitatively measured by comparing two different isotope mass signals.

Figure 2. Portion of a summary table obtained using the STEM V-mode. Peptides are grouped and associated with the identified proteins, which are sorted according to the identification (ID) number in the NCBI database. The tabular display includes the assigned peptide sequence (Sequence), peptide position in the protein, percent coverage, the flat file name (DAT), Ions Score (Isc), Threshold Score (Tsc), charge of the precursor ion (Charge), observed mass/charge (m/z), and the difference between theoretical and observed mass (∆). (Note: DAT, peptide position and ∆ are not shown.) Note that the Ions Score of each listed peptide is equal to or greater than its Threshold Score, consistent with the evaluation criterion used to extract data from the Mascot files.

2.2. Data Preparation for STEM. To collate each tandem mass (MS/MS) spectrum with the peptide sequence, Mascot reports a probability-based value called Ions Score, which is defined as -10*log10(P), where P is the absolute probability that the observed match between the experimental data and a sequence is a random event.13 The relevance of a peptide identification is determined by comparing Ions Score with Threshold Score, a database-dependent threshold given by the probability at which the MS/MS spectrum is randomly matched with a sequence in the database. However, Mascot uses peptide-based identification and outputs a summary table that includes the identification of peptides having Ions Score lower than the threshold. Therefore, without proper filtering, the table is of minimal use for further peptide-based data analysis. Thus, it is imperative that ambiguous peptide identifications be removed using the STEM V-mode according to user-defined criteria that yield a peptide-centric summary table. To introduce the data into V-mode, we match the peptide sequence to the MS/MS spectrum, as described below.

(1) Peptides separated by direct nanoflow LC (DNLC) or two-dimensional LC (2DLC) are analyzed by a mass spectrometer (Q-TOF2, Micromass-Waters, Milford, MA) in a data-dependent mode, yielding MS/MS spectra.
(2) Each MS/MS spectrum is converted into an independent peak list (PKL) file by MassLynx software (Micromass-Waters).
(3) The PKL file is used as a query for the Mascot search, and the resulting flat data file (called a DAT file) is used for V-mode processing.
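The Ions Score definition given in Section 2.2 can be illustrated with a short Python sketch. The function names below are hypothetical and are not part of Mascot or STEM; the sketch only converts between a probability P and the score -10*log10(P), and applies the comparison with a Threshold Score described in the text.

```python
import math


def ions_score(p: float) -> float:
    """Return -10*log10(P) for a random-match probability P in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must lie in the interval (0, 1]")
    return -10.0 * math.log10(p)


def probability_from_score(score: float) -> float:
    """Invert the definition: P = 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


def passes_threshold(score: float, threshold_score: float) -> bool:
    """Keep a peptide only if its Ions Score reaches the Threshold Score."""
    return score >= threshold_score


if __name__ == "__main__":
    # A random-match probability of 0.05 corresponds to a score near 13.
    print(round(ions_score(0.05), 2))
    print(passes_threshold(ions_score(1e-4), threshold_score=35.0))
```

Because the score is logarithmic, every 10-point increase corresponds to a 10-fold lower probability that the match is a random event.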

Journal of Proteome Research • Vol. 4, No. 5, 2005

In addition to the data described above, the data used for quantitative analysis in C-mode are obtained by converting MS spectra to an ASCII file using MassLynx. To evaluate the performance of STEM, we used MS/MS spectra obtained in our previous studies, as follows: a pre-ribonucleoprotein complex that was pulled down from 293EBNA cells using nucleolin as bait,5 E. coli proteins,6 and C. elegans proteins.7 We also used spectra from an MS and MS/MS analysis of a rat liver ribosomal particle labeled with ¹²C/¹⁴N- or ¹³C/¹⁵N-O-methylisourea, as described elsewhere (Yamauchi et al., manuscript in preparation). In brief, ribosomal particles were prepared as described,27 S-alkylated with iodoacetamide, and digested with lysyl endopeptidase. The digest was labeled by a guanidination reaction with ¹²C/¹⁴N or ¹³C/¹⁵N O-methylisourea. These “light” or “heavy” reagents introduce an additional 43 or 46 atomic mass units (amu), respectively, into the epsilon-amino group of a lysine residue, such that the modification causes a mass difference of 3 amu within each peptide pair. After the modification reaction, the differentially labeled peptide samples were mixed and analyzed using a DNLC-MS system to obtain spectra for the CQ-mode.

2.3. Algorithm for STEM Software. V-mode. The V-mode comprises three phases: Extraction, Evaluation, and Rearrangement. The Extraction phase collects important information from DAT files to confirm peptide identification and outputs the data to a flat text file. Each DAT file is read to acquire Ions Score, Threshold Score, the identified sequence with modifications, precursor mass, calculated precursor mass, charge state, the number of assigned y-series ions, the number of assigned b-series ions, and the protein or genetic locus from which the peptide derives. The information from the DAT files and the database lookup is stored in the STEM text file for the Evaluation phase, as shown in Figure 2.
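As a rough illustration of the V-mode phases named above, the following Python sketch models one extracted record and the filtering and sorting steps. It does not parse Mascot DAT files and is not the STEM implementation. The class and function names are hypothetical. Two assumptions are made: the "number of y- or b-series ions" criterion (default 3, as described later in this section) is read as "at least 3 ions in either series", and redundant assignments are removed by keeping the best-scoring hit per locus and sequence.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PeptideHit:
    """Fields the Extraction phase is described as collecting (names are illustrative)."""
    locus: str
    sequence: str
    ions_score: float
    threshold_score: float
    n_y_ions: int
    n_b_ions: int
    charge: int = 2


def evaluate(hits: Iterable[PeptideHit], min_series_ions: int = 3) -> List[PeptideHit]:
    """Evaluation: keep hits whose score reaches the threshold and that have
    enough assigned ions in the y- or the b-series (assumed reading)."""
    return [
        h for h in hits
        if h.ions_score >= h.threshold_score
        and max(h.n_y_ions, h.n_b_ions) >= min_series_ions
    ]


def rearrange(hits: Iterable[PeptideHit]) -> Dict[str, List[PeptideHit]]:
    """Rearrangement: sort by locus, then by sequence, dropping repeated
    (locus, sequence) assignments and keeping the highest-scoring one."""
    grouped: Dict[str, List[PeptideHit]] = {}
    seen = set()
    for h in sorted(hits, key=lambda x: (x.locus, x.sequence, -x.ions_score)):
        if (h.locus, h.sequence) in seen:
            continue
        seen.add((h.locus, h.sequence))
        grouped.setdefault(h.locus, []).append(h)
    return grouped


def percent_coverage(protein_sequence: str, peptides: Iterable[str]) -> float:
    """Percentage of protein residues covered by at least one identified peptide."""
    if not protein_sequence:
        return 0.0
    covered = [False] * len(protein_sequence)
    for pep in peptides:
        if not pep:
            continue
        start = protein_sequence.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_sequence.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein_sequence)
```

Grouping into a dictionary keyed by locus mirrors the protein-centered layout of the summary table in Figure 2, where each protein is followed by its peptides.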

Figure 3. (A) A portion of a multiple comparison table and (B) a sample summary report obtained using the STEM CF-mode. (A) Each row in the table represents one protein. The percent sequence coverage found in each data set is shown. (B) The table summarizes the number of proteins identified in each cellular fraction, the number of proteins found specifically (unique) in each fraction or commonly (common) among fractions, and the sum-set (sum-set) of the sample data.
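The counts summarized in Figure 3B (proteins identified per fraction, unique, common, and sum-set) correspond to ordinary set operations on protein accession numbers, and the table in Figure 3A lists coverage per dataset. The Python sketch below illustrates this under stated assumptions: the function names, the input layout (dataset name mapped to a dictionary of accession number and percent coverage), and the CSV column order are hypothetical and are not taken from STEM.

```python
import csv
import io
from typing import Dict

Datasets = Dict[str, Dict[str, float]]  # dataset name -> {accession: % coverage}


def summarize(datasets: Datasets) -> dict:
    """Count identified, unique, common, and sum-set proteins across datasets."""
    ids = {name: set(cov) for name, cov in datasets.items()}
    sum_set = set().union(*ids.values())
    common = set.intersection(*ids.values()) if ids else set()
    unique = {}
    for name, own in ids.items():
        # Proteins seen in any other dataset are not unique to this one.
        others = set().union(*(v for k, v in ids.items() if k != name))
        unique[name] = len(own - others)
    return {
        "identified": {name: len(s) for name, s in ids.items()},
        "unique": unique,
        "common": len(common),
        "sum_set": len(sum_set),
    }


def comparison_table_csv(datasets: Datasets) -> str:
    """One row per protein, one coverage column per dataset (blank if absent)."""
    names = list(datasets)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["accession"] + names)
    all_ids = set().union(*(datasets[n].keys() for n in names))
    for acc in sorted(all_ids):
        writer.writerow([acc] + [datasets[n].get(acc, "") for n in names])
    return buf.getvalue()


if __name__ == "__main__":
    example = {
        "soluble": {"P1": 40.0, "P2": 12.5},
        "insoluble": {"P2": 30.0, "P3": 8.0},
    }
    print(summarize(example))
    print(comparison_table_csv(example))
```

Leaving a cell blank when a protein is absent from a dataset keeps "not detected" distinct from a measured coverage of zero.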

During the Evaluation phase, STEM filters out ambiguous peptide identifications according to the significance threshold calculated from each Threshold Score and the number of y- or b-series signals assigned in the Extraction phase. The user can set both the confidence level and the number of series ions; the defaults are 95% confidence and 3 ions, respectively. Only the information that passes these steps is judged as an “identified peptide” and carried forward to the Rearrangement phase. During the Rearrangement phase, the peptides are sorted by locus, and the peptides for each locus are sorted by sequence. Next, the same database used by Mascot is searched for the descriptive name, the peptide positions, and the percentage of the full protein sequence covered by the identified peptides.

C-mode. As described above, the C-mode has two submodes: the CF-mode (Comparison and Find) and the CQ-mode (Comparison and Quantitation). In the CF-mode, two or more datasets are compared, and information is extracted regarding identified proteins present commonly or partially in the datasets. The comparison is performed on the datasets processed in V-mode, and the dataset names, the number of candidate peptides, and coverage are output in CSV format, keyed by the accession number of the protein (Figure 3A). A summary table is also generated, as shown in Figure 3B.

In CQ-mode, mass spectral data for two peptide mixtures labeled with different stable isotopes are analyzed, and the relative abundance of each corresponding protein present in the samples is quantitated by comparing two different isotope mass peaks. STEM extracts the identified peptide information from the DAT file generated by the V-mode and then outputs a list of identified proteins. Mass spectral peaks of identified proteins are then searched within the mass spectra in ASCII format.
Further, mass peaks of peptides labeled with light and heavy isotopes are searched by utilizing the difference in mass to find pairs of labeled peptides. The relative abundance of peptides is determined by comparing the peak intensities of these pairs, and the results are outputted as a list of identified peptides. The user can set the difference in the masses of the isotopes. For instance, in case of H₂¹⁶O/H₂¹⁸O differential labeling of peptides,24 the difference is 4, and in the case of our O-methylisourea method, the difference is 3. If the mass peaks overlap as a consequence of a small mass difference between labeling agents, the natural peptide peak ratio is theoretically calculated using the natural isotope abundance ratio. Then, the observed peak ratio is corrected using the calculated value. To evaluate the quantitative results of the method, we have defined Pptn, as illustrated in the following equation (eq 1).

$$\mathrm{Pptn} = \sum_{k=2}^{n} \left[ \left( \frac{\mathrm{Obs}_k}{\mathrm{Obs}_1} - \frac{T_k}{T_1} \right)^{2} \right]^{1/2} \qquad (1)$$

where Obs is the observed height of an isotope mass peak, and T is the theoretical abundance ratio calculated using the natural isotope abundance ratio. The numerical suffixes (1, 2, 3, ...) to Obs and T are in the order of isotope molecular mass, beginning with the smallest. We considered the level of concomitant noise to be insignificant if the value of Pptn was less than 0.4. When the value was greater than 0.4, the mass spectra were visually inspected to verify the reliability of the quantitation. To facilitate visual inspection, we developed a graphical user interface that displays the portion of the mass spectra to be inspected (see Figure 4A). If a single protein is quantitated by multiple peptides derived from different portions of the polypeptide, STEM calculates and displays an average value (Figure 4B).

2.4. System Requirements for STEM. STEM requires a Windows NT4.0-, 2000-, or XP-based PC equipped with a minimum of a 600-MHz Pentium processor and ∼128 MB of RAM. Some of the STEM commands use internal functions of Microsoft Excel; therefore, Excel 2000, XP, or 2003 is also required. In this study, we used a Windows 2000-based DELL Dimension 4300S (Dell Inc., Texas, USA) equipped with an Intel Pentium 4 1.9-GHz processor and 256 MB SDRAM.

3. Results and Discussion

3.1. Application of V-Mode. To validate the function of V-mode, we used datasets that had been previously analyzed manually in our laboratory. One of the example datasets was obtained by analyzing protein components in a ribonucleoprotein complex pulled down from 293EBNA cells using nucleolin as bait.5 Nucleolin is a trans-acting factor that functions during ribosome biogenesis.28 We analyzed a lysyl endopeptidase digest of 0.2 µg of the nucleolin-associated ribonucleoprotein complex using DNLC-MS/MS, and we identified 92 proteins by searching 501 DAT files through the NCBInr database via Mascot.5 These proteins included 64 ribosomal proteins and 28 nonribosomal proteins. It took almost 2 days to identify and evaluate these proteins manually. The same dataset was processed using STEM, with the following default threshold values: the probability was set to 95%, and the detected number of y- or b-series ions was set to 3. Within 5 min, STEM identified 92 proteins that were identical to those obtained using the manual process (see Table 2 in ref 5).

STEM V-mode was also applied to analyze a much larger dataset that was generated by automated 2DLC-MS/MS shotgun analysis of a tryptic digest of E. coli crude cell lysate.6 The analysis was performed on a fully automated microscale system that was programmed to complete the analysis within 16 h, and mass spectra were collected in a data-dependent collision-induced dissociation MS/MS mode. The analysis generated ∼20 000 DAT files in a single run; a typical dataset containing 19 368 DAT files was processed under the default settings. In less than 45 min, STEM V-mode assigned 3886 tryptic peptides from the dataset through the Extraction and Evaluation processes and selected 1006 proteins by removing redundant identifications of unique proteins using the Rearrangement process (part of the data are shown in Figure 2). The Rearrangement process also generated the number of identified peptides within each unique protein and reported the percent coverage of the complete amino acid sequence. The list of the identified peptides was in complete agreement with that obtained previously by labor-intensive manual processing, which took more than 2 weeks, demonstrating that STEM is a useful tool for large-scale proteomics studies.

3.2. Application of CF-Mode. CF-mode is designed to compare/summarize multiple datasets generated by, for instance, large-scale shotgun analyses. To evaluate the performance of this mode, we used datasets generated by profiling the proteins expressed in the nematode C. elegans.7 In this study, crude protein extracts were centrifuged to generate soluble and insoluble fractions, and each fraction was separately analyzed by the shotgun approach using tryptic digestion and an automated microscale 2DLC-MS/MS system. Under our standard analytical conditions described above, the soluble and insoluble protein fractions generated 10 899 and 6696 DAT files, respectively. These data were searched independently against the Wormpep database29 via Mascot, and the resulting protein identification tables were processed first by the STEM V-mode to remove redundant identifications and ambiguous assignments. This procedure produced two independent lists, one each for the soluble (766 proteins) and insoluble (1097 proteins) fractions. The CF-mode then compared these lists (datasets), extracted the proteins found in both fractions, and within a few minutes had generated a comparison table of the two datasets (Figure 3A,B). We also designed the CF-mode to output peptide coverage of each protein, which is an indicator of relative protein abundance within the sample30 and therefore is useful for semiquantitative comparison among the proteins listed in the table. In fact, the table provides preliminary information on the subcellular distribution of each protein with regard to relative abundance in the soluble or insoluble fraction.

Figure 4. (A) Portion of a relative quantitation table and the graphical interface for the user, and (B) a portion of a protein summary report obtained using the STEM CQ-mode. (A) Each row in the table represents one peptide. The tabular display includes the assigned protein with ID number, relative quantity of peptides (Peptide-Q) and Pptn. Clicking on a peptide causes its spectrum to be shown via the STEM graphical user interface. (B) Each row in the table represents a single protein. The tabular display includes the protein name with ID number, the mean relative quantity of peptides (Protein-Q) and the number of the peptides (Pep-No) found in each data set.

3.3. Application of CQ-Mode. Differential protein analysis using the shotgun approach coupled with stable isotope labeling of proteins in vivo or in vitro accelerates the analysis of dynamic aspects of protein expression and protein-protein interactions in cells.31,32 The STEM CQ-mode is designed for this purpose, that is, for processing data from quantitative proteomics experiments that utilize different isotope tags. To validate the software performance, we isolated ribosomes from rat liver by conventional sucrose density gradient ultracentrifugation, digested the sample with lysyl endopeptidase, and labeled the digest with “heavy” or “light” O-methylisourea (see the Experimental Section). The differentially labeled preparations were then mixed at a 1:1 ratio and analyzed by the DNLC-MS system to quantitatively identify the ribosomal components.
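The pairing step that CQ-mode applies to such a light/heavy mixture can be sketched in Python as follows. This is an illustration under stated assumptions, not the STEM algorithm: the function names, the centroided `(m/z, intensity)` peak list, and the 0.05 m/z matching tolerance are hypothetical. The heavy partner is looked up at the light m/z plus the user-set mass difference divided by the charge, and the optional `natural_ratio` argument stands in for the theoretically calculated natural isotope contribution that the text says is used to correct the observed peak ratio.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def find_peak(spectrum: List[Peak], target_mz: float, tol: float = 0.05) -> Optional[Peak]:
    """Return the most intense peak within +/- tol of target_mz, if any."""
    candidates = [p for p in spectrum if abs(p[0] - target_mz) <= tol]
    return max(candidates, key=lambda p: p[1]) if candidates else None


def heavy_to_light_ratio(
    spectrum: List[Peak],
    light_mz: float,
    charge: int,
    mass_diff: float = 3.0,      # 3 for the O-methylisourea tags, 4 for 16O/18O
    natural_ratio: float = 0.0,  # assumed natural-isotope share at the heavy position
    tol: float = 0.05,
) -> Optional[float]:
    """Heavy/light intensity ratio for one peptide pair, or None if unpaired."""
    light = find_peak(spectrum, light_mz, tol)
    # On the m/z axis the labels are separated by mass_diff / charge.
    heavy = find_peak(spectrum, light_mz + mass_diff / charge, tol)
    if light is None or heavy is None or light[1] <= 0:
        return None
    # Subtract the part of the heavy signal explained by the light peptide's
    # own natural isotope pattern (relevant when the mass difference is small).
    return heavy[1] / light[1] - natural_ratio


if __name__ == "__main__":
    spectrum = [(500.00, 1000.0), (500.50, 550.0), (501.00, 180.0), (501.50, 1040.0)]
    print(heavy_to_light_ratio(spectrum, 500.00, charge=2, natural_ratio=0.04))
```

For a doubly charged peptide and a 3 amu tag difference, the partner peak is therefore expected 1.5 m/z units above the light peak, where it overlaps the tail of the light peptide's natural isotope envelope.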
It is generally accepted that the functional mammalian ribosome consists of ∼80 protein components, 47 from the large 60S subunit and 33 from the small 40S subunit,33 as well as ribosomal RNAs. Using the LC-MS analysis coupled with data processing with Mascot and the CQ-mode (part of the data is shown in Figure 4A,B), we identified most of these proteins and simultaneously estimated the relative abundance of each component within the ribosome complex (with the exception of several proteins expected to produce a small number of multiply charged peptide ions within the range of mass spectrometric detection). The relative abundance of 53 ribosomal proteins quantitated from 157 peptides indicates reasonably accurate quantitative values, with a mean ± standard deviation of 1.01 ± 0.08 (“heavy”/“light”). To avoid erroneous quantitation due to mass signals of nontarget peptides or contaminants, we introduced a parameter, Pptn (see the Experimental Section), into the CQ-mode. When applying the CQ-mode to many biological samples, we found the Pptn value to be a good indicator of the reliability of quantitative results, because the mass spectra with Pptn values of less than 0.4 contained practically no detectable overlapping signals that interfered with the quantitative analysis.
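The Pptn check and the per-protein averaging described above can be sketched in Python. This is a minimal illustration, assuming eq 1 is read as the sum over k = 2..n of the absolute difference between Obs_k/Obs_1 and T_k/T_1. The function names are hypothetical, and treating a value of exactly 0.4 as needing inspection is an assumption, since the text specifies only "less than" and "greater than" 0.4.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def pptn(obs: Sequence[float], theo: Sequence[float]) -> float:
    """Deviation of the observed isotope pattern from the theoretical one.

    obs  : observed peak heights, ordered by increasing isotope mass
    theo : theoretical natural-abundance values in the same order
    """
    if len(obs) != len(theo) or len(obs) < 2:
        raise ValueError("need two equally long series with at least 2 peaks")
    if obs[0] <= 0 or theo[0] <= 0:
        raise ValueError("the first (lightest) peak must be positive")
    # Each term is sqrt((Obs_k/Obs_1 - T_k/T_1)^2), i.e. an absolute difference.
    return sum(
        math.sqrt((o / obs[0] - t / theo[0]) ** 2)
        for o, t in zip(obs[1:], theo[1:])
    )


def needs_inspection(value: float, cutoff: float = 0.4) -> bool:
    """Flag spectra whose Pptn is not below the cutoff for visual inspection."""
    return not value < cutoff


def protein_quantities(peptide_ratios: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the peptide-level ratios of each protein (Protein-Q in Figure 4B)."""
    return {protein: mean(r) for protein, r in peptide_ratios.items() if r}


if __name__ == "__main__":
    value = pptn([100.0, 55.0, 20.0], [1.0, 0.5, 0.2])
    print(round(value, 3), needs_inspection(value))
    print(protein_quantities({"L3": [1.0, 1.1], "S6": [0.9]}))
```

Normalizing both series to their first peak makes the measure independent of absolute signal intensity, so one cutoff can be applied across peptides of very different abundance.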

4. Conclusions

We have shown that STEM is a powerful software tool for the rapid analysis of large datasets obtained by mass spectrometry-based proteomics coupled with Mascot. The STEM V-mode removes unreliable candidates from the large amount of Mascot search data, removes redundant peptide assignments for individual proteins, and efficiently extracts and integrates valuable information relevant to peptide/protein identification. Thus, V-mode greatly reduces the time and labor required to prepare a summary table of the identified proteins. The STEM C-mode compares lists of identified proteins, and proteins found in common between the lists and/or unique to a list from a specific analysis can be pulled out rapidly, enabling easy comparison of very complex proteomics results. In addition, C-mode can process data for large-scale quantitative proteomics studies that utilize different stable isotope labels. STEM has a peak analysis function to differentiate overlapping mass signals resulting from the natural abundance of isotopes, and thus it can be applied to studies in which the isotope tags have a small mass difference (e.g., 3-4 amu) as well as to conventional isotope tags used for ICAT or SILAC technologies. Because STEM can accomplish very complicated procedures within a few minutes to an hour, it can expedite proteomics studies and help accelerate molecular studies of cellular events.

Abbreviations: 2D, two-dimensional; 2DLC, two-dimensional liquid chromatography; MS/MS, tandem mass spectrometry; NCBI, National Center for Biotechnology Information.

Acknowledgment. We thank Dr. Muneo Saito for valuable comments and advice. This work was supported in part by grants for Integrated Proteomics Project, Pioneer Research on Genome the Frontier from MEXT, Japan.

References
(1) McDonald, W. H.; Yates, J. R., 3rd Curr. Opin. Mol. Ther. 2003, 5, 302-309.
(2) Patterson, S. D.; Aebersold, R. H. Nat. Genet. 2003, 33 Suppl, 311-323.
(3) Aebersold, R.; Mann, M. Nature 2003, 422, 198-207.
(4) Isobe, T.; Yamauchi, Y.; Taoka, M.; Takahashi, N. In Proteins and Proteomics; Laboratory Manual; Simpson, R. J., Ed.; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 2003, 869-876.
(5) Natsume, T.; Yamauchi, Y.; Nakayama, H.; Shinkawa, T.; Yanagida, M.; Takahashi, N.; Isobe, T. Anal. Chem. 2002, 74, 4725-4733.
(6) Taoka, M.; Yamauchi, Y.; Shinkawa, T.; Kaji, H.; Motohashi, W.; Nakayama, H.; Takahashi, N.; Isobe, T. Mol. Cell. Proteomics 2004, 3, 780-787.
(7) Mawuenyega, K. G.; Kaji, H.; Yamuchi, Y.; Shinkawa, T.; Saito, H.; Taoka, M.; Takahashi, N.; Isobe, T. J. Proteome Res. 2003, 2, 23-35.
(8) Nagano, K.; Taoka, M.; Yamauchi, Y.; Itagaki, C.; Shinkawa, T.; Nunomura, K.; Okamura, N.; Takahashi, N.; Izumi, T.; Isobe, T. Proteomics 2005, 5, 1346-1361.
(9) Kaji, H.; Saito, H.; Yamauchi, Y.; Shinkawa, T.; Taoka, M.; Hirabayashi, J.; Kasai, K.; Takahashi, N.; Isobe, T. Nat. Biotechnol. 2003, 21, 667-672.
(10) Yanagida, M.; Hayano, T.; Yamauchi, Y.; Shinkawa, T.; Natsume, T.; Isobe, T.; Takahashi, N. J. Biol. Chem. 2004, 279, 1607-1614.
(11) Yanagida, M.; Shimamoto, A.; Nishikawa, K.; Furuichi, Y.; Isobe, T.; Takahashi, N. Proteomics 2001, 1, 1390-1404.
(12) Hayano, T.; Yanagida, M.; Yamauchi, Y.; Shinkawa, T.; Isobe, T.; Takahashi, N. J. Biol. Chem. 2003, 278, 34309-34319.
(13) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-3567.
(14) Ducret, A.; Van Oostveen, I.; Eng, J. K.; Yates, J. R., 3rd; Aebersold, R. Protein Sci. 1998, 7, 706-719.
(15) MacCoss, M. J. Curr. Opin. Chem. Biol. 2005, 9, 88-94.
(16) Zhou, H.; Ranish, J. A.; Watts, J. D.; Aebersold, R. Nat. Biotechnol. 2002, 20, 512-515.
(17) Han, D. K.; Eng, J.; Zhou, H.; Aebersold, R. Nat. Biotechnol. 2001, 19, 946-951.
(18) Tabb, D. L.; Eng, J. K.; Yates, J. R., 3rd In Proteome Research: Mass Spectrometry; James, P., Ed.; Springer: NY, 2001, Vol. 1, pp 125-142.
(19) Tabb, D. L.; McDonald, W. H.; Yates, J. R., 3rd J. Proteome Res. 2002, 1, 21-26.
(20) Von Haller, P. D.; Yi, E.; Donohoe, S.; Vaughn, K.; Keller, A.; Nesvizhskii, A. I.; Eng, J.; Li, X. J.; Goodlett, D. R.; Aebersold, R.; Watts, J. D. Mol. Cell. Proteomics 2003, 2, 428-442.
(21) MacCoss, M. J.; Wu, C. C.; Liu, H.; Sadygov, R.; Yates, J. R., 3rd Anal. Chem. 2003, 75, 6912-6921.
(22) Li, X. J.; Zhang, H.; Ranish, J. A.; Aebersold, R. Anal. Chem. 2003, 75, 6648-6657.
(23) Yang, X.; Dondeti, V.; Dezube, R.; Maynard, D. M.; Geer, L. Y.; Epstein, J.; Chen, X.; Markey, S. P.; Kowalak, J. A. J. Proteome Res. 2004, 3, 1002-1008.
(24) Qian, W. J.; Monroe, M. E.; Liu, T.; Jacobs, J. M.; Anderson, G. A.; Shen, Y.; Moore, R. J.; Anderson, D. J.; Zhang, R.; Calvano, S. E.; Lowry, S. F.; Xiao, W.; Moldawer, L. L.; Davis, R. W.; Tompkins, R. G.; Camp, D. G., 2nd; Smith, R. D. Mol. Cell. Proteomics 2005, 4, 700-709.
(25) Zang, L.; Palmer Toy, D.; Hancock, W. S.; Sgroi, D. C.; Karger, B. L. J. Proteome Res. 2004, 3, 604-612.
(26) Brown, K. J.; Fenselau, C. J. Proteome Res. 2004, 3, 455-462.
(27) Borgese, N.; Mok, W.; Kreibich, G.; Sabatini, D. D. J. Mol. Biol. 1974, 88, 559-580.
(28) Takahashi, N.; Yanagida, M.; Fujiyama, S.; Hayano, T.; Isobe, T. Mass Spectrom. Rev. 2003, 22, 287-317.
(29) WormPD: http://www.sanger.ac.uk/Projects/C_elegans/WORMBASE/current/wormpep.shtml.
(30) Florens, L.; Washburn, M. P.; Raine, J. D.; Anthony, R. M.; Grainger, M.; Haynes, J. D.; Moch, J. K.; Muster, N.; Sacci, J. B.; Tabb, D. L.; Witney, A. A.; Wolters, D.; Wu, Y.; Gardner, M. J.; Holder, A. A.; Sinden, R. E.; Yates, J. R.; Carucci, D. J. Nature 2002, 419, 520-526.
(31) Julka, S.; Regnier, F. J. Proteome Res. 2004, 3, 350-363.
(32) Ong, S. E.; Foster, L. J.; Mann, M. Methods 2003, 29, 124-130.
(33) Swiss-Prot: http://us.expasy.org/sprot/.
