Computational Methods for Comparison of Large Genomic and Proteomic Datasets Reveal Protein Markers of Metastatic Cancer

Yingchun Wang, Rachel Hanley, and Richard L. Klemke*

Department of Pathology and Moores Cancer Center, University of California, San Diego, Basic Science Building 1039, 9500 Gilman Drive, MC 0612, La Jolla, California 92093

Received November 8, 2005

Large-scale genomic and proteomic analysis has provided a wealth of information on biologically relevant systems, and the ability to analyze this information is crucial to uncovering important biological relationships. However, it has proven difficult to compare large datasets from different sources because individual laboratories and database systems assign different gene and protein identifiers. Here, we describe the design of a fully automated blast program (BlastPro) that facilitates rapid comparison of large protein-protein, nucleotide-nucleotide, or nucleotide-protein datasets from numerous, independent studies. Using this system, we compared several published genomic and proteomic databases for proteins that are upregulated in highly motile, metastatic tumor cells. Analysis of five independent studies comprising more than 1 × 10^6 genomic sequences and more than 1000 proteins revealed that the cytoskeletal-associated protein α-actinin is increased at both the mRNA and protein level in metastatic breast, prostate, and skin cancer cells. Interestingly, spatial analysis of α-actinin expression revealed that it is amplified 8-fold in the leading pseudopodium compared to the cell body compartment of migrating cells. These findings indicate that amplification of α-actinin and its localization to the leading pseudopodium are potential biomarkers of cancer progression to a more metastatic phenotype. Together, our results demonstrate that the BlastPro system can be used to compare large genomic and proteomic datasets to reveal important biological relationships, including those associated with cancer progression.

Keywords: cancer • metastasis • biomarkers • bioinformatics • proteomics • genomics • protein identification • pseudopodia • cell migration • data mining

* Corresponding author. Phone: 858-822-5610. Fax: 858-822-4566. Email: [email protected]. 10.1021/pr050390u. © 2006 American Chemical Society. Journal of Proteome Research 2006, 5, 907-915. Published on Web 03/14/2006.

Introduction

Recent technological breakthroughs in genomics and proteomics methods have provided the means to generate a large reservoir of biological data.1,2 The complete sequencing of the human genome, DNA microarray approaches to profile gene expression levels, and the development of liquid chromatography coupled with tandem mass spectrometry have set the stage for large-scale genome and proteome identification in complex cells and whole tissues.1,2 These large-scale approaches are now being used routinely in many fields to address physiologically important questions, including those related to human cancer.3,4 However, as more and more data are generated, computational systems that facilitate comparisons between diverse systems and across species will be needed to produce a coherent picture of physiological processes and human disease.

Underlying the “omics” era is the sequence database, which is necessary for unequivocal gene and protein identifications as well as data mining and relational comparisons within and between datasets. Nucleotide sequence databases or gene cluster databases such as GenBank or UniGene have been used for microarray analysis. Protein sequence analysis can utilize multiple database sources, including NCBI RefSeq, Swiss-Prot/TrEMBL, and ENSEMBL. However, a major problem with having multiple identification systems is that the same gene/protein can be assigned multiple identifiers or accession numbers. Even within the same database, the same protein may be assigned different GI numbers due to versioning. For example, the latest GI number assigned to fibronectin 1 isoform 2 preproprotein is 47132547, while the previous one is 16933544. Although the identifiers can be cross-referenced or translated manually by searching Web-based databases, doing so makes comparing large genomic and proteomic datasets time-consuming and tedious. The data incompatibility increases when microarray and mass spectral-based proteomics datasets are collated or when datasets across species are compared. For example, the human focal adhesion kinase has the GI number 24476013 and RefSeq accession number NP_722560, whereas the same protein in mice has the GI number 6679741 and RefSeq accession number NP_032008. Thus, the development of universal identifiers and algorithms that can integrate various datasets is necessary for biological applications of genomic and proteomic information.

Multiple approaches have been introduced to solve the problem of data incompatibility. One solution is data standardization, as extensively discussed by the Proteomic Standards Initiative at the HUPO 3rd annual congress.5 It was suggested that two or more identifiers be used to reference a given protein within an independent dataset.5 Another approach that has been suggested is to create a standard database that utilizes a universal identifier assigned to a particular gene or gene product. For example, the International Protein Index (IPI) database cross-references several main protein databases and assigns each protein from a particular organism a unique identifier, which is stably maintained with incremental versioning. While it is generally agreed that data standardization is one of the best ways to create data compatibility, practical development of such a system has been slow because it requires an organized, worldwide effort to be applicable for current and future data platforms. A more immediate approach is to create a relational database for all gene identifiers commonly described in public databases that can be used to match similar identifiers across data platforms, as exemplified by the MatchMiner program.6 This algorithm is fast at runtime, but the accuracy of target identification is still suboptimal because many different genes have very similar names or identifiers that cannot be discriminated by this system. Moreover, the MatchMiner algorithm does not provide cross-species or protein-nucleotide data comparison. Thus, there are currently no analysis systems that are sufficient for comparing large genomic and proteomic datasets. This has limited our ability to decipher the complex gene and protein signaling networks that regulate many diseases, such as cancer metastasis.

The dissemination of cancer cells from the primary tumor (metastasis) is the major cause of death in cancer patients,7 and there are no pharmacological treatments available to target this complex process. Current evidence indicates that complex genetic and signaling networks control the migration machinery of metastatic cancer cells.
Metastatic cells migrate away from the primary tumor, through the surrounding tissue and toward blood vessels. These invasive cells then protrude membrane processes (pseudopodia) that penetrate through 3.0 µm openings in the vessel wall,8,9 allowing metastatic cells to gain access to the vessel lumen and disseminate to distant tissues where they establish secondary tumors. Pseudopodium formation is critical for the successful transmigration of the cell through tissues and into the vessel lumen. This process involves actin-mediated membrane extension and focal adhesion (FA) assembly at the front of the extending pseudopodium as well as FA disassembly at the rear of the cell body. Indeed, several FA and pseudopodial-associated proteins have been shown to be deregulated in human cancers10,11 and are thought to contribute to the spread of cancer in patients. Identification of key FA and pseudopodial-associated proteins that are deregulated in human cancers would provide powerful biomarkers of metastatic potential.

Identifying true metastatic markers gives the clinician crucial information for designing the appropriate course of treatment for the patient. Uncovering the key genes and/or proteins that are deregulated in metastatic cancer cells, however, requires large-scale computational comparison of the hundreds or even thousands of genetic and protein changes documented in interlaboratory data. Therefore, we developed an automated blast program (BlastPro) that allows the comparative analysis of large proteomic and genomic datasets based on nucleotide and amino acid sequence identity. BlastPro provides data compatibility across all platforms, including nucleotide to protein and species to species, with a high level of accuracy. Using BlastPro, we compared proteomic and genomic datasets from five independent studies that sought to identify markers of highly metastatic cancers. Our comparison uncovered several known as well as novel metastatic markers that were identified in all five studies. The cytoskeletal protein α-actinin was found to be strongly amplified in metastatic breast, prostate, and skin cancers and was highly enriched in the pseudopodia of invasive cells. Our findings demonstrate how the BlastPro system can be used to identify biomarkers of human disease progression.

Materials and Methods

Software. Stand-alone Blast was downloaded from NCBI, at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/. The user interface and the automation of stand-alone Blast were programmed using HTML, Perl, and JavaScript. Apache server was used to run the Web server.

Genomics and Proteomics Data for Comparison. Protein identifiers in all datasets were copied from online, published papers or supplemental data. The sequences were automatically retrieved using the sequence retrieval system provided by BlastPro as described below.

Purification of Pseudopodia and Western Blot Analysis. The pseudopodia were purified from COS-7 cells as previously described.12 Briefly, cells were serum-starved overnight and then allowed to attach for 2 h to the top of a 3-µm porous filter coated on both sides with fibronectin in a 6-well Transwell plate. Pseudopodia growth was stimulated for 1 h by adding 100 ng/mL LPA to the lower chamber of the well. Cells were then fixed with ice-cold methanol. Cell bodies were wiped off the top of the filter with a cotton swab, and pseudopodia were harvested from the bottom of the filter with 1% SDS lysis buffer. Similarly, cell bodies were harvested with 1% SDS lysis buffer after pseudopodia were wiped off the bottom of the filter. A 20 µg lysate sample from each fraction was separated on a 4-20% gradient SDS-PAGE gel, and α-actinin was detected by Western blot using anti-α-actinin antibody (Chemicon, Temecula, CA).

Results

Datasets and the BlastPro User Interface. BlastPro compares two datasets, one designated as the “query” and one as the “database.” Users can compare any genomic or proteomic datasets as long as the sequences are available for the corresponding genes and/or proteins. A dataset can be entered as a list of protein identifiers from NCBI RefSeq, GenBank, and UniProt (Swiss-Prot/TrEMBL), and BlastPro will automatically retrieve the sequences from the respective remote server. Alternatively, the user can select a local file in fasta format as a dataset or even enter sequences directly, and BlastPro will use these sequences rather than visiting a remote server. To meet the requirements for blast analysis, the fasta sequence file used as the “database” is further formatted by the formatdb program, which is part of the stand-alone blast and is fully automated and embedded in the BlastPro system. Users can specify the similarity and expect value (E-value) thresholds for each BlastPro search. The similarity is the percentage of alignment between two sequences, and the E-value is a statistically derived indicator of significance that reflects the size of the database and the scoring system used, where the lower the E-value, the more significant the hit.13 BlastPro will only output hits that have a similarity greater than or equal to and an E-value less than or equal to the user-specified thresholds. Additionally, users may specify a file name in which to save the search results or leave the field blank, in which case BlastPro will display the results in an HTML file.

Figure 1. Schematic of the BlastPro system. (Step 1) To compare two datasets containing different nucleotide or protein identifiers, the BlastPro system accepts input as a list of protein identifiers or sequences (simply copied and pasted into the text area provided by the user interface) or a file in fasta format. (Step 2) The sequence retrieval system then retrieves nucleotide or protein sequences from corresponding databases. Note that the BlastPro sequence retrieval system will not visit remote databases if sequences are already available (i.e., if sequences or fasta files were inputted). Thus, users may choose to use any batch sequence retrieval tool prior to running a BlastPro analysis. (Step 3) The database maker creates a fasta file to save the retrieved sequences. One of the fasta files will be used as the “query”, and the other as the “database”. The database file will be further formatted by the formatdb program, a component of stand-alone blast. For nucleotide-protein comparison, the nucleotide sequences must be the query and the protein sequences must be the database. For all other comparisons, either dataset can be the query or database. (Step 4) Stand-alone blast then blasts the query file against the database. (Step 5) The output of the blast will be inputted to the parser. (Step 6) If the criteria set by the user are met, the identifiers representing sequences that have significant alignment will be retrieved by the parser, and the information will be outputted to a table with each identifier hyperlinked to the raw blast output file. (Step 7) Users may trace back to the raw blast output by a simple click on the hyperlinked identifier in the formatted table.

BlastPro Program Design. BlastPro was built on a Web server for use on the Internet and is composed of four major parts: (1) the Stand-alone Blast, provided by NCBI; (2) the Parser, which parses the output from Stand-alone Blast and retrieves the blast hit(s) with the highest similarity; (3) the Sequence Retrieval System, which retrieves nucleotide/protein sequences from Web-based, remote databases; (4) the Database Maker, which generates fasta format sequence databases for blast analysis (Figure 1). To compare two datasets, the sequences corresponding to the identifiers in each dataset are automatically retrieved from either remote or local databases by the sequence retrieval system (Figure 1, Steps 1 and 2). If remote sequence retrieval is required, BlastPro utilizes NCBI Batch Entrez for NCBI identifiers and a custom search algorithm for other remote databases. The retrieved sequences are saved by the Database Maker in two fasta-formatted files, one representing the query and one representing the database (Figure 1, Step 3). The sequences in the query file are blasted against the sequences in the database file using the Stand-alone Blast (Figure 1, Step 4). Protein-protein comparisons are accomplished with the blastp component of Stand-alone Blast, which determines the similarity between protein sequences. This program also performs peptide-peptide comparisons using different arguments. Similarly, blastn is used for nucleotide-nucleotide comparison and blastx for nucleotide-protein comparison. The scoring matrix PAM30 is used for peptide-peptide comparison, and BLOSUM62 is used for all other comparisons. Since the inputted data represents actual protein or nucleotide sequences, the SEG filter, which employs sequence masking (“XXXXX” or “NNNNN” in sequence alignment) to filter low complexity or repeat regions,13 is turned off in all comparisons to increase the accuracy of the similarity values. Turning off the SEG filter is especially important for peptide comparisons because many peptides are low complexity per se. All other arguments are set to the NCBI default values, including a word-size of 2 for peptide and 3 for other comparisons. After the query sequences have been blasted against the database, the parser opens the output file generated by the stand-alone blast and examines the hits for each queried protein (Figure 1, Step 5). If the sequence alignment meets both the user-specified similarity threshold and the user-specified E-value threshold, then the sequences are recognized as matching in both datasets and the identity of the sequence and related information are retrieved by the parser. These cutoffs must be used together to prevent false-positive hits. For example, a short, partial sequence alignment may have 100% similarity, but the high E-value will indicate that it is not a true match. Similarly, a long sequence alignment may have a low E-value but a low similarity.
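As a rough illustration of the program selection described above, the following Python sketch maps a comparison type to a BLAST program and scoring matrix and assembles legacy `formatdb` and `blastall` command lines. It is not the authors' Perl implementation. The function names, file names, and the choice to pass the E-value cutoff on the command line are assumptions of this sketch, and no matrix is passed for blastn because nucleotide searches do not use a protein matrix.

```python
from typing import List, Optional, Tuple


def choose_program(query_type: str, db_type: str,
                   peptide: bool = False) -> Tuple[str, Optional[str]]:
    """Return (blast program, scoring matrix) for a comparison type."""
    if query_type == "protein" and db_type == "protein":
        # PAM30 for short peptides, BLOSUM62 otherwise, as stated in the text.
        return "blastp", ("PAM30" if peptide else "BLOSUM62")
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "blastn", None  # assumption: no protein matrix for blastn
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx", "BLOSUM62"
    # A protein query against a nucleotide database is not supported:
    # for nucleotide-protein comparison the nucleotide set must be the query.
    raise ValueError(f"unsupported comparison: {query_type} vs {db_type}")


def build_commands(query_fasta: str, db_fasta: str, query_type: str,
                   db_type: str, max_evalue: float, out_file: str,
                   peptide: bool = False) -> Tuple[List[str], List[str]]:
    """Build (formatdb command, blastall command) as argument lists."""
    program, matrix = choose_program(query_type, db_type, peptide)
    # formatdb: -i input fasta, -p T for protein / F for nucleotide.
    formatdb = ["formatdb", "-i", db_fasta,
                "-p", "T" if db_type == "protein" else "F"]
    # blastall: -F F switches the low-complexity filter off.
    blast = ["blastall", "-p", program, "-d", db_fasta, "-i", query_fasta,
             "-o", out_file, "-e", str(max_evalue), "-F", "F"]
    if matrix is not None:
        blast += ["-M", matrix]
    if peptide:
        blast += ["-W", "2"]  # word size 2 for peptide comparisons
    return formatdb, blast


if __name__ == "__main__":
    fmt, blast = build_commands("query.fasta", "db.fasta",
                                "nucleotide", "protein", 1e-50, "out.txt")
    print(" ".join(fmt))
    print(" ".join(blast))
```

Returning argument lists rather than running the tools keeps the sketch free of side effects; a caller could pass each list to `subprocess.run` on a machine where the legacy NCBI executables are installed.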
The sequence alignment (bit) score, another statistical measure of alignment, is also included as a reference criterion, as a higher bit score generally indicates a higher sequence similarity if the proteins being compared are the same size. Usually, the first hit has the highest similarity. Using these criteria, the parser will retain a maximum of 10 sequence alignments per queried protein. This limit reduces the output file size while still retaining the most significant results. The hit with the highest similarity is outputted to a table (Figure 1, Step 6). Because a queried protein may produce significant alignment with more than one protein, a hyperlink to an HTML file containing the blast result for each queried protein is provided in the table. Thus, the user can click on the protein ID in the table to trace back to the blast results and see the other proteins in the database that produce significant sequence alignment with the query protein as well as the detailed information for each sequence alignment (Figure 1, Step 7).

Currently, the BlastPro system and the Apache server are installed on an Internet-accessible laptop with a 1.6 GHz Pentium M CPU. When Batch Entrez is used, 1000 protein sequences can be retrieved within 2 min. BlastPro takes about 5 min to compare two existing databases containing 1000 protein sequences each. Similarly, comparing 100 proteins whose sequences must be retrieved from remote databases to an existing database containing 1000 protein sequences can be completed within 5 min. Nucleotide comparisons take 3-6 times longer because blastx translates each query sequence in all six possible reading frames before comparing it with the database sequences. BlastPro will be accessible through the cell migration consortium gateway (http://www.cellmigration.org), and the source code is available upon request.
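The parser behavior described above (both thresholds applied together, at most 10 alignments kept per query, and the highest-similarity hit reported) can be sketched in Python as follows. This is a minimal illustration, not the BlastPro parser itself, which was written in Perl. The `Hit` record, the function names, and the tie-breaking order are hypothetical, and all example values other than the 57% similarity and e-139 E-value quoted in this paper for human versus Drosophila paxillin are made up for the demonstration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    """One alignment from a blast report (hypothetical record layout)."""
    query_id: str
    subject_id: str
    similarity: float  # percent
    evalue: float
    bit_score: float


def select_hits(hits: Iterable[Hit], min_similarity: float,
                max_evalue: float, max_per_query: int = 10) -> Dict[str, List[Hit]]:
    """Keep hits that pass BOTH cutoffs, best first, capped per query."""
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        if hit.similarity >= min_similarity and hit.evalue <= max_evalue:
            by_query[hit.query_id].append(hit)
    selected = {}
    for query_id, query_hits in by_query.items():
        # Rank by similarity, then E-value, then bit score (assumed tie-breaks).
        query_hits.sort(key=lambda h: (-h.similarity, h.evalue, -h.bit_score))
        selected[query_id] = query_hits[:max_per_query]
    return selected


def best_hits(selected: Dict[str, List[Hit]]) -> Dict[str, Hit]:
    """The single hit per query that would go into the summary table."""
    return {query_id: hits[0] for query_id, hits in selected.items()}


if __name__ == "__main__":
    example = [
        Hit("human_paxillin", "mouse_paxillin", 90.0, 0.0, 1000.0),
        # Short partial alignment: perfect similarity but a high E-value.
        Hit("human_paxillin", "short_fragment", 100.0, 0.5, 20.0),
        Hit("human_paxillin", "drosophila_paxillin", 57.0, 1e-139, 480.0),
    ]
    kept = select_hits(example, min_similarity=50.0, max_evalue=1e-50)
    for hit in kept["human_paxillin"]:
        print(hit.subject_id, hit.similarity, hit.evalue)
    print("best:", best_hits(kept)["human_paxillin"].subject_id)
```

The short fragment is rejected despite its 100% similarity because its E-value fails the cutoff, which is the false-positive case the two-threshold rule is meant to prevent.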

Biological Interpretation of BlastPro Results. Biological knowledge is critical to specifying an appropriate similarity and E-value threshold for a given BlastPro comparison. Though BlastPro is capable of identifying all matching proteins and/or nucleotides, whether within or across species, from two datasets by using less stringent cutoffs, biological interpretation may be required to determine the significance of such results. For example, although human paxillin and Entamoeba histolytica paxillin share only 33% overall similarity in protein sequence, they are likely to be functional homologues since they possess the same functional domains, such as multiple zinc-binding domains and a LIM domain.14,15 However, it is difficult to determine whether two hypothetical proteins are functional homologues if they share only 33% similarity in sequence and have no common predicted functional domains. Importantly, though, users can distinguish matches between members of the same gene family from the identical gene or a homologue in a different species by using the BlastPro traceback function. This function allows users to access the raw output of stand-alone blast, which contains a brief description of the identified proteins or genes. To test the efficiency of identifying matching proteins or nucleotides within and across species, we compared 50 randomly chosen human proteins and their corresponding mRNAs from eight different species. The 400 protein and 400 nucleotide sequences were saved as two fasta files (data not shown). As expected, BlastPro identified human paxillin in the protein dataset as the best match with human paxillin in the nucleotide dataset. When we removed human paxillin from the protein dataset, BlastPro identified mouse paxillin as the best match to human paxillin.
For all protein-protein, nucleotide-nucleotide, and protein-nucleotide comparisons, BlastPro identified the identical protein as the best match as well as all seven homologues through its trace-back hyperlink. Using human paxillin as a representative query, we analyzed the Score and E-value of cross-species comparisons between protein and nucleotide datasets (Table S1, Supporting Information). If two homologues shared high similarity, then the difference in E-values between protein-protein and nucleotide-nucleotide comparison was small. However, if the similarity of two homologues was low, then E-values could vary significantly. For example, comparison of human and Drosophila paxillin produced an E-value of e-139 for protein-protein comparison and 5e-30 for nucleotide-nucleotide comparison, given the same-sized database. Therefore, if we compare two datasets containing human and Drosophila paxillin, respectively, using an E-value of e-50 as the threshold, the program will identify the match if the two datasets are protein identifiers, but not if the two datasets are nucleotide identifiers. The similarity threshold needs to be adjusted for optimal results as well. For example, the similarity between Drosophila and human paxillin is 57%, so BlastPro will identify a match with a similarity cutoff set at 50% but not at 60%. Differences in E-value and similarity between protein-protein and nucleotide-nucleotide comparisons occur for two primary reasons. One is the degeneracy of the genetic code; that is, an amino acid can be encoded by different codons so that two proteins from different species having 100% similarity in sequence may have only 80% similarity in nucleotide sequence. The other is that homologues across species may have a different number or location of introns in gene sequences, which may generate gaps during pairwise sequence comparison that do not exist in corresponding protein sequences. Nucleotide-protein comparisons generate the same values as protein-protein comparisons because the blastx program translates the nucleotide sequence into its amino acid sequence before performing the comparison. On the basis of these results, we suggest using a lower threshold to perform initial nucleotide-nucleotide comparisons followed by subsequent higher thresholds to filter the false-positive hits. For protein-protein and nucleotide-protein comparisons, empirically, a similarity cutoff of 50% and an E-value cutoff of e-50 can identify all matching proteins from two datasets, except for phylogenetically distant species, which may require lower similarity and higher E-value thresholds.

In Silico Identification of Metastatic Genes and Proteins. The biochemical identification of focal adhesion (FA) proteins and their spatial and temporal organization is crucial to understanding how metastatic cells assemble and disassemble adhesive structures during cell invasion. Recently, FA proteins were identified and quantified on a large scale by de Hoog et al. using stable isotope labeling with amino acids in cell culture (SILAC) and LC-MS/MS.16 Proteins that co-immunoprecipitated with the known FA proteins paxillin, vinculin, and talin were identified in either attached cells or suspended cells, which display assembled and disassembled FA structures, respectively. Therefore, proteins bound to FA complexes in attached cells are also likely to facilitate formation of new focal adhesions in the leading pseudopodia of metastatic cells.12 Conversely, proteins associated with FA proteins in suspension cells are likely to facilitate disassembly of FA complexes at the rear of the cell body during invasion and dissemination. To analyze the spatio-temporal regulation of proteins during metastasis, we developed a novel method to fractionate and independently purify the pseudopodium and cell body compartments of migratory cells.12 Large-scale proteomic analysis and comparison of these structures by Lin et al. uncovered several important proteins that are enriched in the pseudopodium and play a role in cancer progression, including Lasp-1, α-actinin, thymosin, and peroxiredoxin.17 Many of the proteins, including Lasp-1 and α-actinin, are found in focal adhesions. Therefore, we wanted to identify all potential FA proteins present in the purified pseudopodium and cell body compartments of migratory cells, as these proteins are likely altered in metastatic cells. The FA proteins identified by de Hoog et al.16 and the pseudopodium and cell body proteins identified by Lin et al.17 were copied from the supplemental data of each paper, and sequences were retrieved as described. Using an E-value less than e-50 and a similarity greater than 50%, we compared the complete list of FA-associated proteins with the pseudopodium- and cell body-unique proteins (Tables 1 and 2, respectively). We used a similarity cutoff of 50% rather than 100% to ensure that protein homologues between human MRC5 cells (FA protein data) and mouse NIH 3T3 cells (pseudopodium- and cell body-unique protein data) were identified. BlastPro identified 23 FA proteins that are uniquely present or have homologues in the pseudopodium (Table 1). Of these, 13 proteins show high homology between their sequences, including eight proteins with 100% similarity and two highly conserved isoforms of β-tubulin with 98% and 96% similarity, respectively (Table 1). BlastPro identified 44 FA proteins that are the same or highly homologous to proteins present in the cell body (Table 2). Of these, 27 proteins display 100% similarity. The remaining proteins are either the same proteins with less than 100% similarity or homologues, subunits, or isoforms of a given protein.
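The E-values in this paper are written in blast report style (for example `e-117`, `3.00E-65`, or `0`), so applying the cutoffs numerically first requires a small conversion. The Python sketch below is one possible way to do that and to check the human versus Drosophila paxillin example against the e-50 and 50% cutoffs. The function names are hypothetical, and the inclusive comparisons follow the "greater than or equal to" and "less than or equal to" rule given in the user interface description.

```python
def parse_evalue(text: str) -> float:
    """Convert a blast-style E-value string such as 'e-117' to a float."""
    value = text.strip().lower()
    if value.startswith("e"):
        # Blast reports drop the leading mantissa: 'e-117' means 1e-117.
        value = "1" + value
    return float(value)


def passes_cutoffs(similarity: float, evalue_text: str,
                   min_similarity: float = 50.0,
                   max_evalue_text: str = "e-50") -> bool:
    """True when a hit meets both the similarity and the E-value cutoff."""
    return (similarity >= min_similarity
            and parse_evalue(evalue_text) <= parse_evalue(max_evalue_text))


if __name__ == "__main__":
    # Human vs Drosophila paxillin, values quoted in the text (57% similarity).
    print(passes_cutoffs(57.0, "e-139"))   # protein-protein comparison
    print(passes_cutoffs(57.0, "5e-30"))   # nucleotide-nucleotide comparison
    print(passes_cutoffs(57.0, "e-139", min_similarity=60.0))
```

With these inputs the protein-protein comparison passes, the nucleotide-nucleotide comparison fails the E-value cutoff, and raising the similarity cutoff to 60% rejects the match, in line with the paxillin example.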

Table 1. Focal Adhesion Proteins Uniquely Present in Pseudopodia

acc1(a) | acc2(b) | Score (bits)(c) | E-value(d) | similarity (%)(e) | query name(f) | subject name(g)
113272 | P04270 | 764 | 0 | 100 | Actin, α cardiac | cardiac muscle α actin proprotein
113278 | P02571 | 758 | 0 | 100 | γ-Actin, cytoplasmic 2 | actin, γ 1 propeptide
1170955 | P14174 | 237 | 3.00E-65 | 100 | Macrophage migration inhibitory factor (MIF) | macrophage migration inhibitory factor
127144 | P16475 | 309 | 9.00E-87 | 100 | Myosin light chain alkali, nonmuscle isoform (MLC3 nm) | smooth muscle and nonmuscle myosin alkali light chain isoform 1
548453 | Q06830 | 412 | e-117 | 100 | Peroxiredoxin 1 | peroxiredoxin 1
121027 | P25388 | 662 | 0 | 100 | RACK1 | RACK1
133875 | P17075 | 237 | 2.00E-65 | 100 | 40S ribosomal protein S20 | ribosomal protein S20
Q9H3F4 | Q9H3F4 | 602 | e-174 | 100 | Ribosomal protein L5 | ribosomal protein L5
135459 | P05217 | 893 | 0 | 98 | Tubulin β-2 chain | tubulin, β-4
6686256 | P80723 | 431 | e-123 | 96 | Brain acid-soluble protein 1 | brain acid-soluble protein 1
135471 | P05218 | 878 | 0 | 96 | Tubulin β-5 chain | tubulin, β4
IPI00026138 | IPI00026138 | 412 | e-117 | 95 | Similar to ribosomal protein S3a | ribosomal protein S3a
13641563 | XP_017419 | 404 | e-115 | 88 | similar to L-lactate dehydrogenase A chain | lactate dehydrogenase A
20547036 | XP_065127 | 223 | 7.00E-61 | 79 | similar to 60S ribosomal protein L5 | ribosomal protein L5
6166599 | P35579 | 2999 | 0 | 77 | Myosin heavy chain, nonmuscle type A | myosin heavy chain 10, nonmuscle
3183544 | P11940 | 960 | 0 | 76 | Polyadenylate-binding protein 1 (PABP1) | poly A binding protein, cytoplasmic 4
112695 | P29312 | 354 | e-100 | 74 | 14-3-3 protein ζ/δ | tyrosine 3-monooxygenase
115269 | P02452 | 2277 | 0 | 70 | Collagen α 1(I) chain precursor | α 1 type II collagen isoform 1
P17858 | P17858 | 1129 | 0 | 70 | 6-phosphofructokinase, liver type | phosphofructokinase, muscle
113001 | P21333 | 3791 | 0 | 69 | Filamin A | filamin B, β
19855162 | P08123 | 1831 | 0 | 64 | Collagen α 2(I) chain precursor | α 1 type II collagen isoform 2, preproprotein
115351 | P05997 | 2040 | 0 | 64 | Collagen α 2(V) chain precursor | α 1 type II collagen isoform 1
115313 | P20908 | 2117 | 0 | 56 | Collagen α 1(V) chain precursor | collagen, type V, α 3 preproprotein

adhered/floating ratio(h), listed in source order because the row alignment of this column is not recoverable from the extracted layout: 1.4 (P); 2.1 (V), 0.2 (T); 1.2 (V); ID (V); 1.3 (V), 0.6 (P); 0.9 (V), 0.4 (T); 1.2 (P), 1.3 (V), 0.4 (T); ID (V); 0.9 (V, P); ID (P); 1.7 (P), 2.2 (V); ID (T); 1.6 (P); ID (P); ID (V); 1.5 (V), ID (T); 0.9 (P); 1.6 (P); ID (T); 1.0 (P), ID (T); 0.9 (P); ID (P, T); 1.1 (P)

a The accession number from the database dataset. If a list of original protein identifiers (e.g., Swiss-Prot) was used to retrieve sequences from NCBI using Batch Entrez, then the NCBI GI number for each protein will be listed in addition to the original protein identifier.
b The protein identifier from the query dataset. In most cases, these identifiers are exactly the same as the user-inputted identifiers, unless the original protein identifiers are obsolete in current public databases.
c The highest score of the blast result for a query sequence.
d The lowest E-value of the blast result for a query sequence.
e The highest similarity of the blast result for a query sequence.
f The description of the sequence in the query dataset (protein or nucleotide name).16
g The description of the sequence in the user database dataset (protein or nucleotide name).17
h The adhered/floating ratio from the original paper.16 The binding proteins are indicated as P for paxillin, V for vinculin, T for talin. ID indicates that the protein was identified, but no quantitative information was obtained.

We next wanted to identify FA proteins that are increased in attached cells and are uniquely present in pseudopodia. If a protein was coprecipitated with one of the three focal adhesion proteins with a ratio higher than 1.2 in attached cells versus suspended cells (adhered/floating ratio), then the level of the protein was considered to be increased in FA structures. BlastPro identified nine such proteins, including the cytoskeletal-associated proteins actin, myosin, tubulin, and RACK1, which are known to play an important role in cell spreading and migration (Table 1). The increased expression and association of these proteins in FA and pseudopodia suggest that they play a role in FA formation at the leading edge of the extending membrane. Using the same criteria, we also identified 23 proteins that have an increased level in FA structures from attached cells and are enriched in the cell body compartment, including heterogeneous nuclear ribonucleoprotein, ribosomal protein L38, proline- and glutamine-rich splicing factor, and Ras-GTPase-activating protein binding protein (Table 2). The localization of these proteins to FA structures in the rear of a migrating cell suggests that they promote FA maturation and/or disassembly.

Identification of Motility-Related Proteins That Can Be Used as Biomarkers for Tumor Metastasis. DNA microarray3 or massively parallel signature sequencing (MPSS)18,19 approaches have been used successfully to discover gene signatures that are differentially expressed in highly metastatic cells relative to matched low metastatic cells. These experiments typically measure the expression level of large numbers (>10^6) of genes at the mRNA level. Therefore, we wanted to compare the pseudopodial proteins to genes that have been reported to be altered in the highly metastatic prostate tumor cell line

research articles

Wang et al.

Table 2. Focal Adhesion Proteins Uniquely Present in the Cell Body

| acc1 | acc2 | Score (bits) | E-value | similarity (%) | query name | subject name | adhered/floating ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1350762 | Q02878 | 572 | e-165 | 100 | 60S ribosomal protein L6 | ribosomal protein L6 | ID (P) |
| 133021 | P18124 | 496 | e-142 | 100 | 60S ribosomal protein L7 | ribosomal protein L7 | ID (P) |
| 133978 | P10660 | 488 | e-140 | 100 | 40S ribosomal protein S6 | ribosomal protein S6 | 1.2 (P) |
| 130853 | P20618 | 481 | e-138 | 100 | Proteasome subunit β type 1 | proteasome β 1 subunit | 0.4 (V) |
| 118090 | P23284 | 421 | e-120 | 100 | Peptidyl-prolyl cis-trans isomerase B precursor (PPIase) (Rotamase) | peptidylprolyl isomerase B precursor | 0.5 (V), ID (P) |
| 21903462 | P26373 | 417 | e-119 | 100 | 60S ribosomal protein L13 | ribosomal protein L13 | 1.0 (P) |
| 34395930 | P22492 | 386 | e-109 | 100 | Histone H1t | H1 histone family, member T | ID (P) |
| 134000 | P23821 | 376 | e-106 | 100 | 40S ribosomal protein S7 | ribosomal protein S7 | 1.2 (P) |
| 112803 | P08195 | 1049 | 0 | 100 | 4F2 cell-surface antigen heavy chain (4F2hc) | solute carrier family 3 | 1.2 (P) |
| 22002063 | P36578 | 852 | 0 | 100 | 60S ribosomal protein L4 | ribosomal protein L4 | 1.2 (P) |
| 119339 | P06733 | 861 | 0 | 100 | α enolase | enolase 1 | 0.6 (P) |
| 125731 | P13010 | 1453 | 0 | 100 | ATP-dependent DNA helicase II, 80 kD subunit | ATP-dependent DNA helicase II | 1.3 (P) |
| Q07065 | Q07065 | 1155 | 0 | 100 | Cytoskeleton-associated protein 4 | cytoskeleton-associated protein 4 | 0.9 (P) |
| 1730139 | P51114 | 1238 | 0 | 100 | Fragile X mental retardation syndrome related protein 1 | fragile X mental retardation-related protein 1 | 1.7 (V) |
| 123648 | P11142 | 1274 | 0 | 100 | Heat shock cognate 71 kDa protein | heat shock 70 kDa protein 8 isoform 1 | 0.9 (P) |
| 585911 | Q07244 | 953 | 0 | 100 | Heterogeneous nuclear ribonucleoprotein K (hnRNP K) | heterogeneous nuclear ribonucleoprotein K isoform b | 2.4 (V), 0.5 (T) |
| P52272 | P52272 | 1467 | 0 | 100 | Heterogeneous nuclear ribonucleoprotein M (hnRNP M) | heterogeneous nuclear ribonucleoprotein M isoform a | ID (P) |
| 131528 | P26599 | 1041 | 0 | 100 | Polypyrimidine tract-binding protein 1 (PTB) (Heterogeneous nuclear ribonucleoprotein I) (hnRNP I) | polypyrimidine tract-binding protein 1 isoform c | 2.4 (P) |
| 14916572 | Q13283 | 953 | 0 | 100 | Ras-GTPase-activating protein binding protein 1 | Ras-GTPase-activating protein SH3-domain-binding protein | 5.3 (V), ID (P, T) |
| 1709851 | P23246 | 1500 | 0 | 100 | Splicing factor, proline- and glutamine-rich | splicing factor proline/glutamine-rich | 1.8 (P), 4.1 (V), 0.2 (T) |
| 125729 | P12956 | 1209 | 0 | 100 | Thyroid-lupus autoantigen, DNA helicase II, 70 kDa subunit | thyroid-lupus autoantigen p70 | 0.7 (P) |
| 135471 | P05218 | 907 | 0 | 100 | Tubulin β-5 chain | tubulin, β-5 | 1.6 (P) |
| 1172991 | P46778 | 331 | 2.00E-93 | 100 | 60S ribosomal protein L21 | ribosomal protein L21 | ID (P) |
| 464628 | P35268 | 257 | 3.00E-71 | 100 | 60S ribosomal protein L22 | ribosomal protein L22 proprotein | ID (P) |
| 122098 | P02304 | 206 | 7.00E-56 | 100 | Histone H4 | germinal histone H4 | 0.9 (P) |
| P62988 | P62988 | 151 | 1.00E-39 | 100 | Ubiquitin | ubiquitin B precursor | 0.8 (V) |
| 132936 | P23411 | 139 | 7.00E-36 | 100 | 60S ribosomal protein L38 | ribosomal protein L38 | 2.2 (P) |
| 125962 | P02545 | 1086 | 0 | 99 | Lamin A/C | lamin A/C isoform 2 | 2.4 (P) |
| 20178296 | P14618 | 1047 | 0 | 99 | Pyruvate kinase, isozymes M1/M2 (Pyruvate kinase muscle isozyme) | pyruvate kinase, muscle; Pyruvate kinase-3 | 1.4 (P) |
| 130353 | P15259 | 514 | e-148 | 98 | Phosphoglycerate mutase 2 | Phosphoglycerate mutase 2 | ID (V) |
| O60506 | O60506 | 1262 | 0 | 98 | Heterogeneous nuclear ribonucleoprotein Q (hnRNP Q) | NS1-associated protein 1 | 4.7 (V) |
| 135459 | P05217 | 890 | 0 | 97 | Tubulin β-2 chain | tubulin, β5 | 1.7 (P), 2.2 (V) |
| 55977767 | P08670 | 874 | 0 | 97 | Vimentin | vimentin | 3.2 (P), 4.8 (V), 0.5 (T) |
| Q9BUX9 | Q9BUX9 | 399 | e-113 | 96 | Tubulin α-3 | tubulin, α3 | 1.2 (P) |
| Q8TB01 | Q8TB01 | 975 | 0 | 96 | Similar to cytoskeleton-associated protein 4 | cytoskeleton-associated protein 4 | 0.7 (P), ID (T) |
| 3183544 | P11940 | 1000 | 0 | 80 | Polyadenylate-binding protein 1 (Poly(A)-binding protein 1) (PABP 1) | polyA-binding protein, cytoplasmic homologue | 1.5 (V) |
| O14979 | O14979 | 411 | e-117 | 69 | Heterogeneous nuclear ribonucleoprotein D-like (HNRPDL protein) | heterogeneous nuclear ribonucleoprotein D isoform b | 3.4 (V) |
| 14916999 | P11021 | 803 | 0 | 66 | 78 kDa glucose-regulated protein precursor (GRP 78) | heat shock 70 kDa protein 8 isoform 1 | 1.0 (P) |
| Q9Y6M1 | Q9Y6M1 | 718 | 0 | 65 | Hepatocellular carcinoma autoantigen | insulin-like growth factor 2, binding protein 3 | 0.8 (V) |

Genomic and Proteomic Data Comparison System

Table 2 (Continued)

| acc1 | acc2 | Score (bits) | E-value | similarity (%) | query name | subject name | adhered/floating ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Q9BRB1 | Q9BRB1 | 487 | e-139 | 61 | Hypothetical protein | DEAD (Asp-Glu-Ala-Asp) box polypeptide 48 | 1.9 (V) |
| 121978 | P20671 | 147 | 5.00E-38 | 60 | Histone H2A | H2A histone family, member Y2 | 0.6 (P) |
| 13124797 | Q15233 | 515 | e-148 | 59 | 54 kDa nuclear RNA- and DNA-binding protein (p54(nrb)) | splicing factor proline/glutamine rich | 1.5 (V), ID (P) |
| Q9Y687 | Q9Y687 | 361 | e-102 | 55 | M-phase phosphoprotein homologue | M-phase phosphoprotein 4 | ID (T) |
| 113950 | P07355 | 337 | 1.00E-94 | 54 | Annexin A2 | annexin A1 | 1.3 (P), 2.2 (V) |

CL1 relative to its low metastatic counterpart LNCaP.19 From 5 million MPSS signatures, 966 genes were found to be overexpressed in CL1 cells. Using Batch Entrez from NCBI, we retrieved the distinct nucleotide sequences for these 966 genes from GenBank and saved them in fasta format in one minute. This file was used as the query and was compared to the pseudopodium-unique protein database described above.17 BlastPro identified 24 genes that are overexpressed in highly metastatic CL1 cells and enriched in the pseudopodium (Table S2, Supporting Information). Interestingly, the majority (62.5%) of the hits are cytoskeletal-associated proteins previously associated with cell migration, including actin, tubulin, thymosin β-10, cofilin, Lasp-1, and α-actinin (Figure 2A). By comparison, cytoskeletal proteins made up only 33.6% of the pseudopodium-unique proteins, which supports the conclusion that pseudopodium-unique cytoskeletal proteins are more likely to be upregulated in highly metastatic cancer cells. PARP1, peroxiredoxin 1, 14-3-3 protein, and α-actinin have also been previously associated with cancer progression.20-24 In addition, when we used the trace-back function provided by BlastPro, we found that peroxiredoxin 2 is also enriched in the pseudopodium. Recent evidence has shown that peroxiredoxin 2 can regulate PDGF signaling and suppress cell migration.25 The enrichment of peroxiredoxin 1 and 2 in leading pseudopodia of migrating cells suggests that they are negative regulators of pseudopodium formation and cell migration. Metastatic cancer cells may have lost the ability to regulate these proteins, resulting in aberrant pseudopodium formation and cell migration.

Figure 2. Comparison of pseudopodium-enriched proteins and genes expressed in CL1 cells. The sequences of pseudopodium-enriched proteins were compared with sequences of genes that are up-regulated (A) or down-regulated (B) in CL1 cells. Proteins/genes identified in both datasets were functionally grouped as cytoskeletal, signaling, and other proteins.

We also compared proteins found in both CL1 cells and the pseudopodium to FA proteins identified by de Hoog et al.16 Actin γ1 propeptide, tubulin-β 4, myosin alkali light chain, peroxiredoxin 1, ribosomal protein L5, 14-3-3 protein, and α-actinin were found to be increased under these conditions. These proteins may regulate FA turnover and pseudopodial adhesion during cell migration.

The 1022 genes that are down-regulated in CL1 relative to LNCaP cells were also compared with pseudopodium-enriched proteins. These genes may serve as important suppressors or negative regulators of cell migration and invasion. BlastPro identified 16 pseudopodium-enriched proteins or corresponding homologues that were down-regulated in CL1 cells. Of these, only caldesmon and nonmuscle myosin heavy chain ten are cytoskeletal-associated proteins (Figure 2B and Table S3, Supporting Information). No focal adhesion proteins were identified under these conditions. We note that the proportion of cytoskeletal proteins decreased from 33.6% in the pseudopodium-unique proteins to 12.5% in the comparison results. Therefore, only a small proportion of the pseudopodium-unique cytoskeletal proteins are down-regulated in metastatic cells, which suggests that these proteins serve as negative regulators in metastasis. Together, this comparative analysis reveals that the majority (62.5%) of proteins up-regulated in metastatic CL1 cells and enriched in the pseudopodium are components of the cytoskeleton.
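The proportions quoted above can be checked with a few lines of Python. This is a minimal sketch, not the authors' analysis: the count of 15 cytoskeletal hits is inferred from 62.5% of 24, the count of 2 comes from the two proteins named in the text (2 of 16 is 12.5%), and the binomial tail is an illustrative check that the paper does not report.

```python
from math import comb


def fold_enrichment(class_hits, total_hits, background_fraction):
    """Observed fraction of hits in a class divided by its background fraction."""
    return (class_hits / total_hits) / background_fraction


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p). Illustrative only; not from the paper."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


BACKGROUND = 0.336              # cytoskeletal share of pseudopodium-unique proteins
UP_CYTO, UP_TOTAL = 15, 24      # inferred: 15/24 = 62.5% of the up-regulated hits
DOWN_CYTO, DOWN_TOTAL = 2, 16   # caldesmon and myosin heavy chain: 2/16 = 12.5%

fold_up = fold_enrichment(UP_CYTO, UP_TOTAL, BACKGROUND)
fold_down = fold_enrichment(DOWN_CYTO, DOWN_TOTAL, BACKGROUND)
p_up = binomial_upper_tail(UP_CYTO, UP_TOTAL, BACKGROUND)
```

Under these assumptions, cytoskeletal proteins are about 1.86-fold over-represented among the up-regulated hits and about 0.37-fold (under-represented) among the down-regulated hits, relative to the 33.6% background.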
We also compared pseudopodium-enriched proteins with four different datasets of genes that are overexpressed in various types of metastatic cancer cells, including breast cancer, adenocarcinoma, mesothelioma, and melanoma cells.3,17,26-28 Comparison of these datasets with BlastPro revealed that the α-actinin family of proteins (1, 3, 4) is enriched in pseudopodia and overexpressed in metastatic tumor cells (Figure S1, Supporting Information). These data suggest that α-actinin localizes to the pseudopodium and is commonly deregulated in metastatic cells or the tumor microenvironment. Indeed, previous reports have associated these proteins with tumorigenesis and cancer metastasis.29-31 Quantitative Western blot analysis confirmed that α-actinin was increased approximately 8-fold in the pseudopodium compared to the cell body (Figure 3). Thus, α-actinin is a strong candidate to be a biomarker of metastatic potential in many types of cancer. Together, these findings demonstrate the feasibility of using BlastPro to compare large genomic and proteomic datasets to reveal markers of metastatic cells, such as α-actinin.

Figure 3. Quantitative detection of α-actinin family proteins. α-Actinin from the cell body (CB) and pseudopodium (Pseud) of migrating COS-7 cells was detected by Western blot. Erk2 was used as a loading control.

Translational Analysis of mRNAs That Are Up-Regulated in Highly Metastatic Prostate Cells. While gene expression profiles can provide biomarkers for prognosis, mRNA expression does not always correlate with protein expression within a cell.32 Furthermore, many diseases result from post-translational modifications of proteins, which occur independently of mRNA changes. Therefore, changes in mRNA expression profiles need to be validated by protein analysis methods before a particular protein is classified as a potential marker of cancer progression. This presents a particular problem, since it is technically challenging to perform large-scale validation on hundreds of proteins that have been predicted to be altered by gene array technology. Validation is also important for hypothetical genes present in the human genome because it is not known whether these genes are translated into real protein products. However, the expression of many hypothetical gene products has already been documented using proteomic-based approaches. Clearly, a computational system is needed that can compare large mRNA databases to existing protein expression databases. Using BlastPro, we can easily compare changes in mRNA levels to their corresponding protein products, which have been identified in related proteomic studies. For example, we compared the 966 genes overexpressed in CL1 cells (Query) to the quantitative proteomics dataset consisting of 271 proteins found to be overexpressed in the highly metastatic prostate cancer cell line (PC3M-LN4) compared to its low metastatic counterpart PC3M cells.4 Sixty-two genes or their close homologues were matched between the two datasets, representing 23% of the up-regulated proteins in the PC3M-LN4 cells (Figure 4; Table S4, Supporting Information). Of the 62 overexpressed proteins, 19 (30.6%) are translational machinery proteins, including ribosomal proteins and translation initiation factors (Figure 4; Table S4, Supporting Information). Considering that the proportion of translational machinery proteins in the original query dataset is 28.2%, it is possible that the high proportion of translational machinery proteins in the comparison is due to their high original proportion. However, a high percentage of translation machinery proteins were also identified in other migratory cells.33 Therefore, we believe that the overexpression of translational proteins may be needed to maintain the high expression level of metastasis-related proteins present in invasive cancer cells. Also, five cytoskeletal proteins, including tubulin isoforms α-1, β-1, and β-2, annexin, and dynein, are up-regulated in both metastatic cell lines (Figure 4). Interestingly, 8.1% of the hypothetical proteins were matched between these two datasets, indicating that these genes are translated into bona fide cellular proteins. We also compared these 62 matched genes to the pseudopodium-enriched protein database. Tubulin β-2, RuvB-like 2, Poly(ADP-ribose) polymerase-1, peroxiredoxin 1, 14-3-3 protein, and lactate dehydrogenase B are also enriched in the pseudopodia of migrating cells, suggesting that these proteins are involved in mediating pseudopodium function in migratory cells. Together, these findings demonstrate that BlastPro can be used as a tool to assist in the validation of known and hypothetical proteins uncovered by large-scale gene array studies.

Figure 4. Genes up-regulated in highly metastatic prostate cancer at the mRNA and protein level. When BlastPro was used to compare changes in mRNA levels in CL1 cells to proteins that are overexpressed in PC3M-LN4 cells, 62 genes were found to be up-regulated. These 62 matches include not only genes with 100% similarity in translated protein sequence between the two datasets, but also those with similarity as low as 50%, such as the ribosomal protein L28. We include genes with lower similarity because some nucleotide sequences in the query represent only part of the entire coding region of the gene. Also, the gene may be alternatively spliced or sequence data may be from a different species, which can decrease the similarity score.
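The matching rule described for Figure 4, where each query keeps its best hit and hits are accepted down to roughly 50% similarity, can be sketched in Python as follows. This is an illustration under stated assumptions rather than BlastPro's parser: the `Hit` record, the function name, and the default E-value cutoff are hypothetical, and the identifiers in the usage note are invented.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str         # identifier from the query dataset
    subject: str       # identifier from the database dataset
    bits: float        # blast bit score
    evalue: float      # blast E-value
    similarity: float  # percent similarity of the alignment


def best_hits(hits, min_similarity=50.0, max_evalue=1e-10):
    """Keep the highest-scoring hit per query that passes both thresholds.

    Both thresholds are user-adjustable; the defaults are assumptions.
    """
    best = {}
    for hit in hits:
        if hit.similarity < min_similarity or hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bits > current.bits:
            best[hit.query] = hit
    return best
```

Given two hits for a hypothetical query `geneA` (bit scores 120 and 300, both above the thresholds) and one hit for `geneB` at 42% similarity, `best_hits` returns only the 300-bit hit for `geneA`; `geneB` is dropped because it falls below the similarity threshold.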

Discussion

The BlastPro system we describe provides an automated tool for biologists to compare interlaboratory proteomics or genomics data. It automates stand-alone blast and takes advantage of blastp, blastn, and blastx for comparison of protein, peptide, and nucleotide data. In addition, the BlastPro system includes an automated database generator and a parser that automatically retrieves the most significant information from the output of stand-alone blast. The biggest advantage of this system is that it is user- and database-independent. Users do not need any bioinformatics or computer science knowledge to perform BlastPro analyses. The database generator can create a query and a database file in fasta format from a list of protein identifiers inputted by the user (as simple as copy and paste). This system supports not only proteins identified in multiple public biological sequence databases but also GenBank nucleotide accession numbers and UniGene clusters. Theoretically, any data format can be supported by this system as long as the sequence is available. Comparisons between protein and protein, nucleotide and nucleotide, and nucleotide and protein are all supported. Furthermore, the user interface accepts a list of protein or nucleotide identifiers as well as sequence files in fasta format as input. Thus, users can analyze any data stored in fasta format, including prior experimental or published data, or even other BlastPro comparison results. We note that this system is designed to be used by biologists. A comparison of proteins may identify matches with less than 100% similarity. To determine whether these matches are significant, the user still needs to be knowledgeable in biology, especially when identifying homologues from different organisms. The parser can accurately identify proteins in two datasets that have the highest similarity and output them in a formatted table.
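Two of the steps described above, writing sequences to fasta format and choosing among blastp, blastn, and blastx according to the query and database types, can be sketched in Python. This is a hedged illustration, not BlastPro's code: the function names, the type labels `"protein"` and `"nucleotide"`, and the 60-character line width are assumptions.

```python
def choose_blast_program(query_type, database_type):
    """Map (query, database) sequence types to the blast program named in the text."""
    programs = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "nucleotide"): "blastn",
        ("nucleotide", "protein"): "blastx",  # query is translated before comparison
    }
    try:
        return programs[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type} query vs {database_type} database"
        ) from None


def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as fasta text with wrapped sequence lines."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

A nucleotide query file compared against a protein database would therefore be routed to blastx, which matches the nucleotide-protein comparisons reported in the Results. Only the three combinations named in the text are mapped; any other pairing raises an error rather than guessing.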
Because the parser employs user-specified threshold values and the formatted output provides convenient hyperlinks to the raw blast output for detailed information, BlastPro allows biologists to determine the significance of blast hits.

One of the major applications of this system is comparing genomics and proteomics data to find proteins that could be biomarkers for tumor prognosis and therapy. When this approach was used, the cytoskeletal protein α-actinin was predicted to be a marker of metastatic cancers. Western blot analysis reveals that this protein is also enriched in the pseudopodia of migrating cells, which may indicate its involvement in cancer metastasis. BlastPro can also identify genes that are up-regulated at both the mRNA level and the protein level in highly metastatic cells. The expression patterns of these up-regulated genes may serve as signatures of metastatic potential. Generally, this system can be applied to any research that uses genomics and proteomics approaches to screen for disease-related biomarkers or to find perturbed gene expression patterns in response to environmental cues.

Although computer algorithms, such as MatchMiner,6 exist that translate identifiers of a gene or protein from one database to another, these algorithms only perform exact matches and are not flexible enough to meet all of the data mining needs of current genomics and proteomics research. First, a gene may have multiple splice isoforms that are each assigned a unique identifier and hence cannot be identified using the exact match approach. Second, the exact match approach cannot accomplish comparisons of nucleotide and protein databases since the identifiers of a gene and a protein are always different. Last, important information can be gained by comparing interlaboratory datasets that address the same biological question but use different organisms, which is also beyond the ability of the exact match approaches. In contrast, BlastPro is built on NCBI's blast program, which compares genes or proteins based on their sequences. Genes or proteins from different datasets with different identifiers can be matched accurately based on their sequence similarity, whether they are exactly the same, different splice isoforms, or from different organisms. The comparison of nucleotides to proteins can also be accomplished by comparing the translated amino acid sequence from the nucleotide with the protein sequence. Therefore, the BlastPro program represents a significant advance in data mining for current genomic and proteomic research.
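The translation step behind a nucleotide-to-protein comparison can be illustrated with a small Python sketch that translates a DNA sequence in all six reading frames using the standard genetic code, as blastx does before aligning against protein sequences. It is a simplified stand-in, not blast itself: the function names are hypothetical, and codons containing ambiguous bases are rendered as `X` by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3, keyed by frame label."""
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames
```

Each of the six translated sequences can then be compared with a protein database entry, which is why a nucleotide identifier and a protein identifier that share no naming scheme can still be matched on sequence.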

Acknowledgment. This work is supported by a grant from the Susan G. Komen Foundation (to Y.W.), GM068487 and CA097022 (to R.L.K.), and by the Cell Migration Consortium (GM064346).

Supporting Information Available: Figure showing the comparison of pseudopodium-enriched proteins with four different datasets of genes; tables showing the Score and E-value of cross-species comparison between protein and nucleotide datasets, the genes up-regulated and down-regulated in CL1 cells, and the genes up-regulated in high metastatic prostate cancer at both the mRNA and protein levels. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Schena, M.; Shalon, D.; Davis, R. W.; Brown, P. O. Science 1995, 270, 467-470.
(2) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-247.
(3) Clark, E. A.; Golub, T. R.; Lander, E. S.; Hynes, R. O. Nature 2000, 406, 532-535.
(4) Everley, P. A.; Krijgsveld, J.; Zetter, B. R.; Gygi, S. P. Mol. Cell. Proteomics 2004, 3, 729-735.
(5) Orchard, S.; Hermjakob, H.; Binz, P. A.; Hoogland, C.; Taylor, C. F.; Zhu, W.; Julian, R. K., Jr.; Apweiler, R. Proteomics 2005, 5, 337-339.
(6) Endocrinol, E. J.; Bussey, K. J.; Kane, D.; Sunshine, M.; Narasimhan, S.; Nishizuka, S.; Reinhold, W. C.; Zeeberg, B.; Ajay, W.; Weinstein, J. N. Genome Biol. 2003, 4, R27.
(7) Liotta, L. A. Nature 2001, 410, 24-25.
(8) Wyckoff, J. B.; Jones, J. G.; Condeelis, J. S.; Segall, J. E. Cancer Res. 2000, 60, 2504-2511.
(9) Weis, S.; Cui, J.; Barnes, L.; Cheresh, D. J. Cell Biol. 2004, 167, 223-229.
(10) Dorssers, L. C.; Grebenchtchikov, N.; Brinkman, A.; Look, M. P.; van Broekhoven, S. P.; de Jong, D.; Peters, H. A.; Portengen, H.; Meijer-van Gelder, M. E.; Klijn, J. G.; van Tienoven, D. T.; Geurts-Moespot, A.; Span, P. N.; Foekens, J. A.; Sweep, F. C. Clin. Cancer Res. 2004, 10, 6194-6202.
(11) Recher, C.; Ysebaert, L.; Beyne-Rauzy, O.; Mansat-De Mas, V.; Ruidavets, J. B.; Cariven, P.; Demur, C.; Payrastre, B.; Laurent, G.; Racaud-Sultan, C. Cancer Res. 2004, 64, 3191-3197.
(12) Cho, S. Y.; Klemke, R. L. J. Cell Biol. 2002, 156, 725-736.
(13) Wootton, J. C.; Federhen, S. Methods Enzymol. 1996, 266, 554-571.
(14) Loftus, B.; Anderson, I.; Davies, R.; Alsmark, U. C.; Samuelson, J.; Amedeo, P.; Roncaglia, P.; Berriman, M.; Hirt, R. P.; Mann, B. J.; Nozaki, T.; Suh, B.; Pop, M.; Duchene, M.; Ackers, J.; Tannich, E.; Leippe, M.; Hofer, M.; Bruchhaus, I.; Willhoeft, U.; Bhattacharya, A.; Chillingworth, T.; Churcher, C.; Hance, Z.; Harris, B.; Harris, D.; Jagels, K.; Moule, S.; Mungall, K.; Ormond, D.; Squares, R.; Whitehead, S.; Quail, M. A.; Rabbinowitsch, E.; Norbertczak, H.; Price, C.; Wang, Z.; Guillen, N.; Gilchrist, C.; Stroup, S. E.; Bhattacharya, S.; Lohia, A.; Foster, P. G.; Sicheritz-Ponten, T.; Weber, C.; Singh, U.; Mukherjee, C.; El-Sayed, N. M.; Petri, W. A., Jr.; Clark, C. G.; Embley, T. M.; Barrell, B.; Fraser, C. M.; Hall, N. Nature 2005, 433, 865-868.
(15) Salgia, R.; Li, J. L.; Lo, S. H.; Brunkhorst, B.; Kansas, G. S.; Sobhany, E. S.; Sun, Y.; Pisick, E.; Hallek, M.; Ernst, T.; et al. J. Biol. Chem. 1995, 270, 5039-5047.
(16) de Hoog, C. L.; Foster, L. J.; Mann, M. Cell 2004, 117, 649-662.
(17) Lin, Y. H.; Park, Z. Y.; Lin, D.; Brahmbhatt, A. A.; Rio, M. C.; Yates, J. R., III; Klemke, R. L. J. Cell Biol. 2004, 165, 421-432.
(18) Brenner, S.; Johnson, M.; Bridgham, J.; Golda, G.; Lloyd, D. H.; Johnson, D.; Luo, S.; McCurdy, S.; Foy, M.; Ewan, M.; Roth, R.; George, D.; Eletr, S.; Albrecht, G.; Vermaas, E.; Williams, S. R.; Moon, K.; Burcham, T.; Pallas, M.; DuBridge, R. B.; Kirchner, J.; Fearon, K.; Mao, J.; Corcoran, K. Nat. Biotechnol. 2000, 18, 630-634.
(19) Lin, B.; White, J. T.; Lu, W.; Xie, T.; Utleg, A. G.; Yan, X.; Yi, E. C.; Shannon, P.; Khrebtukova, I.; Lange, P. H.; Goodlett, D. R.; Zhou, D.; Vasicek, T. J.; Hood, L. Cancer Res. 2005, 65, 3081-3091.
(20) Althaus, F. R. Oncogene 2005, 24, 11-12.
(21) Bryant, H. E.; Schultz, N.; Thomas, H. D.; Parker, K. M.; Flower, D.; Lopez, E.; Kyle, S.; Meuth, M.; Curtin, N. J.; Helleday, T. Nature 2005, 434, 913-917.
(22) Fella, K.; Gluckmann, M.; Hellmann, J.; Karas, M.; Kramer, P. J.; Kroger, M. Proteomics 2005, 5, 1914-1927.
(23) Hermeking, H. Nat. Rev. Cancer 2003, 3, 931-943.
(24) Shen, J.; Person, M. D.; Zhu, J.; Abbruzzese, J. L.; Li, D. Cancer Res. 2004, 64, 9018-9026.
(25) Choi, M. H.; Lee, I. K.; Kim, G. W.; Kim, B. U.; Han, Y. H.; Yu, D. Y.; Park, H. S.; Kim, K. Y.; Lee, J. S.; Choi, C.; Bae, Y. S.; Lee, B. I.; Rhee, S. G.; Kang, S. W. Nature 2005, 435, 347-353.
(26) Wang, W.; Goswami, S.; Lapidus, K.; Wells, A. L.; Wyckoff, J. B.; Sahai, E.; Singer, R. H.; Segall, J. E.; Condeelis, J. S. Cancer Res. 2004, 64, 8585-8594.
(27) Hegmans, J. P.; Bard, M. P.; Hemmes, A.; Luider, T. M.; Kleijmeer, M. J.; Prins, J. B.; Zitvogel, L.; Burgers, S. A.; Hoogsteden, H. C.; Lambrecht, B. N. Am. J. Pathol. 2004, 164, 1807-1815.
(28) Celis, J. E.; Gromov, P.; Cabezon, T.; Moreira, J. M.; Ambartsumian, N.; Sandelin, K.; Rank, F.; Gromova, I. Mol. Cell. Proteomics 2004, 3, 327-344.
(29) Honda, K.; Yamada, T.; Seike, M.; Hayashida, Y.; Idogawa, M.; Kondo, T.; Ino, Y.; Hirohashi, S. Oncogene 2004, 23, 5257-5262.
(30) Menez, J.; Le Maux Chansac, B.; Dorothee, G.; Vergnon, I.; Jalil, A.; Carlier, M. F.; Chouaib, S.; Mami-Chouaib, F. Oncogene 2004, 23, 2630-2639.
(31) Otey, C. A.; Carpen, O. Cell Motil. Cytoskeleton 2004, 58, 104-111.
(32) Gygi, S. P.; Rochon, Y.; Franza, B. R.; Aebersold, R. Mol. Cell. Biol. 1999, 19, 1720-1730.
(33) Jia, Z.; Barbier, L.; Stuart, H.; Amraei, M.; Pelech, S.; Dennis, J. W.; Metalnikov, P.; O'Donnell, P.; Nabi, I. R. J. Biol. Chem. 2005, 280, 30564-30573.

PR050390U