Human Tissue Profiling with Multidimensional Protein Identification Technology

Gerard Cagney,*,†,‡ Stephen Park,†,§ Clement Chung,∥ Bianca Tong,§ Colm O’Dushlaine,§ Denis C. Shields,§ and Andrew Emili∥

Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland; Royal College of Surgeons in Ireland, 123 St Stephen’s Green, Dublin 2, Ireland; and Banting and Best Department of Medical Research and Program in Proteomics and Bioinformatics, University of Toronto, Toronto, Ontario, Canada M5G 1L6

Received February 14, 2005
Profiling of tissues and cell types through systematic characterization of expressed genes or proteins shows promise as a basic research tool and has potential applications in disease diagnosis and classification. We used multidimensional protein identification technology (MudPIT) to analyze the proteomes of enriched nuclear extracts from eight human tissues: brain, heart, liver, lung, muscle, pancreas, spleen, and testis. We show that the method is approximately 80% reproducible. We address issues of relative abundance, tissue specificity, and selectivity, and the significance of proteins whose expression does not correlate with that of the corresponding mRNA. Surprisingly, most proteins are detected in a single tissue. These proteins tend to fulfill specialist (and potentially tissue-specific) functions compared with proteins expressed in two or more tissues.

Keywords: proteomics • mass spectrometry • bioinformatics • tissue specificity
* To whom correspondence should be addressed. Fax: (353) 716-6962. E-mail: [email protected]. † These authors contributed equally. ‡ Conway Institute, University College Dublin. § Royal College of Surgeons in Ireland. ∥ University of Toronto.

10.1021/pr0500354 CCC: $30.25 © 2005 American Chemical Society. Journal of Proteome Research 2005, 4, 1757-1767 (Vol. 4, No. 5). Published on Web 07/07/2005.

Introduction

The publication of the DNA sequence of the human genome in 2001 (ref 1) was both an opportunity and a challenge for systems biology research. For the first time, the genetic information required to direct the production of every human protein was available as a single comprehensive unit, yet this achievement has only highlighted how little we know about the functions and expression patterns (the expressed proteome) of the myriad predicted proteins in the various cell types, tissues, and organ systems of the body.2 (Following ref 2, we distinguish between the proteins that may potentially be expressed in a cell, the ‘proteome’, and those that are actually expressed, the ‘expressed proteome’.) Our current understanding of the information encoded in the DNA sequence is not sufficient to identify the complete set of open-reading frames (ORFs) comprising the proteome, nor can we accurately predict how protein diversity is multiplied through alternative splicing or modification. Furthermore, the proteome differs in each cell type as the product of a complex series of regulatory processes. These include cell-specific control of gene expression by transcription factors,3 chromatin remodeling,4 genomic imprinting,5 and extracellularly mediated signaling events such as hormone action6 and cell-cell contact.7 Large-scale gene expression studies using oligonucleotide microarrays represent a major conceptual and practical milestone in systems biology because they allow temporal and experimental changes to be studied on a cell, tissue, or organism scale. However, for many proteins mRNA and protein expression do not correlate,8-10 so there is an urgent need for methods that follow protein expression and processing in human tissues as a function of development, physiology (stimulus/response), and disease.

Developments in protein mass spectrometry (MS) now allow proteins to be identified rapidly and accurately.11 For many types of MS-based experiment, the protein (or peptide) to be identified must be presented to the mass spectrometer in relatively pure form, either as an intact protein or as an endoproteolytic digestion product of trypsin or another suitable enzyme. This has typically been accomplished by separating the starting protein mixture by one- or two-dimensional (2D) polyacrylamide gel electrophoresis. These gel-based approaches have been moderately successful in characterizing the proteomes of different organisms, tissues, and organelles,12,13 but are generally insufficient to resolve the very complex proteome of a typical human tissue, which may contain 10 000 distinct protein species and in which the levels of the most- and least-abundant proteins may differ by as much as 9 orders of magnitude.14 Gel-based methods are also biased against low-abundance, hydrophobic, and low-molecular-weight proteins. In contrast, gel-free fractionation procedures, in which peptides are separated by liquid chromatography prior to MS, appear to be less prone to these limitations.15,16 The MudPIT (Multidimensional Protein Identification Technology) approach developed by Yates and colleagues17,18 uses two orthogonal chromatographic separations to increase the resolution of proteome characterization at least 10-fold. Peptides generated by tryptic digestion of a complex protein mixture are separated first by charge on strong cation exchange media and then by hydrophobicity by reversed-phase chromatography; the two dimensions rely on largely independent physicochemical properties of peptides. Miniaturization (capillary columns and sub-microliter flow rates), combined with automated data-driven tandem mass spectrometry, markedly enhances the sensitivity of detection. Importantly, MudPIT permits identification of proteins present far below the levels of the high-abundance, ubiquitous ‘housekeeping’ proteins that tend to dominate proteomic expression profiling studies using 2D-PAGE.

Here, we use MudPIT to characterize a nuclear-enriched protein fraction isolated from a selection of key human organs in an effort to assess the level of proteome diversity in different tissues. We hypothesized that proteome profiling could be used to classify tissues by their functional characteristics, allowing the approach to be used to gain insight into functional specification, to examine changes in individual cell types during development, disease, or in response to a drug or experimental treatment, or to classify unknown cell types, such as metastases arising from an unknown source. We have also examined the relationship between the genome, the transcriptome (the expressed genes) as determined by DNA-microarray-based screening, and the expressed proteome (the expressed proteins) as detected by gel-based and gel-free forms of MS-based expression profiling, and we compare our results with previously published datasets obtained for different human tissues.
To our knowledge, this is the most comprehensive attempt to date to define the sub-proteome of these tissues. It offers a framework for evaluating and interpreting gel-free proteomic profiling methodologies for investigating fundamental aspects of human cell biology and serves as a starting point for larger, more systematic examinations of proteome adaptations.
Experimental Procedures

Source of Tissue Extracts. Nuclear-enriched soluble protein extracts were obtained from a commercial supplier (Active Motif, www.activemotif.com). The tissues were obtained from the National Disease Research Interchange (NDRI), Philadelphia, PA. The NDRI Policy & Ethics statement reads: “NDRI supplies materials that would otherwise be discarded, including tissue specimens from surgery and autopsy, as well as donated organs that do not meet established criteria for human transplantation” (http://www.ndriresource.org/html/policy.htm). Nuclei were prepared according to the Dignam method,19 and initial processing of the tissues (before trypsin digestion) was carried out in a safety hood.

Protein Fractionation and Identification by MudPIT. A 150-µg portion of total protein from each fraction was precipitated overnight with 5 volumes of ice-cold acetone, followed by centrifugation at 21 000 × g for 20 min. The pellet was solubilized in 8 M urea, 50 mM Tris-HCl, pH 8.5, at 37 °C for 2 h, reduced by the addition of 1 mM DTT for 1 h at room temperature, and then carboxyamidomethylated with 5 mM iodoacetamide for 1 h at 37 °C. The samples were diluted to 4 M urea with 50 mM ammonium bicarbonate, pH 8.5, and digested with endoproteinase Lys-C at a 1:150 molar ratio at 37 °C overnight. The next day the mixtures were further diluted to 2 M urea with 50 mM ammonium bicarbonate, pH 8.5, supplemented with CaCl2 to a final concentration of 1 mM, and incubated overnight with Poroszyme trypsin beads at 30 °C with rotation. The resulting peptide mixtures were solid-phase extracted with SPEC-Plus PT C18 cartridges (Ansys Diagnostics, Lake Forest, CA) according to the manufacturer’s instructions and stored at -80 °C until use.

A fully automated 15-cycle, 30-h MudPIT chromatographic procedure was set up essentially as described previously.18,20,21 Briefly, an HPLC quaternary pump was interfaced with an LCQ DECA XP ion trap tandem mass spectrometer (ThermoFinnigan, San Jose, CA). Fused-silica capillary microcolumns with a 150-µm inner diameter (Polymicro Technologies, Phoenix, AZ) were pulled to a fine tip using a P-2000 laser puller (Sutter Instruments, Novato, CA) and packed with 10 cm of 5-µm Zorbax Eclipse XDB-C18 resin (Agilent Technologies, Ontario, Canada) followed by 6 cm of 5-µm Partisphere strong cation-exchange resin (Whatman). Samples were loaded manually onto separate columns using a pressure vessel. The chromatography was carried out as described by Wolters and co-workers.21

Database Searching and Gene Expression Analysis. The SEQUEST algorithm22 was used to search a database populated with nonredundant mammalian Swiss-Prot and TrEMBL protein sequences.23 A probability-based evaluation algorithm, STATQUEST,20 was used to filter all putative matches using a maximum P value threshold corresponding to a 90% likelihood of correct identification. Of 1980 high-confidence proteins identified, 267 were removed after reciprocal BLAST analysis revealed redundancy (for mouse-human reciprocal best-hit pairs with BLAST expectation value of
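The two filtering steps named in the Database Searching paragraph (a probability cutoff, then removal of mouse-human redundancy via reciprocal best BLAST hits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ pipeline: it assumes that each identification already carries a STATQUEST-like probability, and that the best BLAST hit for each protein in each direction is available as a dictionary. All function names and protein IDs are hypothetical, keeping the human member of a reciprocal pair is an assumption, and the expectation-value cutoff is left as a parameter because its value is not given in this section.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: best BLAST hit per query as query_id -> (subject_id, e_value).
BestHits = Dict[str, Tuple[str, float]]


def filter_by_probability(probabilities: Dict[str, float], min_prob: float = 0.90) -> Set[str]:
    """Keep protein IDs whose identification probability meets the cutoff (0.90 = 90% likelihood)."""
    return {pid for pid, p in probabilities.items() if p >= min_prob}


def reciprocal_best_hits(mouse_to_human: BestHits, human_to_mouse: BestHits,
                         max_evalue: float) -> List[Tuple[str, str]]:
    """Return (mouse_id, human_id) pairs that are each other's best hit in both directions."""
    pairs = []
    for mouse_id, (human_id, e_forward) in mouse_to_human.items():
        reverse = human_to_mouse.get(human_id)
        if reverse is None:
            continue
        back_id, e_reverse = reverse
        # Both directions must agree and both e-values must pass the (assumed) cutoff.
        if back_id == mouse_id and max(e_forward, e_reverse) <= max_evalue:
            pairs.append((mouse_id, human_id))
    return pairs


def collapse_redundant(identified: Set[str], pairs: List[Tuple[str, str]]) -> Set[str]:
    """Drop the mouse entry of a reciprocal pair when its human counterpart was also identified."""
    kept = set(identified)
    for mouse_id, human_id in pairs:
        if mouse_id in kept and human_id in kept:
            kept.discard(mouse_id)
    return kept


if __name__ == "__main__":
    # Hypothetical identifications and BLAST results, for illustration only.
    probs = {"PROT1_HUMAN": 0.97, "PROT1_MOUSE": 0.93, "PROT2_HUMAN": 0.85}
    confident = filter_by_probability(probs)
    pairs = reciprocal_best_hits({"PROT1_MOUSE": ("PROT1_HUMAN", 1e-80)},
                                 {"PROT1_HUMAN": ("PROT1_MOUSE", 1e-80)},
                                 max_evalue=1e-20)  # illustrative cutoff only
    print(sorted(collapse_redundant(confident, pairs)))  # ['PROT1_HUMAN']
```

Requiring agreement in both directions, rather than accepting any one-way best hit, is what makes the pairing conservative: a mouse entry is only merged away when the human entry points straight back to it.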
Table 2. Proteins Whose Proteome and Transcriptome Levels Were Discordant in Two or More Tissuesa

Tissues: brain, heart, liver, lung, pancreas, spleen, testis. ‘high’, proteome > transcriptome; ‘low’, proteome < transcriptome.

Proteins: SODM_HUMAN, ANX1_HUMAN, THIL_HUMAN, CAH1_HUMAN, HPT1_HUMAN, QOR_HUMAN, FIBB_HUMAN, MCA1_HUMAN, GTA1_HUMAN, IF3A_MOUSE, FUMH_HUMAN, DECR_HUMAN, A2MG_HUMAN, VTDB_HUMAN, ICAL_HUMAN, FIBG_HUMAN, KAD1_HUMAN, ACDM_HUMAN, DHCA_HUMAN, CFAB_HUMAN, THRB_HUMAN, CSR2_HUMAN, ELNE_HUMAN, PRN3_HUMAN, S3B1_MOUSE, MAT3_HUMAN, PI52_MOUSE, Q9BT58, DDH1_HUMAN, Q99JF8, Q922Y7, PMG1_MOUSE, EF1G_MOUSE, VIME_HUMAN, ECH1_HUMAN, NUCL_HUMAN, CYPB_MOUSE, HS27_HUMAN, CATD_HUMAN, RS3A_MOUSE, PDX1_HUMAN, ENPL_HUMAN, FRIH_HUMAN, LDHA_HUMAN, PGBM_HUMAN, IDHP_HUMAN, ZYX_HUMAN, ALC1_HUMAN, RS3_MOUSE, TYPH_HUMAN, RTN4_HUMAN, AATM_HUMAN, RL13_HUMAN, PP1A_HUMAN, Q91V31, TCPH_MOUSE, A2M1_HUMAN, RL7_MOUSE, DAG1_HUMAN, RL4_HUMAN, D3D2_HUMAN, PTE2_HUMAN, SERA_MOUSE
a Proteins whose proteome and transcriptome levels were among the 20% most discordant in two or more tissues are listed. ‘High’ indicates that the protein was so ranked because its abundance was greater than expected from its transcriptome level; ‘Low’ indicates the opposite.
is an effective method for identifying tissue-specific proteins and is therefore useful for functional investigation of the biochemical adaptations associated with tissue samples.

Comparison of the Transcriptome and Proteome. Because systematic studies of gene expression patterns have been reported for multiple tissues using oligonucleotide-based microarrays, we next compared the patterns of protein expression detected by MudPIT with the tissue patterns of the corresponding mRNA transcripts. Although there is no reason to assume that mRNA and protein expression levels would correlate for a restricted compartment like the nucleus, we were interested, first, in determining whether the extent of selectivity/specificity seen in the proteome experiments exceeds that seen in transcriptional studies of mammalian cells and, second, in quantifying the extent of correlation between mRNA and proteomic surveys across tissues. We used the chi-square test to confirm that proteins identified in a tissue were also likely to be present in the corresponding transcriptome of that same tissue (Table 1). Indeed, this was the case for all tissues, although the correlation for testis was lower than for the other tissues. Pairwise comparisons between one proteome measurement (for one tissue) and one transcriptome experiment were also performed. For all tissues except spleen, two transcriptome datasets were available, and these were averaged. For five of the seven proteome datasets, the strongest correlation was between the proteome and transcriptome datasets for the same tissue (Table 1). Of the remaining two, the spleen proteome data correlated most strongly with the lung transcriptome data, while the testis proteome data correlated best with the liver transcriptome data. In these two cases, the difference between the highest and next-highest scores is much smaller than for the other tissues, suggesting either that (a) for spleen and testis, the measured proteome is not as specific as for other tissues, or that (b) the correlation between the measured transcriptome and proteome is weak for these two tissues. Interestingly, a diverse range of mRNA transcript patterns has frequently been observed during synthesis of testis cDNA libraries (Derek Murphy, personal communication). To obtain a more holistic sense of the global similarities and differences between the two datasets, we used a clustering approach to compare the transcriptome and proteome data from all tissues (Figure 7).

Figure 6. Functional classification of the measured proteome. The functional profile (Gene Ontology annotations) of the tissue-specific proteome is distinct from that of proteins present in the proteome of more than one tissue. The number of proteins annotated with keywords indicating regulatory function was plotted to compare regulatory and housekeeping activities of the single and common datasets.
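The pairwise proteome-versus-transcriptome comparison can be sketched as below, assuming each proteome and each transcriptome dataset is a vector of per-gene values in a shared gene order. Pearson correlation is used purely for illustration, since the scoring statistic behind Table 1 is not specified in this section. Replicate transcriptome datasets are averaged gene by gene, as the text describes, and the margin between the best and second-best scores is reported, because that margin is what separates spleen and testis from the other tissues. All names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Profile = List[float]  # one value per gene; every profile uses the same gene order


def pearson(x: Profile, y: Profile) -> float:
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_replicates(replicates: List[Profile]) -> Profile:
    """Gene-by-gene mean of one or more transcriptome datasets for a tissue."""
    return [mean(values) for values in zip(*replicates)]


def best_transcriptome_match(
    proteomes: Dict[str, Profile],
    transcriptomes: Dict[str, List[Profile]],
) -> Dict[str, Tuple[str, float]]:
    """Map each proteome tissue to (best-correlated transcriptome tissue, margin over runner-up).

    Needs at least two transcriptome tissues so that a runner-up exists.
    """
    averaged = {t: average_replicates(reps) for t, reps in transcriptomes.items()}
    result = {}
    for p_tissue, p_profile in proteomes.items():
        scores = sorted(
            ((pearson(p_profile, t_profile), t_tissue) for t_tissue, t_profile in averaged.items()),
            reverse=True,
        )
        (best_r, best_tissue), (runner_up_r, _) = scores[0], scores[1]
        result[p_tissue] = (best_tissue, best_r - runner_up_r)
    return result
```

A small margin for a tissue is the signature discussed above: the proteome does not single out one transcriptome clearly, either because the proteome is less specific or because the two platforms agree poorly for that tissue.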
Both “omes” cluster independently, indicating that transcript profiles from different tissues are generally more alike than the transcriptome and proteome profiles recorded from a single tissue. Within the transcriptome and proteome domains, however, certain tissues tended to cluster together, for instance lung and spleen. Conversely, the pair of tissues most distantly related to all other tissues, liver and pancreas, were found to be so on both platforms. Other features visible on the cluster heat map are sets of genes that are coordinately expressed in all tissues as both mRNA and protein, sets of genes that are more prominently upregulated in the transcriptome than in the proteome, and large numbers of mRNAs and proteins that appear to be expressed exclusively in liver, which is metabolically highly active.

Figure 7. Comparison of human transcriptome and nuclear proteomes from different tissues. Proteins were arranged by normalized MudPIT score and gene expression array scores using hierarchical agglomerative clustering.

Although there is a broad correlation between mRNA and protein expression patterns when viewed across all genes, the correspondence for individual genes may be low.37 This may be because our data are enriched for the nuclear compartment only. However, genes whose transcription and translation products do not correlate are of special interest, since they may indicate protein regulation by post-translational mechanisms, for instance targeted protein degradation. (Another important form of post-translational regulation, reversible enzymatic modification by phosphate, carbohydrate, or lipid moieties, is not addressed in this study.) For the proteomic dataset, we classified genes into ‘correlation classes’, including those whose mRNA and protein expression were tightly correlated (top 20% by regression analysis using a nonparametric statistic) and those whose mRNA and protein expression were most discordant (bottom 20%). Outliers whose protein expression was significantly higher than expected from the corresponding mRNA levels might include long-lived proteins and, as a class, might be expected to be less common than outliers whose mRNA expression was higher than expected from protein expression. This latter class may contain proteins subject to proteolytic regulation, or perhaps to regulation by transport into or out of the nucleus, to another organelle, or to the plasma membrane. Proteins found in these respective classes in two or more of the tissues examined are listed in Table 2. Many ‘proteome-high’ proteins are associated with circulation (e.g., fibrinogen, complement factors) and probably represent traces of blood contamination of the tissue during sample preparation.
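One way to form the ‘correlation classes’ within a single tissue is sketched below. Because the nonparametric statistic is not named in this section, the sketch substitutes a simple rank-based residual: genes are ranked separately by protein level and by mRNA level, and the difference in ranks is treated as the residual. Genes with the smallest absolute residuals (the top 20%) are called concordant, and those with the largest (the bottom 20%) are split into ‘proteome-high’ and ‘proteome-low’ by sign. The function names, the rank-difference residual, and the label strings are illustrative assumptions.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1 (smallest) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def correlation_classes(genes: List[str], protein: List[float], mrna: List[float],
                        fraction: float = 0.20) -> Dict[str, str]:
    """Label genes in one tissue as concordant, proteome-high, proteome-low, or intermediate."""
    protein_ranks, mrna_ranks = average_ranks(protein), average_ranks(mrna)
    # Rank-difference residual: positive means more protein than the mRNA rank predicts.
    residual = {g: p - m for g, p, m in zip(genes, protein_ranks, mrna_ranks)}
    by_size = sorted(genes, key=lambda g: abs(residual[g]))
    n = max(1, int(len(genes) * fraction))
    labels = {g: "intermediate" for g in genes}
    for g in by_size[:n]:
        labels[g] = "concordant"
    for g in by_size[-n:]:
        if residual[g] > 0:
            labels[g] = "proteome-high"
        elif residual[g] < 0:
            labels[g] = "proteome-low"
    return labels
```

Working on ranks rather than raw values keeps the classification insensitive to the very different scales of MudPIT scores and array intensities, which is one reason a nonparametric statistic suits this comparison.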
However, superoxide dismutase (SODM_HUMAN), an enzyme known to exhibit high turnover, was likewise classified as ‘proteome-high’ in six of the seven comparable tissues, while a crystallin protein (QOR_HUMAN), from a family of proteins with long half-lives, was similarly classified in brain, liver, and spleen. Conversely, proteins classed as ‘proteome-low’ include highly regulated proteins such as ribosome components and chaperones (Table 3). In general, proteins in the ‘proteome-high’ group display a slightly lower (albeit not statistically significant) average Instability Index (-0.364 ± 0.346), a measure of one type of protein instability,38 than proteins in the ‘proteome-low’ group (-0.407 ± 0.339).

Gene products that are discordant in terms of mRNA and protein expression in only a single tissue are also of interest, because they may point to tissue-selective and tissue-specific control processes. Supplementary Table 4 (see the Supporting Information) lists proteins that were discordant at the transcriptome/proteome levels predominantly in one tissue. Examples of this class include several major histocompatibility antigens identified as ‘transcriptome-high’ uniquely in spleen (HLAF_HUMAN, HLAE_HUMAN, HG2A_HUMAN, O19617, 1A01_HUMAN) and several cytochrome P450-family enzymes preferentially ‘transcriptome-high’ in liver (CP4Y_HUMAN, CPE1_HUMAN, NCPR_HUMAN). In contrast, examples of ‘transcriptome-low’ proteins include the tight-junction protein occludin (OCLN_HUMAN) and chloride intracellular channel 2 (CLI2_HUMAN) in lung. This subset merits further study because its members are potentially important in determining or regulating tissue-selective and tissue-specific biological functions.
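The split between Table 2 (discordant in two or more tissues) and Supplementary Table 4 (discordant predominantly in one tissue) amounts to counting, for each protein and direction of discordance, the tissues in which it was called. A minimal sketch, assuming per-tissue labels of the form ‘proteome-high’/‘proteome-low’ are already available; treating “predominantly in one tissue” as “in exactly one tissue” is a simplifying assumption, and all names and protein IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

DISCORDANT = {"proteome-high", "proteome-low"}
Call = Tuple[str, str]  # (protein_id, discordance label)


def tissues_per_call(per_tissue: Dict[str, Dict[str, str]]) -> Dict[Call, Set[str]]:
    """Collect, for each (protein, discordance label), the tissues in which it was called."""
    calls: Dict[Call, Set[str]] = defaultdict(set)
    for tissue, labels in per_tissue.items():
        for protein, label in labels.items():
            if label in DISCORDANT:
                calls[(protein, label)].add(tissue)
    return dict(calls)


def split_by_recurrence(
    per_tissue: Dict[str, Dict[str, str]],
) -> Tuple[Dict[Call, Set[str]], Dict[Call, Set[str]]]:
    """Return (calls seen in two or more tissues, calls seen in exactly one tissue)."""
    calls = tissues_per_call(per_tissue)
    recurrent = {c: t for c, t in calls.items() if len(t) >= 2}
    single = {c: t for c, t in calls.items() if len(t) == 1}
    return recurrent, single


if __name__ == "__main__":
    # Hypothetical labels for two tissues; not data from this study.
    per_tissue = {
        "tissue_1": {"PROT_A": "proteome-high", "PROT_B": "concordant"},
        "tissue_2": {"PROT_A": "proteome-high", "PROT_C": "proteome-low"},
    }
    recurrent, single = split_by_recurrence(per_tissue)
    print(sorted(recurrent))  # [('PROT_A', 'proteome-high')]
    print(sorted(single))     # [('PROT_C', 'proteome-low')]
```

Keying on the (protein, direction) pair rather than on the protein alone keeps a protein that is ‘proteome-high’ in one tissue and ‘proteome-low’ in another from being counted as recurrent.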
Discussion

We used a powerful proteomics method (MudPIT) to analyze the proteomes of enriched nuclear extracts from eight tissues: brain, heart, liver, lung, muscle, pancreas, spleen, and testis. Comparing the proteins expressed in different tissues is a step toward elucidating the biochemical properties of the major tissue systems and understanding the molecular basis of physiology, metabolism, and disease. It also generates a resource for comparing mRNA and protein expression data across these tissues. Using mouse liver tissue, we showed that the proteome profiling approach is approximately 80% reproducible. Ideally, at least three replicates should be carried out to ensure that each tissue is analyzed equivalently. Because we had insufficient sample material, we used the same protocol for both mouse and human samples. While this means that data concerning individual proteins cannot be considered definitive, analysis of the protein fractions in toto may yield fruitful results. To our knowledge, no comprehensive comparative analysis of the human nuclear proteome from different tissues has been carried out to date. Although the proteomes of many tissues have been analyzed by MALDI mass spectrometry following separation by 2D gel electrophoresis (www.expasy.org/ch2d/), these analyses generally identify only about 100-200 proteins because of severe limitations in overall sensitivity and dynamic range. The use of 2D gels does provide additional valuable information, such as evidence of post-translational processing and modification and the presence of isoform variants of individual proteins. On the other hand, gel-free profiling methods provide far greater proteome coverage and offer a more robust platform for determining the relative abundance of proteins.
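As an illustration of what an “approximately 80% reproducible” figure can mean, the sketch below compares the protein sets identified in two replicate runs. The metric chosen here, the mean of the two directional re-detection fractions, is an assumption: this section does not define how reproducibility was computed, and a Jaccard index over the same sets, also shown, gives a stricter, lower value. Function names and protein IDs are hypothetical.

```python
from typing import Set


def redetection_fraction(run_a: Set[str], run_b: Set[str]) -> float:
    """Fraction of proteins identified in run_a that are identified again in run_b."""
    if not run_a:
        raise ValueError("run_a is empty")
    return len(run_a & run_b) / len(run_a)


def reproducibility(run_a: Set[str], run_b: Set[str]) -> float:
    """Symmetric score: mean of the two directional re-detection fractions."""
    return (redetection_fraction(run_a, run_b) + redetection_fraction(run_b, run_a)) / 2


def jaccard(run_a: Set[str], run_b: Set[str]) -> float:
    """Shared proteins divided by all proteins seen in either run (a stricter alternative)."""
    union = run_a | run_b
    return len(run_a & run_b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Two hypothetical replicate runs sharing four of five proteins each.
    rep1 = {"P1", "P2", "P3", "P4", "P5"}
    rep2 = {"P1", "P2", "P3", "P4", "P6"}
    print(reproducibility(rep1, rep2))  # 0.8
    print(jaccard(rep1, rep2))          # about 0.667
```

Under the directional definition, a reproducibility of 0.8 corresponds directly to the estimate that roughly 20% of proteins may be missed in a single analysis.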
Studies that have characterized over 1000 proteins from tissues or even entire organisms using MudPIT-type analysis include the proteomes of yeast,18 rice,39 rat brain,40 the human mammary epithelial cell line HMEC 184 AIL5,41 various mouse organs,20,42 and the malaria parasite.43 Another analysis of the malaria parasite, which identified over 1289 proteins, used separation by 1D gel electrophoresis combined with LC-MS/MS.44 A previously published comparative analysis of six transformed human cell lines derived from kidney (HEK293), brain (SKNBE2), colon (SW480), liver (HepG2), and cervix (HeLa, HeLaS3) also used 1D gel electrophoresis and LC-MS/MS.45 The target samples were soluble cytoplasmic extracts, except for the HEK293 cells, for which 970 proteins were identified in a cytoplasmic fraction and 976 in a nuclear fraction, with 553 overlapping identifications. Interestingly, these authors identified only 104 proteins shared among the total of 1543 proteins found in the six cell lines. This cannot be attributed entirely to inefficient or biased sampling, because the authors showed through repeat analyses that typical experimental reproducibility was 60-70%. The ratio may be influenced by the range of proteins identified in different fractions, from 260 proteins detected in the HeLa cell line to 976 proteins in HEK293 (in our study, the number of tissue-specific proteins varied from 192 in muscle to 616 in liver). Surprisingly, the majority of proteins we identified were found in only one tissue. Although we show that the MudPIT nuclear proteome is more extensive and more representative of the predicted in silico proteome than the 2D-gel nuclear proteome, we cannot rule out that this observation is due to undersampling. Our results suggest that approximately 20% of proteins might be missed in each tissue in a single analysis. We also provide evidence that under-sampled proteins are likely to be the least abundant. Given these caveats, there still seems to be a very large proportion of proteins found in a single tissue. Two recent analyses of gene expression across diverse human tissues found that ∼6% of genes were expressed in all tissues, although higher proportions were observed across several tissue combinations.34,35 Several authors have shown that whereas mRNA and protein expression correlate well for highly expressed proteins,46 mRNA expression is an unreliable predictor of protein expression for many or most of the proteins examined.8-10 Although we examined enriched nuclear fractions, whose composition is likely to differ from that of the complete cell, we targeted proteins for which mRNA and protein expression are discordant, because such differential regulation may imply important function at the protein level.

Abbreviations: MudPIT, multidimensional protein identification technology; ORF, open-reading frame; MS, mass spectrometry; 2D, two-dimensional; GO, gene ontology; MIL, mean intron length; MALDI, matrix-assisted laser desorption ionization.
Acknowledgment. This work was supported by grants from Science Foundation Ireland (02/IN.1/B117 and UREKA to G.C.), the Enterprise Ireland Research Innovation Fund (to D.S. and S.P.), and the Natural Sciences and Engineering Research Council of Canada and Genome Canada (to A.E.). G.C. and D.S. were also partly supported by the Program for Research in Third Level Institutions administered by the Higher Education Authority of Ireland.

Supporting Information Available: Supplementary Tables 1-4. This material is available free of charge via the Internet at http://pubs.acs.org.

References
(1) Lander, E. S.; et al. (International Human Genome Sequencing Consortium). Initial sequencing and analysis of the human genome. Nature 2001, 409, 860-921.
(2) Greenbaum, D.; Jansen, R.; Gerstein, M. Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts. Bioinformatics 2002, 18, 585-596.
(3) Hochheimer, A.; Tjian, R. Diversified transcription initiation complexes expand promoter selectivity and tissue-specific gene expression. Genes Dev. 2003, 17, 1309-1320.
(4) Lusser, A.; Kadonaga, J. T. Chromatin remodeling by ATP-dependent molecular machines. Bioessays 2003, 25, 1192-1200.
(5) Verona, R. I.; Mann, M. R.; Bartolomei, M. S. Genomic imprinting: intricacies of epigenetic regulation in clusters. Annu. Rev. Cell Dev. Biol. 2003, 19, 237-259.
(6) Meijer, O. C.; Karssen, A. M.; de Kloet, E. R. Cell- and tissue-specific effects of corticosteroids in relation to glucocorticoid resistance: examples from the brain. J. Endocrinol. 2003, 178, 13-18.
(7) Friedl, P.; Brocker, E. B. The biology of cell locomotion within three-dimensional extracellular matrix. Cell. Mol. Life Sci. 2000, 57, 41-64.
(8) Anderson, L.; Seilhamer, J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 1997, 18, 533-537.
(9) Gygi, S. P.; Rochon, Y.; Franza, B. R.; Aebersold, R. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 1999, 19, 1720-1730.
(10) Chen, G.; Gharib, T. G.; Huang, C.-C.; Taylor, J. M. G.; Misek, D. E.; Kardia, S. L. R.; Giordano, T. J.; Iannettoni, M. D.; Orringer, M. B.; Hanash, S. M.; Beer, D. G. Discordant protein and mRNA expression in lung adenocarcinomas. Mol. Cell. Proteomics 2002, 1, 304-313.
(11) Lin, D.; Tabb, D. L.; Yates, J. R., III. Large-scale protein identification using mass spectrometry. Biochim. Biophys. Acta 2003, 1646, 1-10.
(12) Huber, L. A.; Pfaller, K.; Vietor, I. Organelle proteomics. Implications for subcellular fractionation in proteomics. Circ. Res. 2003, 92, 962-968.
(13) Taylor, S. W.; Fahy, E.; Ghosh, S. Global organellar proteomics. Trends Biotechnol. 2003, 21, 82-88.
(14) Omenn, G. S. The Human Proteome Organization Plasma Proteome Project pilot phase: reference specimens, technology platform comparisons, and standardized data submissions and analyses. Proteomics 2004, 4, 1235-1240.
(15) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422, 198-207.
(16) Kislinger, T.; Emili, A. Going global: protein expression profiling using shotgun mass spectrometry. Curr. Opin. Mol. Ther. 2003, 5, 285-293.
(17) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 1999, 17, 676-682.
(18) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001, 19, 242-247.
(19) Dignam, J. D.; Lebovitz, R. M.; Roeder, R. G. Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 1983, 11, 1475-1489.
(20) Kislinger, T.; Rahman, K.; Radulovic, D.; Cox, B.; Rossant, J.; Emili, A. PRISM, a Generic Large Scale Proteomic Investigation Strategy for Mammals. Mol. Cell. Proteomics 2003, 2, 96-106.
(21) Wolters, D. A.; Washburn, M. P.; Yates, J. R., III. An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 2001, 73, 5683-5690.
(22) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(23) Bairoch, A.; Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28, 45-48.
(24) Lockhart, D. J.; Dong, H.; Byrne, M. C.; Follettie, M. T.; Gallo, M. V.; Chee, M. S.; Mittmann, M.; Wang, C.; Kobayashi, M.; Horton, H.; Brown, E. L. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 1996, 14, 1675-1680.
(25) Eisen, M. B.; Spellman, P. T.; Brown, P. O.; Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14863-14868.
(26) Hartigan, J. A. Clustering Algorithms; John Wiley & Sons: New York, 1996.
(27) Draghici, S.; Khatri, P.; Martins, R. P.; Ostermeier, G. C.; Krawetz, S. A. Global functional profiling of gene expression. Genomics 2003, 81, 98-104.
(28) Zhang, B.; Schmoyer, D.; Kirov, S.; Snoddy, J. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics 2004, 5, 16-23.
(29) Castillo-Davis, C. I.; Mekhedov, S. L.; Hartl, D. L.; Koonin, E. V.; Kondrashov, F. A. Selection for short introns in highly expressed genes. Nat. Genet. 2002, 31, 415-418.
(30) Cagney, G.; Amiri, S.; Premawaradena, T.; Lindo, M.; Emili, A. In silico proteome analysis to facilitate proteomics experiments using mass spectrometry. Proteome Sci. 2003, 1, 1-5.
(31) Gygi, S.; Corthals, G.; Zhang, Y.; Rochon, Y.; Aebersold, R. Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9390-9395.
(32) Washburn, M. P.; Ulaszek, R. R.; Yates, J. R., III. Reproducibility of quantitative proteomic analyses of complex biological mixtures by multidimensional protein identification technology. Anal. Chem. 2003, 75, 5054-5061.
(33) Durr, E.; Yu, J.; Krasinska, K. M.; Carver, L. A.; Yates, J. R.; Testa, J. E.; Oh, P.; Schnitzer, J. E. Direct proteomic mapping of the lung microvascular endothelial cell surface in vivo and in cell culture. Nat. Biotechnol. 2004, 22, 985-992.
(34) Warrington, J. A.; Nair, A.; Mahadevappa, M.; Tsyganskaya, M. Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics 2000, 2, 143-147.
(35) Hsiao, L. L.; Dangond, F.; Yoshida, T.; Hong, R.; Jensen, R. V.; Misra, J.; Dillon, W.; Lee, K. F.; Clark, K. E.; Haverty, P.; Weng, Z.; Mutter, G. L.; Frosch, M. P.; Macdonald, M. E.; Milford, E. L.; Crum, C. P.; Bueno, R.; Pratt, R. E.; Mahadevappa, M.; Warrington, J. A.; Stephanopoulos, G.; Stephanopoulos, G.; Gullans, S. R. A compendium of gene expression in normal human tissues. Physiol. Genomics 2001, 7, 97-104.
(36) Bader, G. D.; Hogue, C. W. V. Analyzing yeast protein-protein interaction data obtained from different sources. Nat. Biotechnol. 2002, 20, 991-997.
(37) Greenbaum, D.; Colangelo, C.; Williams, K.; Gerstein, M. Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol. 2003, 4, 117.
(38) Varshavsky, A. The N-end rule: functions, mysteries, uses. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12142-12149.
(39) Koller, A.; Washburn, M. P.; Lange, B. M.; Andon, N. L.; Deciu, C.; Haynes, P. A.; Hays, L.; Schieltz, D.; Ulaszek, R.; Wei, J.; Wolters, D.; Yates, J. R., III. Proteomic survey of metabolic pathways in rice. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 11969-11974.
(40) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Yates, J. R. A method for the comprehensive proteomic analysis of membrane proteins. Nat. Biotechnol. 2003, 21, 532-538.
(41) Jacobs, J. M.; Mottaz, H. M.; Yu, L.-R.; Anderson, D. J.; Moore, R. J.; Chen, W.-N. U.; Auberry, K. J.; Strittmatter, E. F.; Monroe, M. E.; Thrall, B. D.; Camp, D. G., II; Smith, R. D. Multidimensional proteome analysis of human mammary epithelial cells. J. Proteome Res. 2004, 3, 68-75.
(42) Pan, Y.; Kislinger, T.; Gramolini, A. O.; Zvaritch, E.; Kranias, E. G.; MacLennan, D. H.; Emili, A. Identification of biochemical adaptations in hyper- or hypocontractile hearts from phospholamban mutant mice by expression proteomics. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 2241-2246.
(43) Florens, L.; Washburn, M. P.; Raine, J. D.; Anthony, R. M.; Grainger, M.; Haynes, J. D.; Moch, J. K.; Muster, N.; Sacci, J. B.; Tabb, D. L.; Witney, A. A.; Wolters, D.; Wu, Y.; Gardner, M. J.; Holder, A. A.; Sinden, R. E.; Yates, J. R.; Carucci, D. J. A proteomic view of the Plasmodium falciparum life cycle. Nature 2002, 419, 520-526.
(44) Lasonder, E.; Ishihama, Y.; Andersen, J. S.; Vermunt, A. M.; Pain, A.; Sauerwein, R. W.; Eling, W. M.; Hall, N.; Waters, A. P.; Stunnenberg, H. G.; Mann, M. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 2002, 419, 537-542.
(45) Schirle, M.; Heurtier, M.-A.; Kuster, B. Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 2003, 2, 1297-1305.
(46) Celis, J. E.; Kruhoffer, M.; Gromova, I.; Frederiksen, C.; Ostergaard, M.; Thykjaer, T.; Gromov, P.; Yu, J.; Palsdottir, H.; Magnusson, N.; Orntoft, T. F. Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics. FEBS Lett. 2000, 480, 2-16.