ARTICLE pubs.acs.org/jpr
Large-Scale Protein Profiling in Human Cell Lines Using Antibody-Based Proteomics
Linn Fagerberg,† Sara Strömberg,‡ Adila El-Obeid,‡ Marcus Gry,† Kenneth Nilsson,‡ Mathias Uhlén,† Fredrik Pontén,‡ and Anna Asplund*,‡
† Department of Proteomics, School of Biotechnology, AlbaNova University Center, KTH-Royal Institute of Technology, SE-10691 Stockholm, Sweden
‡ Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, SE-75185 Uppsala, Sweden
ABSTRACT: Human cancer cell lines grown in vitro are frequently used to decipher basic cell biological phenomena and also to study different forms of cancer specifically. Here we present the first large-scale study of protein expression patterns in cell lines using an antibody-based proteomics approach. We analyzed the expression pattern of 4356 proteins in 45 different cell lines using hierarchical clustering, principal component analysis, and two-group comparisons for the identification of differentially expressed proteins. Our results show that immunohistochemically determined protein profiles can categorize cell lines into groups that overall reflect the tumor tissue of origin and that hematological cell lines appear to retain their protein profiles to a higher degree than cell lines established from solid tumors. The two-group comparisons reveal well-characterized proteins as well as previously unstudied proteins that could be of potential interest for further investigations. Moreover, multiple myeloma cells and cells of myeloid origin were found to share a protein profile, relative to the protein profile of lymphoid leukemia and lymphoma cells, possibly reflecting their common dependency on the bone marrow microenvironment. This work also provides an extensive list of antibodies that are of potential use in cell line studies, for which high-resolution images as well as validation data are available on the Human Protein Atlas (www.proteinatlas.org).
KEYWORDS: human cell lines, immunohistochemistry, proteomics, image analysis, hierarchical clustering
Received: March 21, 2011. Published: July 04, 2011. © 2011 American Chemical Society
Journal of Proteome Research | dx.doi.org/10.1021/pr200259v | J. Proteome Res. 2011, 10, 4066–4075
■ INTRODUCTION
Human cancer cell lines constitute an important resource for biomedical research and are widely used as experimental model systems to understand cellular mechanisms. Research based on cell lines has led to the identification of multiple genetic and phenotypic alterations in cancer cells and the discovery and characterization of several cancer drugs.1–3 Although hematopoietic tumors represent only a minor percentage of all human tumors, the extensive panel of well-defined hematopoietic cancer cell lines has rendered the scientific importance of cell lines established from hematological malignancies disproportionately large. Today, well-characterized cell lines are available covering most major leukemias and lymphomas representing different stages of hematopoietic differentiation. Such cell lines have allowed for studies not otherwise possible with primary patient tumor samples.4
The in vitro microenvironment causes cell lines to exhibit significant alterations in the biological properties characteristic of the corresponding in vivo cell types. It has, for example, been shown that genes involved in proliferation and increased energy metabolism are up-regulated in cell lines, whereas genes encoding cell adhesion molecules and membrane-associated signaling molecules are down-regulated.5 Cell lines thereby share common traits in terms of protein expression profiles regardless of cellular origin, an “in vitro cell line protein signature”. This raises the questions of how similar cell lines of various origins are to each other in terms of gene/protein expression profiles, and accordingly how adequate specific cell lines are as model systems for defined cancer types.
Large-scale expression profiling in combination with hierarchical clustering analysis can be used to describe the level of similarity/dissimilarity between different cell types, as defined by their respective gene expression profiles. Previously, the gene expression spaces of both tissues6 and cell lines5 have been studied using mRNA transcript levels and large sets of microarray data, since global expression analyses on the protein level have long been hampered by a lack of specific probes for most proteins encoded by the human genome. To fill this void, an ongoing proteomics effort has been undertaken with the aim of generating specific antibodies toward the human proteome.7 Antibodies, such as antibodies to cell differentiation (CD) molecules, remain important tools in both research and clinical pathology, as specific probes used in immunohistochemistry and flow cytometry (FACS) to identify and determine cellular distribution of protein expression. At present, 389 human CD antigens are available for characterization of surface molecules on leukocytes.8 The availability of specific probes targeting products from 50% of the protein coding genes9 now enables the creation of a publicly available cellular as well as subcellular protein atlas, based on immunohistochemistry and immunofluorescence analysis of human tissues and cells.10,11 In addition, immunohistochemistry-based protein profiling has enabled a description of the proteomic landscape in human tissues, according to which normal cell types could be subdivided into groups that harmonize well with the current concepts of histology and cellular differentiation.12 On the global level, a large fraction of proteins (>65%) was expressed in most of the analyzed cells and tissues.
The present study describes an approach to investigate the level of similarity and difference between in vitro cultured cell lines of diverse origins on a protein level, combining antibody-based proteomics and automated image analysis. The extensive compiled protein expression data, covering 20% of the protein coding genes, provide a possibility to explore the proteomic space in commonly used human cell lines and to identify gene expression patterns signifying different phenotypes.
■ RESULTS
Analysis of Global Protein Expression Patterns
The relative expression levels of 4356 proteins (Table S1, Supporting Information) were estimated in 45 cell lines, representing major types of solid cancers as well as defined types of leukemia and lymphoma (Table S2, Supporting Information). The level and extent of protein expression were analyzed using immunohistochemistry (IHC) and automated image analysis, taking into account both intensity and area percentage of IHC staining. Immunohistochemical staining results in all cell lines, along with high-resolution images, are presented on the Human Protein Atlas (www.proteinatlas.org) (Figure 1). The relationships between cell types, as defined by overall protein profiles, were analyzed using hierarchical clustering. The clusters in the resulting dendrogram by and large reflected the six subgroups of cell types defined by tumor tissue of origin and/or phenotype: (i) leukemia/lymphoma, (ii) multiple myeloma, (iii) myeloid cells, (iv) epithelial carcinoma, (v) glioma/sarcoma/melanoma, and (vi) TIME (Figure 2). TIME represents a pool of telomerase-immortalized human microvascular endothelial cells rather than a cell line established from a malignant tumor and was therefore considered a separate entity. In addition to these six subgroups, hierarchical clustering revealed a seventh subgroup consisting of four cell lines of various tissue origins but with the common denominator of expressing neuroid markers (blue, Figure 2). The spatial distribution of all cell lines was also analyzed by principal component analysis (PCA) (Figure 3). By performing two-group comparisons between pairs of selected subgroups of cell lines, we identified candidate lists of differentially expressed proteins. Most of the two-group comparisons included groups containing few cell lines, which inevitably leads to lower significance levels.
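A two-group comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the statistical test is not named in this section, so an exact permutation test on the difference of group means stands in for it, the Benjamini-Hochberg adjustment is assumed as the FDR procedure, and the function names and the pseudocount in the fold change are hypothetical.

```python
from itertools import combinations
from math import log2
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    hits = splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Tolerance so that relabelings tied with the observed split are counted.
        hits += diff >= observed - 1e-12
        splits += 1
    return hits / splits


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (an assumption) avoids log(0)."""
    return log2((mean(a) + pseudocount) / (mean(b) + pseudocount))


def two_group_comparison(scores, group_a, group_b):
    """scores maps protein -> {cell line -> score}; rows are sorted by p-value."""
    proteins = sorted(scores)
    samples = [([scores[g][c] for c in group_a], [scores[g][c] for c in group_b])
               for g in proteins]
    p = [permutation_p_value(a, b) for a, b in samples]
    fdr = benjamini_hochberg(p)
    rows = [(g, p_i, q_i, log2_fold_change(a, b))
            for g, p_i, q_i, (a, b) in zip(proteins, p, fdr, samples)]
    return sorted(rows, key=lambda row: row[1])
```

With three cell lines per group there are only 20 possible relabelings, so even a perfectly separated protein cannot reach a p-value below 0.1, which illustrates why small groups limit the attainable significance. Exhaustive enumeration is only practical for small groups; larger groups would need random sampling of relabelings or a parametric test.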
In large-scale proteomic studies, there is always a risk of false positive discoveries and a trade-off has to be considered between the benefits of identifying interesting genes for a certain biological process and the cost caused by including a false positive.13 Here, although the significance of many two-group comparison results could not
be supported by an FDR adjustment, manual inspection of IHC results and the fold change indicated a biological relevance. We therefore decided to consider the resulting protein lists as potentially differentially expressed and as interesting candidate proteins for follow-up studies, and we report the p-value, FDR value, and fold change in supplementary tables. These proteins were further investigated using Gene Ontology (GO)-based enrichment analyses. The top 100 most differentially expressed proteins in one group compared to another (Table S3, Supporting Information), as well as the top lists of enriched GO terms (Tables S4–S11, Supporting Information), from each two-group comparison are listed in supplementary tables. Examples of differentially expressed proteins as well as enriched GO terms for the different subgroups of cell lines are discussed under the appropriate subheadings below.
Differences between Hematological Cell Lines and Cell Lines Established from Solid Tumors
Among all in vitro cultured cell lines, TIME displayed the most deviant protein profile, residing in a terminal branch at the highest level of separation in the dendrogram. The remaining cell lines all formed a second large cluster, consisting in turn of two subclusters, one including cell lines established from solid tumors and one comprising hematological cell lines. Aside from hematological cell lines, the latter cluster also included a group of four cell lines known to share the expression of neurospecific markers, that is, HEK 293 from embryonic kidney, NTERA-2 from embryonal carcinoma, SH-SY5Y from neuroblastoma, and SCLC-21H from small cell lung carcinoma (blue, Figure 2). A PCA sample plot of in vitro cultured cell lines (Figure 3B) showed a similar arrangement, with the neuroid cluster (blue) located between the cell lines from solid tumors (gray/green) and the hematopoietic cell lines (red/brown). Two-group comparisons of protein profiles between hematological cell lines and cell lines established from solid tumors (not including the group of cell lines expressing neuroid markers) and subsequent GO-based enrichment analysis (Table S5, Supporting Information) revealed differential expression in hematological cell lines of proteins mainly involved in biological processes such as immune response, cell surface receptor linked signal transduction, defense response, and lymphocyte activation. The protein profiles of the cell lines from solid tumors instead exhibited differential expression of markers of cell adhesion, cell motion, response to endogenous stimulus, and cell junction organization (Table S6, Supporting Information), for example, alpha V integrin, type IV alpha 1 collagen, and keratin 18 (Table S3, Supporting Information), as compared to the hematological cell lines. In the dendrogram, several of the adherent cell lines were grouped together according to known phenotypical properties.
The glioma-derived cell lines (U-87MG, U-251MG, U-138MG) resided in a terminal branch together with the fibrous histiocytoma cell line (U-2197). The closest relationship was found between the IL-6 dependent multiple myeloma cell line U-266/70 and its IL-6 independent long-term in vitro passaged derivative cell line U-266/84. Cell lines derived from hematological malignancies overall showed a clear pattern of subclustering, whereas such a pattern was much less obvious for cell lines derived from solid tumors. Within this large solid tumor cluster, cell lines derived from epithelial tumors were intermingled with cell lines from sarcoma, melanoma, and glioma, and only a few cell lines showed the expected subclustering, for example, cell lines from cervical cancer and skin.
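The PCA sample plots referred to above place each cell line in a low-dimensional space spanned by the first principal components. The following is a minimal sketch of that idea using only the Python standard library. It does not reproduce the software used in the study; the power-iteration approach, the function names, and the toy data layout (rows are cell lines, columns are proteins) are all assumptions made for illustration.

```python
from math import sqrt


def center_columns(rows):
    """Subtract the per-protein (column) mean from every cell line (row)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def gram_matrix(rows):
    """Sample-by-sample inner products; small when samples << features."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    # Non-constant start: the constant vector is in the null space of a
    # centered Gram matrix and would collapse to zero.
    v = [float(i + 1) for i in range(len(matrix))]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def pca_scores(rows, n_components=3):
    """Return (scores, explained); scores[i] holds the PC coordinates of row i."""
    gram = gram_matrix(center_columns(rows))
    total = sum(gram[i][i] for i in range(len(gram)))
    scores = [[] for _ in rows]
    explained = []
    for _ in range(n_components):
        value, vector = top_eigenpair(gram)
        for i, v_i in enumerate(vector):
            scores[i].append(v_i * sqrt(value))
        explained.append(value / total if total else 0.0)
        # Deflate: remove the component just found before the next search.
        gram = [[g - value * vector[i] * vector[j] for j, g in enumerate(row)]
                for i, row in enumerate(gram)]
    return scores, explained
```

Working on the sample-by-sample Gram matrix keeps the eigenproblem at the size of the number of cell lines rather than the number of proteins, which is the convenient direction when a few dozen samples are described by thousands of features.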
Figure 1. Annotation of IHC staining in the Human Protein Atlas project. Intensity and fraction of immuno-positive (brown color) cells, as interpreted by automated image analysis (TMAx), are combined to create a four-graded color code for tissue and cells. Example images from the gene ICAM3 (http://www.proteinatlas.org/ENSG00000076662/cell) represent strong membranous immunohistochemical staining in multiple myeloma cell line RPMI-8226, along with image analysis overlay, as well as negative immunohistochemical staining in lung cancer cell line A-549. All annotations and images are publicly available on the Human Protein Atlas webpage (www.proteinatlas.org).
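How intensity and fraction might be combined into a four-graded score can be sketched as below. The caption does not state the actual rule, so the level names, the 25% cutoff, and the one-step downgrade are purely hypothetical placeholders chosen to show the shape of such a mapping, not the Human Protein Atlas scheme.

```python
# Hypothetical four-graded scale; the real annotation rule is not given in the text.
LEVELS = ["negative", "weak", "moderate", "strong"]


def staining_level(intensity, fraction_positive, low_fraction=0.25):
    """Combine staining intensity with the fraction of positive cells (0 to 1).

    Assumed rule: start from the intensity grade and downgrade by one step
    when fewer than `low_fraction` of the cells are immuno-positive.
    """
    if intensity not in LEVELS:
        raise ValueError(f"unknown intensity: {intensity!r}")
    if not 0.0 <= fraction_positive <= 1.0:
        raise ValueError("fraction_positive must lie between 0 and 1")
    grade = LEVELS.index(intensity)
    if grade == 0 or fraction_positive == 0.0:
        return "negative"
    if fraction_positive < low_fraction:
        grade -= 1
    return LEVELS[grade]
```

For example, under these assumed thresholds `staining_level("strong", 0.9)` gives `"strong"`, while the same intensity in only 10% of the cells is reported one step lower.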
Hematological Clustering Reveals Retained Protein Profiles
The hematological cell lines were clustered separately and the characteristics of the hematopoietic dendrogram appeared to harmonize well with established hematological differentiation pathways using CD antigen expression and other phenotypic characteristics as markers (Figure 4A and B). Protein profiles corresponding to 199 antibodies targeting CD molecules were available, and the hematopoietic hierarchical cluster analysis based on this subset of antibodies resulted in a dendrogram (Figure 4C), which highly resembled the dendrogram resulting from all 4356 antibodies. Looking at the dendrogram generated with the full set of antibodies, the Hodgkin lymphoma cell line
(HDLM-2) formed a terminal branch at the highest level of separation. In the PCA sample plot as well, HDLM-2 is located closest to the adherent cell lines (Figure 3B). The remaining hematological cell lines formed two subclusters, one containing all myeloid cell lines and the five cell lines derived from multiple myeloma, and the other containing the remaining four lymphoid cell lines. The two subclusters show that multiple myeloma cells, representing terminally differentiated B-cells (plasmablasts and plasma cells) growing in bone marrow, are more closely related to cells of myeloid origin than to more immature lymphocytes in peripheral lymphoid tissue, here represented by leukemia/lymphoma cell lines (Daudi, U-698, KM3, MOLT-4).
Figure 2. Hierarchical clustering of cell lines. Dendrogram generated from unsupervised hierarchical clustering of protein expression data from 4356 proteins, showing the relationship between the 45 cell lines. The colors of the scale below the dendrogram depict categories used to classify the cell types. The lymphoid cell lines were further divided into two subclusters of leukemia/lymphoma and multiple myeloma cells. In addition, blue is included to visualize the seventh category of cell lines, which was not predefined but defined by the hierarchical cluster analysis and comprised four cell lines with a shared expression of neuroid markers.
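The kind of unsupervised hierarchical clustering behind such a dendrogram can be sketched in a few lines. The distance measure and linkage rule used for Figure 2 are not stated in this section, so correlation distance (1 minus the Pearson correlation) and average linkage are assumptions here, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of {cell line -> profile}.

    Returns the merges in order as (cluster_a, cluster_b, distance), where
    clusters are tuples of cell line names; read bottom-up, the merge list
    describes the dendrogram.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[frozenset((a, b))]
                         for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)  # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The first entries of the returned list correspond to the tightest pairs in the dendrogram, and the last entry is the split at the highest level of separation.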
Protein Expression of Leukemia/Lymphoma Cells
The deviant protein profiles of the leukemia/lymphoma cells were further analyzed using two-group comparisons. When compared to the terminally differentiated multiple myeloma cells, the leukemia/lymphoma cells displayed a protein profile of active, responsive cells with differential expression of proteins involved in processes such as regulation and activation of lymphocytes, regulation of immune system processes, response to stimulus, and leukocyte activation (Table S6, Supporting Information). When compared to the myeloid cells, leukemia/lymphoma cells showed differential expression of proteins annotated with the gene ontology terms protein binding, transcription cofactor activity, biological regulation, and transmembrane receptor protein tyrosine kinase adaptor protein activity (Table S7, Supporting Information).
Protein Expression in Myeloid and Myeloma Cells
Two-group comparisons were also performed in the reverse direction to describe processes and biology signifying the myeloid/myeloma cluster relative to the leukemia/lymphoma cell lines. The results indicated that myeloid and myeloma cells share a differential expression of genes involved in protein folding, regulation of ossification, vasculogenesis, regulation of osteoblast differentiation, and monosaccharide binding compared to the leukemia/lymphoma cells (Table S8, Supporting Information). Similar results were also found after more in-depth two-group comparisons, where the subclusters of myeloma and myeloid cells were compared separately to leukemia/lymphoma cells (Tables S9 and S10, Supporting Information). Although the subclusters of myeloid and myeloma cells showed a shared protein profile relative to the leukemia/lymphoma cell lines, several differences in protein expression patterns were found between myeloma and myeloid cell lines. Myeloid cell lines displayed higher expression of proteins involved in nucleic acid binding and metabolic and biosynthetic processes, whereas myeloma cell lines expressed higher levels of proteins involved in antigen processing and presentation, immune response, and immune system process.
In the subcluster of myeloid cell lines, HMC-1, which originates from a mast-cell leukemia, was found in a terminal branch. On the next level, the cell line K-562, expressing markers for erythroid, granulocytic, monocytic, and megakaryocytic lineages, was found in a branch together with the erythroleukemia cell line HEL. The promyelocytic cell line HL-60 was found in a branch of its own, leaving the remaining three cell lines of myelomonocytic origin in a tight subcluster (THP-1, U-937, NB-4).
Figure 3. Principal component analysis. Diagrams generated from principal component analysis of protein expression data using Qlucore Omics Explorer, with the same color codes as in Figure 2 for the categorization of cell types into subgroups. The first three principal components (PCs) are visualized in 3D plots showing the spatial distribution for (A) all cell types (n = 45) using expression data from all 4356 proteins and (B) all cell lines except TIME cells (n = 44), showing that each cell line is found close in space to the cell lines belonging to the same subgroup.
Figure 4. Hierarchical clustering of hematopoietic cell lines. (A) Schematic figure showing a simplified diagram of hematopoiesis, from the multipotent hematopoietic stem cell (MSC) to increasingly differentiated phenotypes. (LyPC, lymphoid progenitor cell; MCPC, mast cell progenitor cell; EPC, erythroid progenitor cell; MPC, myeloid progenitor cell; B, B lymphocyte; T, T lymphocyte; MC, mast cell; E, erythrocyte; Gr, granulocyte; Mo, monocyte; PC, plasma cell). The 17 hematological in vitro cultured cell lines included in the analysis are depicted at positions in the diagram according to what is known from previous studies regarding their phenotype and classification based on CD markers. (B) Dendrogram derived from hierarchical cluster analysis of protein expression patterns in the hematopoietic cell lines based on 4356 proteins. The subclustering of cell lines appears consistent with expected positions in the schematic figure of hematopoietic development. (C) Dendrogram derived from hierarchical clustering of the same cell lines using a subset of 199 proteins classified as CD molecules. The different colors of the dendrogram branches correspond to those used for the different cell lineages. *The HDLM-2 cell line has been positioned above lymphopoietic development since information regarding its relation to the stage of B-cell development is inconclusive.
Differences in Protein Profiles between Lymphoid and Myeloid Cells
The protein profiles of the myeloid cell lines, as compared to all lymphoid cell lines, that is, leukemia/lymphoma as well as myeloma cell lines, were characterized by differential expression of proteins involved in biological processes such as response to external stimulus, receptor activity, defense response, cell adhesion, and response to wounding (Table S11, Supporting Information). Two well-known macrophage markers, IL-18 and CD68 (Figure 5B), were identified as differentially expressed protein candidates in the myeloid cell lines, along with several proteins described to varying degrees in the context of myeloid phenotype or function (Table S3, Supporting Information), for example, PYGL, the liver isoform of glycogen phosphorylase. PYGL catalyzes the degradation of glycogen, and its connection to the myeloid phenotype is unknown. The lymphoid cell lines, that is, leukemia/lymphoma and myeloma cell lines, on the other hand, indicated differential expression of proteins involved in biological processes such as immune response, antigen processing and presentation, and MHC class I receptor activity (Table S12, Supporting Information), as compared to myeloid cell lines. This is exemplified by CD74, a protein known to be expressed by antigen presenting cells, that is, B-cells and macrophages.
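The GO-based enrichment analyses reported throughout this section ask whether a GO term is over-represented among the differentially expressed proteins relative to all analyzed proteins. A minimal sketch is given below; the enrichment statistic is not specified in this section, so a one-sided hypergeometric test is assumed as one common choice, and the function names and data layout are hypothetical.

```python
from math import comb


def hypergeom_tail(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected proteins are drawn without replacement
    from n_background proteins, of which n_annotated carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


def enrich(term_to_proteins, background, selected):
    """Rank GO terms by one-sided hypergeometric p-value within the background.

    Returns (term, overlap, p_value) tuples, most enriched first.
    """
    background = set(background)
    selected = set(selected) & background
    results = []
    for term, members in term_to_proteins.items():
        annotated = set(members) & background
        overlap = len(annotated & selected)
        p = hypergeom_tail(len(background), len(annotated), len(selected), overlap)
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])
```

Restricting the background to the proteins actually covered by the antibody set, rather than the whole proteome, matters in a design like this one, because only the analyzed proteins could ever appear in a top list.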
Figure 5. Examples of differentially expressed proteins. Proteins identified as differentially expressed in different categories of cell types using two-group comparisons. (A) CD79a expression in lymphoma cell lines (Daudi) compared to multiple myeloma cell lines (LP-1) (http://www.proteinatlas.org/ENSG00000105369/cell), (B) CD68 expression in myeloid cell lines (THP-1) compared to multiple myeloma cell lines (LP-1) (http://www.proteinatlas.org/ENSG00000129226/cell), (C) ICAM3 expression in hematological cell lines (U-937) compared to cell lines from solid tumors (PC-3) (http://www.proteinatlas.org/ENSG00000076662/cell), and (D) expression of NUMB in cell lines from solid tumors (PC-3) compared to hematological cell lines (LP-1) (http://www.proteinatlas.org/ENSG00000133961/cell). Cell lines representing each subgroup of cell lines are indicated within parentheses along with links to the respective Human Protein Atlas webpages.
■ DISCUSSION
In this study, we characterized the protein profiles of commonly used human cell lines and explored how differences specify phenotype and cellular function. Cell lines have successfully been used in basic research for decades, and data collected from the studies of cell lines have provided crucial insights into basic mechanisms regarding cell growth and differentiation. However, the use of cell lines as model systems for comparative studies of genotypic and phenotypic features of normal and malignant cells is controversial.14,15 It should also be noted that the comparison between cell line and tumor tissue is not straightforward but complicated by the fact that a tumor does not constitute a homogeneous mass of clonal cells but rather a microecosystem within which multiple evolving subclones exist.16–19 In addition, cells growing in vivo naturally engage in an interactive and dynamic interplay with the cellular and structural surroundings. Growing devoid of this natural microenvironment, in vitro cultured cells have been shown to adapt their gene expression profiles20 and thereby lose some of the protein expression profile that signifies the cell type of origin. This has also previously been shown in a study comparing expression profiles between primary leukemia patient cells and corresponding in vitro cultured cell lines.21 The large-scale proteomic results presented here suggest that commonly used established cell lines reflect the biology of their respective progenitor cells to a variable degree. This raises the question of the validity of using cell lines as model systems for particular cell types and defined forms of cancer. The generation and availability of specific probes for 4356 different human proteins allowed us to explore the variability of relative protein expression levels in 45 different cell types represented as established in vitro cultured cell lines. As the number of analyzed proteins corresponds to approximately 20% of the human proteome, our data provide an insight into how protein profiles in a more general sense contribute to differences between cell lines. It has previously been shown that on average approximately 70% of the proteome is expressed in any given cell type and that very few proteins are cell-type specific.12 This implies that phenotypical and functional differences between cell lines, reflected here in results from hierarchical clustering and PCA (Figures 2 and 3), most probably are the effect of a more fine-tuned regulation of protein levels resulting from a complex machinery of processes such as transcriptional, translational and epigenetic regulation, alternative splicing, protein degradation
and post-translational protein modifications,22–26 rather than an “off and on” regulation of expression. To a certain extent, the proteomic landscape in cells seems comparable to the transcriptome, where previous deep sequencing studies have shown that nearly 75% of all genes are expressed on the transcript level in various human tissues and cells27 and that it is possible to detect a majority of transcripts in a human cell line.28 TIME cells showed the most deviant protein profile, with a position in the dendrogram that implies that all other cell lines, regardless of both cellular origin and growth properties, are more similar to each other. TIME cannot be grown indefinitely and is not considered to be an established cell line. The fact that TIME cells, previously shown to retain characteristics of the primary endothelial cells from which they were derived,29 were the only nontransformed cells might explain their unique protein profile. Despite a presumed shared “in vitro protein expression signature”, the characteristics of the dendrogram generated from hierarchical clustering of cell lines overall appeared to reflect the identity of the tumor from which they were derived. Hematological cell lines and cell lines established from solid tumors formed two separate branches, probably as a consequence of both the inherent divergence of originating from hematopoietic malignancies and solid tumors, respectively, and the culturing differences of growing in suspension and on a plastic surface. In concordance with this, when comparing these two groups of cell lines our results reveal an up-regulation of proteins involved in immune response, for example, ICAM3 (Figure 5C), and intercellular signaling cascade in suspension cell lines, and of cell-adhesion proteins in the adherent cell lines. ICAM3 is a protein constitutively and abundantly expressed by all leukocytes and possibly the most important ligand for LFA-1 in the initiation of the immune response.
In addition to the expected proteins upregulated in adherently growing cells, for example, integrin and collagen proteins, NUMB was also found to be differentially expressed (Figure 5D). NUMB determines cell fate by antagonizing the activity of the plasma membrane receptor of the NOTCH family, by asymmetrically partitioning at mitosis.30 NUMB has also been found to regulate the tumor suppressor gene p53.31 Although culturing conditions in general undoubtedly affect the protein profile of a cell, it is evident from the cluster of suspension cell lines SH-SY5Y (human neuroblastoma) and SCLC-21H (small cell lung cancer), and adherent cell lines NTERA-2 (from metastatic embryonal carcinoma) and HEK 293 (embryonic kidney) that more inherent properties have a greater impact on gene expression for at least a subset of cell lines (Figure 2). Interestingly, these cell lines were also clearly separated, with similar distances from the two main groups of hematopoietic and solid tumor cell lines, in the PCA sample plot (Figure 3B). The grouping of these four cell lines, despite their different growth conditions, is most likely due to a shared neuronal/neuroendocrine molecular signature. The HEK 293 cell line was originally derived by transformation of primary cultures of human embryonic kidney cells with sheared adenovirus type 5 DNA.32 However, HEK 293 has recently been reported to express several neurofilament subunits and exhibits characteristics of neuronal stem cells.33 Thus, our findings corroborate these observations and explain its presence in a cluster together with neuronal SH-SY5Y. In addition, the similarity in expression patterns of these cell lines with SCLC-21H is not surprising, as this lung tumor is well-known to express markers of neuroendocrine differentiation.34 NTERA-2 has been shown to express neurofilament proteins and to be capable of terminal differentiation into neurons.35,36 Taken together, this implies that although it is evident that adaptation for growth in culture causes a certain level of similarity among cell lines, traits of different lineages of differentiation remain distinguishable. This is in line with a study in which the primary identifiable factor accounting for variation in gene expression among 60 cell lines was found to be the identity of the progenitor cell type.5 Interestingly, our results reveal a clustering that harmonizes with cellular origin in greater detail for the hematological cell lines than for the adherent cell lines. Whether this is due to inherent features of the cell types themselves, or whether adherent in vitro growth causes greater aberrations to the native protein profile, is not known. Although a subset of adherent cell lines, for example, cell lines derived from different gliomas and cervical carcinomas, form clusters as expected, the results show that lymphoid and myeloid cell lines cluster almost impeccably according to hematological lineages of differentiation (Figure 4B). KM3 and MOLT-4 cluster together as poorly differentiated lymphocytic cell types; U-698 and Daudi in turn are cell lines deriving from B-lymphocytic lymphomas; and Karpas-707, LP-1, RPMI-8226, U-266/70 and U-266/84 cluster together as cell lines derived from multiple myeloma, with the highest correlation overall seen between the two U-266 sublines. Not surprisingly, proteins involved in antigen presentation and processing were shown to potentially be differentially expressed in the leukemia/lymphoma cell lines, as compared to multiple myeloma cell lines (Table S7, Supporting Information), with markers of differentiation similar to those of normal B-cells.
An example is the overexpression of CD79a required, in cooperation with CD79b, for the initiation of the signal transduction cascade activated by binding of antigen to the B-cell antigen receptor complex (BCR), which leads to internalization of the complex, trafficking to late endosomes and antigen presentation. The myeloid cell lines also display protein profiles that reflect cellular origin. All cells derived from leukemias of the monocytic lineage form a cluster (HL-60, NB-4, U-937 and THP-1), with HL-60 displaying the most divergent protein profile as the most undifferentiated cell line of the four. The relationship seen between HEL and K-562 is not unexpected. HEL, an erythroleukemia cell line is known to express markers of several hematopoietic lineages37,38 similarly to K-562, earlier shown to be inducible to differentiate along the erytrocytic cell lineage.39 Furthermore, the surface-expression of granulocyte/monocyte markers of HEL has been shown to share considerable similarities with K-562.40 We found the protein profile of multiple myeloma cells more similar to the protein profile of cells of myeloid cellular origin than that of lymphocytic leukemia/lymphoma cells, as shown in both the dendrogram and the PCA sample plot (Figure 3B). One explanation for this rather unexpected finding could be a shared dependency of myeloma and myeloid cells of a bone marrow microenvironment, and a common need for interactions with bone marrow stromal cells and adhesion to extracellular matrix,41 as opposed to leukemia and lymphoma cells that grow in blood and peripheral lymphoid tissues. The GO enrichment analyses supports this, as ontologies such as vasculogenesis, ossification and bone development describes processes taking place in bone marrow with corresponding proteins up-regulated in the cluster of myeloma and myeloid cells. Although distinctly separated in the dendrogram, myeloma and leukemia/lymphoma cells are also
found to differentially express proteins belonging to the same ontologies. For example, genes involved in processes of immune response and antigen processing and presentation are enriched compared to the myeloid cells. However, it appears as though the common “homing” to bone marrow for myeloid and myeloma cells overrides the similarities between myeloma and leukemia/lymphoma cells. Within the group of hematological cell lines, HDLM-2 displayed the most divergent protein profile, and is located close to the group of adherent cell lines both in the dendrogram and in the PCA sample plot (Figure 3B). The underlying explanation for the position of HDLM-2 is unclear as the exact nature of the progenitor cell is unknown. In summary, the large-scale proteomic data presented in the Human Protein Atlas enable in silico searches for potentially interesting proteins expressed in cell lines, and provide the scientific community with validated antibodies for proteins expressed in selected cell lines. In addition, this study shows that large-scale IHC analysis of multiple cell lines, assembled in a CMA, is an attractive strategy for global protein profiling and comparative analyses of protein expression. It is evident from this global analysis that cell lines, although likely to share an “in vitro protein signature”, retain enough traits and characteristics of their progenitor cells to generate a hierarchical cluster according to cellular origin and phenotype. Hematological cell lines in particular appear to constitute good model systems for the respective progenitor cell type, judging from analysis of approximately 20% of the proteome. Moreover, our two-group comparisons reveal novel and previously uncharacterized proteins of interest for further investigations of cellular phenotypes, in addition to expected proteins well-known to be expressed in different cell types.
■ EXPERIMENTAL PROCEDURES
Cells and Cell Lines
Forty-seven cell lines, continuously analyzed within the HPA project11 (see Table S1, Supporting Information, for extended information on cells and cell lines), were included in this study. All cell lines were classified into one of six categories, as defined by the respective tissue of origin, that is, leukemia/lymphoma cell lines, multiple myeloma cell lines, myeloid cell lines, carcinoma cell lines, glioma/sarcoma/melanoma cell lines, and TIME. TIME is, although grown in vitro, not a transformed cell line but a pool of telomerase-immortalized human endothelial cells and was therefore considered a separate group. The cell lines were selected to represent different forms of solid tumors as well as cellular hematopoietic phenotypes representing different fundamental stages of hematopoiesis. Two cell lines (D341 Med and SK-BR-3) were excluded from the analysis due to lack of data for >10% of the antibodies included in the analysis. In total, 45 cell lines were analyzed.
Antibodies
In this study, 4356 antibodies (Table S2, Supporting Information) were used for immunohistochemical staining of cell microarray (CMA) sections.42 Out of these antibodies, 2752 were generated within the Human Protein Atlas (HPA) project43 and 1604 were obtained from commercial antibody vendors. All antibodies were subjected to an extensive scheme of quality assurance.43
Cell Culture and Cell Microarray Production
The details of cell culturing and cell microarray (CMA) production have previously been described.44 The cell lines were cultured according to recommendations from the respective distributor and harvested in a proliferative stage. For cell microarray production, cells were fixed in formalin and dispersed into agarose. The generated cell pellets were histoprocessed and paraffin embedded, resulting in donor blocks for CMA production. From each cell donor block, duplicate 0.6 mm punches were taken and put into one recipient CMA.
Immunohistochemistry and Image Analysis
Immunohistochemical staining of CMA sections was performed essentially as previously described11 using an Autostainer Plus instrument (Dako, Glostrup, Denmark). In brief, the primary monospecific antibodies and goat anti-rabbit/mouse HRP-conjugated Envision (Dako) were incubated for 30 min at room temperature. Diaminobenzidine (DAB) was used as chromogen and Harris hematoxylin (Sigma-Aldrich, St. Louis, MO) was used for counterstaining. All stained CMA sections were scanned using an automated slide-scanning system, ScanScope XT (Aperio Technologies, Vista, CA), and the generated TIFF images representing separated cell spots were analyzed with automated image analysis software, TMAx (Beecher Instruments, Sun Prairie, WI). The software automatically identifies cells and detects immunostaining, generating an output file containing information about staining intensity, fraction of positive cells, number of cells present per spot, etc.
Protein Quantification
A protein quantification score was calculated using TMAx output parameters (i) areas of weak, moderate and strong staining intensity and (ii) the number of cells present in the cell image. Images with insufficient representation of cells (n < 20) were excluded from the analysis. With the assumption that staining intensity to a certain degree reflects the amount of protein present, the total amount of protein detected in each image was calculated. Areas of weak, moderate and strong staining were added together, weighting moderate and strong with arbitrary coefficients 2 and 3, respectively. The total amount of protein was then normalized with respect to the number of cells present in the image, generating average values of protein expression level per cell. To correct for bias introduced by the correlation between cell size and level of protein expression, the values of protein expression level per cell were also normalized with respect to cell size as previously described.45 Protein quantification scores ranged from 0 to 30 259 and were positively skewed; therefore, a log10-transformation was used. These log-transformed quantification scores formed the basis for cell analyses in this study.
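The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the function name, the argument names and the pseudocount added before the log10 transformation (needed here because scores of 0 occur) are assumptions, and the cell-size normalization of ref 45 is left out because its details are not given in this section.

```python
import math
from typing import Optional

MIN_CELLS = 20  # images with fewer cells are excluded (threshold from the text)


def quantification_score(weak_area: float, moderate_area: float,
                         strong_area: float, n_cells: int,
                         pseudocount: float = 1.0) -> Optional[float]:
    """Return a log10-transformed per-cell staining score, or None if excluded.

    The pseudocount is an assumption made for this sketch so that a raw
    score of 0 can be log-transformed; the text does not say how zeros
    were handled.
    """
    if n_cells < MIN_CELLS:
        return None  # insufficient representation of cells
    # Weighted sum of stained areas: weak x1, moderate x2, strong x3.
    total_protein = weak_area + 2.0 * moderate_area + 3.0 * strong_area
    # Normalize by the number of cells to get an average level per cell.
    per_cell = total_protein / n_cells
    return math.log10(per_cell + pseudocount)
```

For an image with 20 cells and weak, moderate and strong areas of 100, 50 and 10, the weighted sum is 230 and the per-cell value 11.5, which is then log-transformed; an image with only 10 cells would be dropped.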
Data Analysis
All expression data were analyzed using the R statistical programming environment.46 All antibodies were mapped to gene/protein identifiers obtained from Ensembl47 version 57.37. For genes with more than one mapped antibody, the one with the best validation in Western blot43 was selected so that each gene was represented by a single antibody. The 45 cell lines were grouped into six categories according to tissue of origin, that is, progenitor cell (Table S2, Supporting Information). Qlucore Omics Explorer 2.0 (Qlucore AB, Lund, Sweden) was used to perform a principal component analysis (PCA) of all cells and proteins using the continuous log-transformed protein quantification. K-Nearest Neighbors Impute48 was used for missing value reconstruction.
Hierarchical Clustering of Cells
The protein quantification scores from 4356 proteins in 45 cell lines were used to calculate a correlation matrix based on Spearman’s rank correlation coefficients for each pairwise combination of cells. A (1 − correlation coefficient) transformation converted the correlation matrix into a distance metric, which was used to perform unsupervised hierarchical clustering.49,50 Distances between clusters were measured using the average linkage method at each stage of the clustering. The same procedure was performed on a subset containing the 17 hematopoietic cell lines using both the complete set of proteins and a subset of 199 antibodies targeting CD molecules. The list of CD molecules was retrieved from UniProt (http://www.uniprot.org/docs/cdlist).
Differentially Expressed Proteins and Enrichment Analyses
To identify differentially expressed proteins by statistical comparisons of two groups, pairs were selected from the categories described above: (i) hematopoietic cell lines and cell lines established from solid tumors (not including the four cell lines expressing neuroid markers), (ii) lymphoid and myeloid cell lines, (iii) leukemia/lymphoma cell lines and multiple myeloma cell lines, (iv) leukemia/lymphoma cell lines and myeloid cell lines, and (v) myeloid cell lines and myeloma cell lines. The two-group comparisons were performed using the nonparametric Mann-Whitney test, one-sided but in two directions, so that two lists of differentially expressed proteins, sorted by p-value and fold change (calculated as the difference in mean value between the groups), were obtained for each pair of groups. P-values adjusted for multiple testing using the false discovery rate (FDR)51 were also calculated. However, although multiple testing correction reduces false positive errors, it can also reduce the likelihood of identifying true differences in expression between the compared groups and thus increase the false negative error rate.52 As the groups compared here in most cases contain only a few cell lines, FDR-adjusted p-values often do not show a significant change even though the fold change may be large. We therefore decided to report all test results and to view the resulting protein lists as candidates for differential expression.53 The candidate lists, consisting of the top 100 most differentially expressed proteins based on p-value with a maximum limit of 0.05, were used for enrichment analyses based on the Gene Ontology (GO)54 categories “Biological Process” and “Molecular Function”. DAVID55 version 6.7 was used to perform the analysis for each list of genes corresponding to the resulting differentially expressed proteins for the comparisons above. All genes analyzed in this study (n = 4356) were used as background, and enrichment p-value, gene count and Benjamini-Hochberg multiple testing correction were reported for the most significant terms.
■ ASSOCIATED CONTENT
Supporting Information
Twelve supplemental tables. This material is available free of charge via the Internet at http://pubs.acs.org.
■ AUTHOR INFORMATION
Corresponding Author
*Dr. Anna Asplund, Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, SE-75185 Uppsala, Sweden. E-mail: [email protected].
■ ACKNOWLEDGMENT
The entire staff of the Human Protein Atlas (HPA) project center in Stockholm and Uppsala, Sweden, is acknowledged for their tremendous efforts. This work was supported by grants from the Knut and Alice Wallenberg Foundation and the EU Sixth Framework MOLPAGE integrated project.
■ REFERENCES
(1) Garraway, L. A.; et al. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 2005, 436 (7047), 117–22.
(2) Monks, A.; et al. Feasibility of a high-flux anticancer drug screen using a diverse panel of cultured human tumor cell lines. J. Natl. Cancer Inst. 1991, 83 (11), 757–66.
(3) Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 2006, 6 (10), 813–23.
(4) Drexler, H. G.; Matsuo, Y. Malignant hematopoietic cell lines: in vitro models for the study of multiple myeloma and plasma cell leukemia. Leuk. Res. 2000, 24 (8), 681–703.
(5) Ross, D. T.; et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet. 2000, 24 (3), 227–35.
(6) Lukk, M.; et al. A global map of human gene expression. Nat. Biotechnol. 2010, 28 (4), 322–4.
(7) Uhlen, M.; Ponten, F. Antibody-based proteomics for human tissue profiling. Mol. Cell. Proteomics 2005, 4 (4), 384–93.
(8) Zola, H.; et al. CD molecules 2006--human cell differentiation molecules. J. Immunol. Methods 2007, 319 (1–2), 1–5.
(9) Uhlen, M.; et al. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 2010, 28 (12), 1248–50.
(10) Lundberg, E.; Uhlen, M. Creation of an antibody-based subcellular protein atlas. Proteomics 2010, 10 (22), 3984–96.
(11) Uhlen, M.; et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 2005, 4 (12), 1920–32.
(12) Ponten, F.; et al. A global view of protein expression in human cells, tissues, and organs. Mol. Syst. Biol. 2009, 5, 337.
(13) Noble, W. S. How does multiple testing correction work? Nat. Biotechnol. 2009, 27 (12), 1135–7.
(14) Drexler, H. G.; et al. False leukemia-lymphoma cell lines: an update on over 500 cell lines. Leukemia 2003, 17 (2), 416–26.
(15) Masters, J. R. HeLa cells 50 years on: the good, the bad and the ugly. Nat. Rev. Cancer 2002, 2 (4), 315–9.
(16) Barrett, M. T.; et al. Evolution of neoplastic cell lineages in Barrett oesophagus. Nat. Genet. 1999, 22 (1), 106–9.
(17) Campbell, P. J.; et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 2010, 467 (7319), 1109–13.
(18) Notta, F. Evolution of human BCR-ABL1 lymphoblastic leukaemia-initiating cells. Nature 2011, 469 (7330), 362–7.
(19) Yachida, S.; et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 2010, 467 (7319), 1114–7.
(20) Sandberg, R.; Ernberg, I. The molecular portrait of in vitro growth by meta-analysis of gene-expression profiles. Genome Biol. 2005, 6 (8), R65.
(21) Dairkee, S. H.; et al. A molecular 'signature' of primary breast cancer cultures; patterns resembling tumor tissue. BMC Genomics 2004, 5 (1), 47.
(22) Vogel, C.; et al. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol. Syst. Biol. 2010, 6, 400.
(23) Frappier, L.; Verrijzer, C. P. Gene expression control by protein deubiquitinases. Curr. Opin. Genet. Dev. 2011, 21 (2), 207–13.
(24) Meissner, A. Epigenetic modifications in pluripotent and differentiated cells. Nat. Biotechnol. 2010, 28 (10), 1079–88.
(25) Young, R. A. Control of the embryonic stem cell state. Cell 2011, 144 (6), 940–54.
(26) Nagano, T.; Fraser, P. No-nonsense functions for long noncoding RNAs. Cell 2011, 145 (2), 178–81.
(27) Ramskold, D.; et al. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput. Biol. 2009, 5 (12), e1000598.
(28) Sultan, M.; et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 2008, 321 (5891), 956–60.
(29) Venetsanakos, E.; et al. Induction of tubulogenesis in telomerase-immortalized human microvascular endothelial cells by glioblastoma cells. Exp. Cell Res. 2002, 273 (1), 21–33.
(30) Roegiers, F.; Jan, Y. N. Asymmetric cell division. Curr. Opin. Cell Biol. 2004, 16 (2), 195–205.
(31) Colaluca, I. N.; et al. NUMB controls p53 tumour suppressor activity. Nature 2008, 451 (7174), 76–80.
(32) Graham, F. L.; et al. Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J. Gen. Virol. 1977, 36 (1), 59–74.
(33) Shaw, G.; et al. Preferential transformation of human neuronal cells by human adenoviruses and the origin of HEK 293 cells. FASEB J. 2002, 16 (8), 869–71.
(34) Bepler, G.; et al. Markers and characteristics of human SCLC cell lines. Neuroendocrine markers, classical tumor markers, and chromosomal characteristics of permanent human small cell lung cancer cell lines. J. Cancer Res. Clin. Oncol. 1987, 113 (3), 253–9.
(35) Lee, V. M.; Andrews, P. W. Differentiation of NTERA-2 clonal human embryonal carcinoma cells into neurons involves the induction of all three neurofilament proteins. J. Neurosci. 1986, 6 (2), 514–21.
(36) Pleasure, S. J.; Lee, V. M. NTera 2 cells: a human cell line which displays characteristics expected of a human committed neuronal progenitor cell. J. Neurosci. Res. 1993, 35 (6), 585–602.
(37) Leary, J. F.; et al. Multipotent human hematopoietic cell line K562: lineage-specific constitutive and inducible antigens. Leuk. Res. 1987, 11 (9), 807–15.
(38) Papayannopoulou, T.; et al. The surface antigen profile of HEL cells. Prog. Clin. Biol. Res. 1983, 134, 277–92.
(39) Andersson, L. C.; Nilsson, K.; Gahmberg, C. G. K562--a human erythroleukemic cell line. Int. J. Cancer 1979, 23 (2), 143–7.
(40) Rimmer, E. F.; Horton, M. A. Origin of human mast cells studied by dual immunofluorescence. Clin. Exp. Immunol. 1987, 68 (3), 712–8.
(41) Hideshima, T.; et al. Understanding multiple myeloma pathogenesis in the bone marrow to identify new therapeutic targets. Nat. Rev. Cancer 2007, 7 (8), 585–98.
(42) Stromberg, S.; Bjorklund, M. G.; Asplund, C.; et al. A high-throughput strategy for protein profiling in cell microarrays using automated image analysis. Proteomics 2007, 7 (13), 2142–2150.
(43) Berglund, L.; et al. A genecentric Human Protein Atlas for expression profiles based on antibodies. Mol. Cell. Proteomics 2008, 7 (10), 2019–27.
(44) Andersson, A. C.; et al. Analysis of protein expression in cell microarrays: a tool for antibody-based proteomics. J. Histochem. Cytochem. 2006, 54 (12), 1413–23.
(45) Lundberg, E.; et al. The correlation between cellular size and protein expression levels--normalization for global protein profiling. J. Proteomics 2008, 71 (4), 448–60.
(46) R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2010; Volume 1, Issue 09/18/2009.
(47) Hubbard, T. J.; et al. Ensembl 2009. Nucleic Acids Res. 2009, 37 (Database), D690–7.
(48) Troyanskaya, O.; et al. Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17 (6), 520–5.
(49) Eisen, M. B.; et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998, 95 (25), 14863–8.
(50) Golub, T. R.; et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286 (5439), 531–7.
(51) Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc., Ser. B 1995, 57, 289–300.
(52) Rothman, K. J. No adjustments are needed for multiple comparisons. Epidemiology 1990, 1 (1), 43–6.
(53) Saville, D. J. Multiple Comparison Procedures: The Practical Solution. Am. Stat. 1990, 44, 174–180.
(54) Ashburner, M.; et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25 (1), 25–9.
(55) Huang da, W.; Sherman, B. T.; Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4 (1), 44–57.
dx.doi.org/10.1021/pr200259v |J. Proteome Res. 2011, 10, 4066–4075