Technical Note pubs.acs.org/jpr

Tumor Classification of Six Common Cancer Types Based on Proteomic Profiling by MALDI Imaging

Stephan Meding,† Ulrich Nitsche,‡ Benjamin Balluff,† Mareike Elsner,† Sandra Rauser,† Cédrik Schöne,† Martin Nipp,§ Matthias Maak,‡ Marcus Feith,‡ Matthias P. Ebert,∥ Helmut Friess,‡ Rupert Langer,⊥ Heinz Höfler,†,⊥ Horst Zitzelsberger,§ Robert Rosenberg,‡,# and Axel Walch*,†

† Institute of Pathology, Helmholtz Zentrum München, Neuherberg, Germany
‡ Department of Surgery, Klinikum rechts der Isar, Technische Universität München, Germany
§ Research Unit Radiation Cytogenetics, Department of Radiation Sciences, Helmholtz Zentrum München, Neuherberg, Germany
∥ II. Medizinische Klinik, Universitätsklinikum Medizinische Fakultät Mannheim der Universität Heidelberg, Germany
⊥ Institute of Pathology, Technische Universität München, Germany
# Klinik für Allgemein-, Viszeral- und Gefässchirurgie, Kantonsspital Baden, Switzerland


ABSTRACT: In clinical diagnostics, it is of utmost importance to correctly identify the source of a metastatic tumor, especially if no apparent primary tumor is present. Tissue-based proteomics might allow correct tumor classification. We therefore performed MALDI imaging to generate proteomic signatures for different tumors. These signatures were used to classify common cancer types. First, a cohort comprising tissue samples from six adenocarcinoma entities located at different organ sites (esophagus, breast, colon, liver, stomach, thyroid gland; n = 171) was classified with two algorithms, using separate training and test sets. For the test set, Support Vector Machine and Random Forest yielded overall accuracies of 82.74 and 81.18%, respectively. Then, colon cancer liver metastasis samples (n = 19) were introduced into the classification. The liver metastasis samples could be discriminated with high accuracy from primary tumors of colon cancer and hepatocellular carcinoma. Additionally, colon cancer liver metastasis samples could be successfully classified when colon cancer primary tumor samples were used to train the classifier. These findings demonstrate that MALDI imaging-derived proteomic classifiers can discriminate between different tumor types at different organ sites and at the same site.

KEYWORDS: proteomic classifier, tumor classification, proteomic classification, CUP classification, tumor diagnosis, MALDI imaging, MALDI-IMS, MALDI-MSI, imaging MS



Received: August 16, 2011; Published: January 8, 2012
© 2012 American Chemical Society
dx.doi.org/10.1021/pr200784p | J. Proteome Res. 2012, 11, 1996−2003

INTRODUCTION

Correct, unambiguous tumor diagnosis is the initial step in cancer therapy, since the patient's treatment regimen is based on the tumor classification. However, in a significant number of metastasized cancers (2.3−4.2% of cancer cases worldwide), the primary tumor cannot be identified, and these cases are diagnosed as cancer of unknown primary (CUP).1,2 Clinical diagnosis typically relies on histological and often extensive immunohistochemical analyses of tumor biopsies.3 New classification methods based on molecular profiling have emerged over the past decade. Gene expression profiling was employed first,4−10 before attention turned toward proteomics.11−15 Most tissue-based methods, such as gene expression profiling or two-dimensional gel electrophoresis, need large amounts of tissue material, which pretherapeutic biopsies normally do not provide. Furthermore, these methods neglect tissue morphology, which renders them prone to sampling artifacts. Proteomic classification approaches seem very promising, especially since novel tissue-based methods have been developed that require less sample material and have proven capable of accurate tumor classification.16−22 Matrix-assisted laser desorption/ionization (MALDI) imaging is one of these methods. MALDI imaging generates spatially resolved proteomic profiles while preserving the morphological integrity of the analyzed tissue.23−27 As a result, the proteomic patterns of distinct tissue features, such as cancer cells, can be extracted and used for classification. Importantly, MALDI imaging has already been hypothesized to be well suited for diagnostic tumor classification28 and has been used in several studies in which tumors with different clinical end points were classified with high accuracy.16,17,19,20,22

In the present study, we tested the utility of MALDI imaging for tumor classification using a two-step approach. First, we tested whether tumor entities located in different organ sites can be discriminated by their proteomic profiles. For this, we analyzed six adenocarcinoma entities (Barrett's cancer, breast cancer, colon cancer, hepatocellular carcinoma, gastric cancer, thyroid carcinoma; n = 171) by MALDI imaging and generated a proteomic classifier that discriminated the tumor entities with high accuracy. After this proof of principle, we tested our approach in the clinically more relevant context of identifying the origin of a metastasis whose primary tumor is unknown. We first tested whether colon cancer primary tumors, colon cancer liver metastases, and hepatocellular carcinomas can be classified accurately, and then whether colon cancer liver metastases are classified as colon cancer when only colon cancer primary tumor samples are used to train the classifier.

Table 1. Characteristics of the Tumor Samples

organ site | tumor origin | tumor type | subtype | grading | number of samples
Distal esophagus | Primary tumor | Adenocarcinoma | − (Barrett's) | G1 / G2 / G3 | 3 / 11 / 19
Breast | Primary tumor | Adenocarcinoma | Invasive Ductal | G2 / G3 | 2 / 28
Colon | Primary tumor | Adenocarcinoma | − | G2 / G3 | 13 / 8
Liver | Primary tumor | Adenocarcinoma | Hepatocellular | G1 / G2 / G3 | 1 / 9 / 5
Liver | Metastasis of colon carcinoma | Adenocarcinoma | − | G2 / G3 | 16 / 3
Stomach | Primary tumor | Adenocarcinoma | Intestinal | G2 / G3 | 9 / 14
Stomach | Primary tumor | Adenocarcinoma | Diffuse | G3 | 9
Stomach | Primary tumor | Adenocarcinoma | Mixed | G3 | 3
Stomach | Primary tumor | Adenocarcinoma | NA | G2 / G3 | 3 / 5
Thyroid gland | Primary tumor | Adenocarcinoma | Papillary | − | 29

MATERIALS AND METHODS

Tissue Specimens

All tumor tissue specimens were procured from patients between 1987 and 2006, and written informed consent was obtained. The study was approved by the Ethics Committee of the Technische Universität München. The tumor tissue specimens were snap-frozen after resection and stored in liquid nitrogen. For the classification of six tumor entities located in different organ sites, a total of 171 primary tumors were used. The cohort comprised 33 Barrett's cancer (adenocarcinoma of the distal esophagus), 30 breast cancer (invasive ductal), 21 colon cancer, 15 hepatocellular carcinoma, 43 gastric cancer, and 29 thyroid cancer (papillary) specimens (Table 1). For the classification of three tumor entities that are either located in the same organ site or have the same origin, 55 tumor samples were used: 21 colon cancer primary tumor, 19 colon cancer liver metastasis, and 15 hepatocellular carcinoma primary tumor specimens (Table 1). For the classification of all primary tumor samples together with the colon cancer liver metastasis samples (n = 19), all tissue samples were used, resulting in a cohort of 190 tumor samples (Table 1).

MALDI Imaging

Tissue samples were cryo-sectioned (12 μm; CM1950, Leica Microsystems, Wetzlar, Germany) and thaw-mounted onto conductive slides (Bruker Daltonik, Bremen, Germany). The sections were washed (70 and 100% ethanol, 1 min each), and sinapinic acid matrix (10 g/L in 60% acetonitrile in water, 0.2% trifluoroacetic acid) was applied with an automated spray device (ImagePrep, Bruker Daltonik, Bremen, Germany) as described previously.19 MALDI imaging was performed on an Ultraflex III MALDI-TOF/TOF with FlexControl 3.0 and FlexImaging 3.0 software (Bruker Daltonik, Bremen, Germany) in positive linear mode with a detection range of m/z 2500−25000, a sampling rate of 0.1 GS/s, a lateral resolution of ≤200 μm, and 200 laser shots per measuring point. Protein calibration standard I (Bruker Daltonik, Bremen, Germany) was spotted adjacent to the tissue sections and used for spectra calibration. After the MALDI imaging measurement, the matrix was rinsed off with 70% ethanol, and the sections were stained with hematoxylin and eosin (H&E) and scanned with a MIRAX DESK system (Carl Zeiss MicroImaging, Göttingen, Germany). The images were coregistered with the MALDI imaging results to correlate the mass spectrometric data with the histological features of the same section (for a detailed description, see Supporting Information).

Data Processing and Statistical Analysis

In all MALDI imaging measurements, spectra associated with cancer cells were selected. In the FlexImaging 3.0 software (Bruker Daltonik), the MS data and the H&E staining of each tissue sample were covisualized; the high image resolution allowed cellular details within the tissue to be analyzed. Regions of interest containing only cancer cells were then drawn, and the spectra of these regions were extracted. Afterward, 40 cancer cell-specific spectra were randomly selected from each tissue sample, and the whole set of spectra was imported into the ClinProTools 2.2 software (Bruker Daltonik) for data processing. This processing included normalization to the total ion count of each spectrum, peak identification, and spectra alignment to correct for mass shifts between measurements.

For classification, the processed data were exported as CART files and loaded into the R statistical software (R Foundation for Statistical Computing). A training set and a test set were generated, both comprising samples of all tumor types included in the respective classification. For the training set, two-thirds of the samples of each tumor type were randomly selected, and the remaining third was put into the test set (Table 2 and Figure 1). The classification of all primary tumor samples together with the colon cancer liver metastasis samples was handled differently. All non-colon-cancer samples were split into training and test sets as described above. Colon cancer primary tumor samples were used only in the training set, and colon cancer liver metastasis samples only in the test set.

Since the distribution of the intensities of the m/z species is unknown and a normal distribution cannot be assumed, the m/z species used for classification were identified by pairwise comparison of the spectra of the tumor entities in the training set using a nonparametric Wilcoxon rank-sum test (p < 0.05 and AUC > 0.8; R packages stats and ROCR). The m/z species selected in all pairwise comparisons were then united into one feature set. Two classifiers were used in this study: Support Vector Machine (R package e1071) and Random Forest (R package randomForest). Because the Random Forest algorithm is nondeterministic, each Random Forest classification was repeated 100 times and the results were averaged. The classifiers were established using the previously identified m/z species and the training set and were afterward applied to the test set. To minimize sampling errors, the selection of the training and test sets was repeated 50 times. The classification results given in the tables are the averages of the 50 single classification results.

The misclassification rate was correlated with the respective tumor depth (T) or grading (G) using Spearman's rank correlation (R package stats). For each sample in the test set, misclassifications were counted over all samplings and, for the Random Forest algorithm, over all repeats. The resulting numbers were correlated with tumor depth (T) or grading (G). For the combined analysis, a weighted average of the frequencies of both classifications was used. Since the Random Forest classification had 100 additional repeats for each sampling, its frequency was divided by 100 before being combined with the misclassification frequency of the Support Vector Machine classification.

Table 2. Classification of Six Tumor Entities Located in Different Organ Sites, Training Set and Test Set Make-up

tumor entity | total number of samples | samples in training set | samples in test set
Barrett's cancer | 33 | 22 | 11
breast cancer | 30 | 20 | 10
colon cancer | 21 | 14 | 7
gastric cancer | 43 | 29 | 14
hepatocellular carcinoma | 15 | 10 | 5
papillary thyroid cancer | 29 | 19 | 10
total number of samples | 171 | 114 | 57

Figure 1. Schematic display of the classification of the six adenocarcinoma entities. First, a model is generated using a training set. Then, this model is validated on a test set. To reduce sampling errors, the classification results are the average of 50 independent samplings (Hepat. Carcinoma = Hepatocellular Carcinoma).

RESULTS

Classification of Six Tumor Entities Located in Different Organ Sites

Tumor samples (n = 171) of six tumor entities located in different organ sites (Barrett's cancer, n = 33; breast cancer, n = 30; colon cancer, n = 21; hepatocellular carcinoma, n = 15; gastric cancer, n = 43; thyroid cancer, n = 29) were analyzed by MALDI imaging (Figure 2). The resulting mass spectra could already be differentiated by manual inspection. The cancer cell-specific spectra were extracted and classified (Table 2). Across the 50 samplings, the classification was based on an average of 117 m/z species (range 112−123; see Supplementary Data and Figure, Supporting Information). For the training set, the overall accuracy was 99.33% for the Support Vector Machine (SVM) and 100% for the Random Forest (RF) algorithm. The sensitivities, specificities, and accuracies for the individual tumor entity subsets were higher than 98% for the SVM and 100% for the RF algorithm (Table 3). Applying the classifiers to the test set yielded an overall accuracy of 82.74% for the Support Vector Machine and 81.18% for the Random Forest algorithm. The individual sensitivities were mostly above 80%. The sensitivity for the hepatocellular carcinoma subset was lower for both classification algorithms (73.20% for SVM and 54.36% for RF), and that for the gastric cancer subset was slightly below 80% for the Random Forest algorithm (Table 3). The individual specificities for all tumor entity subsets and both classifiers were at least 90%, mostly even higher than 95% (Table 3). The individual accuracies were higher than 85%, in most cases even higher than 95% (Table 3).
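Two preprocessing steps described under Data Processing and Statistical Analysis are the random selection of 40 cancer cell spectra per tissue sample and the normalization of each spectrum to its total ion count (TIC). The study performed both in ClinProTools 2.2, whose internals are not described here. The Python sketch below only illustrates the idea under stated assumptions. The list-of-intensities representation, the function names `tic_normalize` and `select_spectra`, and the seeded sampling are all hypothetical. Scaling each spectrum to a TIC of 1 differs from scaling it to any other common reference value only by a constant factor.

```python
import random
from typing import List, Sequence

# Hypothetical representation: intensities at fixed (aligned) m/z positions.
Spectrum = List[float]


def tic_normalize(intensities: Sequence[float]) -> Spectrum:
    """Scale a spectrum so that its intensities sum to 1 (total ion count normalization)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no ion counts")
    return [value / total for value in intensities]


def select_spectra(region_spectra: Sequence[Spectrum], k: int = 40, seed: int = 0) -> List[Spectrum]:
    """Randomly draw k spectra without replacement from a sample's cancer-cell regions, then TIC-normalize them."""
    if len(region_spectra) < k:
        raise ValueError(f"only {len(region_spectra)} spectra available, {k} requested")
    rng = random.Random(seed)  # seeded so that a sampling can be reproduced
    chosen = rng.sample(list(region_spectra), k)
    return [tic_normalize(spectrum) for spectrum in chosen]


# Usage with placeholder data: 100 region-of-interest spectra with 5 m/z positions each.
roi_spectra = [[float(i + j + 1) for j in range(5)] for i in range(100)]
selected = select_spectra(roi_spectra, k=40, seed=1)
```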
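The stratified two-thirds/one-third split described in Materials and Methods, repeated over 50 samplings, can be sketched as follows. Rounding two-thirds of each group to the nearest integer is our assumption. It reproduces the training and test set sizes in Tables 2, 4, and 6; for example, 43 gastric cancer samples give 29 training and 14 test samples. The hypothetical `train_only` and `test_only` arguments model the third classification, in which colon cancer primary tumors were used only for training and colon cancer liver metastases only for testing. Sample identifiers and function names are placeholders.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_counts(n: int) -> Tuple[int, int]:
    """Training/test sizes for one tumor type: two-thirds (rounded, an assumption) go to training."""
    n_train = round(2 * n / 3)
    return n_train, n - n_train


def stratified_split(
    samples_by_type: Dict[str, Sequence[str]],
    seed: int,
    train_only: Sequence[str] = (),
    test_only: Sequence[str] = (),
) -> Tuple[List[str], List[str]]:
    """One random sampling: split each tumor type 2:1 unless it is reserved for one set."""
    rng = random.Random(seed)
    train: List[str] = []
    test: List[str] = []
    for tumor_type, samples in samples_by_type.items():
        samples = list(samples)
        if tumor_type in train_only:
            train.extend(samples)
        elif tumor_type in test_only:
            test.extend(samples)
        else:
            rng.shuffle(samples)  # random selection within this tumor type only
            n_train, _ = split_counts(len(samples))
            train.extend(samples[:n_train])
            test.extend(samples[n_train:])
    return train, test


# Usage: 50 independent samplings of the six-entity cohort (sample IDs are placeholders).
sizes = {"Barrett": 33, "breast": 30, "colon": 21, "gastric": 43, "HCC": 15, "thyroid": 29}
cohort = {t: [f"{t}_{i}" for i in range(n)] for t, n in sizes.items()}
samplings = [stratified_split(cohort, seed=s) for s in range(50)]
```

Seeding each sampling separately keeps every one of the 50 splits reproducible while still varying which samples land in the test set.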
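The feature selection step runs pairwise Wilcoxon rank-sum tests between tumor entities in the training set and keeps the m/z species with p < 0.05 and AUC > 0.8. The selections from all pairs are then united. The sketch below reproduces this in plain Python as a stand-in for the R packages stats and ROCR used in the study. It uses the normal approximation of the rank-sum statistic with midranks but without tie or continuity correction. Its p-values can therefore differ from those of R's `wilcox.test`, which may compute exact p-values for small samples. The AUC is taken from the Mann-Whitney U statistic. The source does not state how the direction of the AUC was handled. Here an m/z species counts as discriminative when either entity has the higher intensities (max(AUC, 1 − AUC) > 0.8), which is an assumption.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they occupy (midranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (AUC, two-sided p) for a Wilcoxon rank-sum comparison of x against y.

    AUC = U / (n1 * n2) estimates P(X > Y) + 0.5 * P(X = Y); the p-value uses the
    normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    auc = u1 / (n1 * n2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(u1 - mean_u) / sd_u / math.sqrt(2))
    return auc, p


def select_mz_species(
    spectra_by_entity: Dict[str, Sequence[Sequence[float]]],
    alpha: float = 0.05,
    min_auc: float = 0.8,
) -> Set[int]:
    """Union of m/z indices that separate at least one pair of entities (p < alpha, AUC > min_auc)."""
    first = next(iter(spectra_by_entity.values()))
    n_mz = len(first[0])
    selected: Set[int] = set()
    for a, b in combinations(spectra_by_entity, 2):
        for k in range(n_mz):
            xa = [spectrum[k] for spectrum in spectra_by_entity[a]]
            xb = [spectrum[k] for spectrum in spectra_by_entity[b]]
            auc, p = rank_sum_test(xa, xb)
            # Direction-free AUC: the m/z species may be higher in either entity (assumption).
            if p < alpha and max(auc, 1 - auc) > min_auc:
                selected.add(k)
    return selected
```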

Figure 2. Hematoxylin and eosin (H&E) staining of representative samples of each tumor entity (A, Barrett's cancer; B, breast cancer; C, colon cancer; D, gastric cancer; E, hepatocellular carcinoma; F, thyroid cancer) and their average proteomic spectra. Across the 50 samplings, the classification was based on an average of 117 m/z species (range 112−123). Three exemplary regions are highlighted in gray and enlarged. Six masses (m/z 6979, 7003, 8452, 8567, 11607, and 11647) that appear differentially expressed on manual inspection are marked with arrows.

Table 3. Classification of Six Tumor Entities Located in Different Organ Sites, Classification Results

tumor entity | classifier | training set sensitivity [%] | training set specificity [%] | training set accuracy [%] | test set sensitivity [%] | test set specificity [%] | test set accuracy [%]
Barrett's cancer | SVM | 100 | 100 | 100 | 85.82 | 98.26 | 95.86
Barrett's cancer | RF | 100 | 100 | 100 | 84.96 | 94.51 | 92.66
breast cancer | SVM | 99.70 | 99.81 | 99.79 | 80.40 | 96.85 | 93.96
breast cancer | RF | 100 | 100 | 100 | 85.27 | 97.80 | 95.60
colon cancer | SVM | 98.43 | 100 | 99.81 | 82.86 | 98.56 | 96.63
colon cancer | RF | 100 | 100 | 100 | 88.69 | 97.13 | 96.09
gastric cancer | SVM | 98.97 | 99.98 | 99.72 | 81.00 | 90.00 | 87.79
gastric cancer | RF | 100 | 100 | 100 | 76.25 | 92.36 | 88.40
hepat. carcinoma | SVM | 98.20 | 99.79 | 99.65 | 73.20 | 98.31 | 96.11
hepat. carcinoma | RF | 100 | 100 | 100 | 54.36 | 99.08 | 95.16
thyroid cancer | SVM | 100 | 99.64 | 99.70 | 88.80 | 96.47 | 95.12
thyroid cancer | RF | 100 | 100 | 100 | 87.96 | 95.81 | 94.44
overall result | SVM | − | − | 99.33 | − | − | 82.74
overall result | RF | − | − | 100 | − | − | 81.18

Classification of Colon Cancer Liver Metastasis

After it had become clear that tumor entities located in different organ sites could be discriminated with high confidence, we tested whether the proteomic classification approach can also distinguish primary tumors from distant metastases. For this, we added colon cancer liver metastasis samples (n = 19) and generated two cohorts for classification. The additional patient samples were analyzed by MALDI imaging, and the cancer cell-specific spectra were extracted as before. First, we tested whether colon cancer primary tumors (n = 21), colon cancer liver metastases (n = 19), and hepatocellular carcinomas (n = 15) can be discriminated (Table 4). Across the 50 samplings, the classification was based on an average of 50 m/z species (range 36−63; see Supplementary Data, Supporting Information). For the training set, the overall accuracy was 94.92% for the Support Vector Machine (SVM) and 100% for the Random Forest (RF) algorithm. With the Support Vector Machine, the individual sensitivities, specificities, and accuracies for the three subsets were higher than 95%. The only exceptions were the sensitivity for the hepatocellular carcinoma subset (87.20%) and the specificity for the colon cancer liver metastasis subset (92.42%), which were slightly lower. With the Random Forest algorithm, all individual sensitivities, specificities, and accuracies for the three subsets were 100% (Table 5). Applying the classifiers to the test set yielded an overall accuracy of 84.11% for the Support Vector Machine and 82.32% for the Random Forest algorithm. The individual sensitivities for the colon cancer primary tumor and colon cancer liver metastasis subsets were higher than 80% for both classifiers, and those for hepatocellular carcinoma were higher than 70%. The individual specificities and accuracies were higher than 85% for all three subsets and both classifiers (Table 5).

Table 4. Classification of Colon Cancer Primary Tumors, Colon Cancer Liver Metastases and Hepatocellular Carcinomas, Training Set and Test Set Make-up

tumor entity | total number of samples | samples in training set | samples in test set
colon cancer primary tumor | 21 | 14 | 7
colon cancer liver metastasis | 19 | 13 | 6
hepatocellular carcinoma | 15 | 10 | 5
total number of samples | 55 | 37 | 18

Since the discrimination of the two primary tumor entities and the metastases worked well, we then tested whether colon cancer liver metastasis samples can be correctly classified as colon cancer samples. For this classification approach, all available tumor samples were used (n = 190, Table 6). The training set comprised all colon cancer primary tumor samples and two-thirds of each other primary tumor entity (n = 121). The test set comprised all colon cancer liver metastasis samples and the remaining third of each other primary tumor entity (n = 69). Across the 50 samplings, this classification was based on an average of 118 m/z species (range 112−124; see Supplementary Data, Supporting Information). For the training set, the overall accuracy was 99.59% for the Support Vector Machine (SVM) and 100% for the Random Forest (RF) algorithm. The individual sensitivities, specificities, and accuracies for the six tumor entity subsets were at least 98% for the Support Vector Machine and 100% for the Random Forest algorithm (Table 7). Applying the classifiers to the test set yielded an overall accuracy of 71.25% for the Support Vector Machine and 70.25% for the Random Forest algorithm. The individual accuracies were higher than 90% for Barrett's cancer, breast cancer, and thyroid cancer with both classifiers. With both classifiers, the accuracy was slightly lower than 90% for gastric cancer, approximately 85% for hepatocellular carcinoma, and higher than 80% for colon cancer (Table 7). The individual sensitivities were higher than 80% for the Barrett's cancer and thyroid cancer subsets with both classifiers, for breast cancer with the Random Forest algorithm, and for gastric cancer with the Support Vector Machine. The breast cancer subset with the Support Vector Machine and the gastric cancer subset with the Random Forest algorithm yielded slightly lower sensitivities (>75%). The colon cancer subset, which in the test set consisted only of liver metastases, displayed a sensitivity slightly below 50% with both classifiers. The hepatocellular carcinoma subset displayed a sensitivity above 60% with the Support Vector Machine and slightly below 50% with the Random Forest algorithm (Table 7). With both classifiers, the individual specificities were higher than 95% for the Barrett's cancer, breast cancer, colon cancer, and thyroid cancer subsets, higher than 90% for the gastric cancer subset, and higher than 85% for the hepatocellular carcinoma subset (Table 7).
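Every one of the 50 samplings uses the same number of test samples per tumor entity. The averaged values in Tables 3, 5, and 7 therefore still obey two simple identities. First, the one-vs-rest accuracy of an entity is the prevalence-weighted mean of its sensitivity and specificity. Second, the overall result equals the prevalence-weighted mean of the per-entity sensitivities, that is, the proportion of correctly classified test samples. The sketch below defines these quantities and checks them against the test set values reported in Table 3. The function names are hypothetical. Reading the overall result as the proportion of correct classifications is our inference, based on the fact that it reproduces the reported 82.74%; the source does not define it explicitly.

```python
from typing import Dict, Sequence


def one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy (in %) of one tumor entity against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / len(pairs),
    }


def overall_accuracy(sensitivities: Dict[str, float], counts: Dict[str, int]) -> float:
    """Proportion correct = per-entity sensitivities weighted by test-set counts."""
    n = sum(counts.values())
    return sum(sensitivities[c] * counts[c] for c in counts) / n


def entity_accuracy(sens: float, spec: float, n_pos: int, n_total: int) -> float:
    """One-vs-rest accuracy from sensitivity and specificity for a fixed test-set composition."""
    return (sens * n_pos + spec * (n_total - n_pos)) / n_total


# Check against the Support Vector Machine test set values in Tables 2 and 3.
TEST_COUNTS = {"Barrett": 11, "breast": 10, "colon": 7, "gastric": 14, "HCC": 5, "thyroid": 10}
SVM_TEST_SENSITIVITY = {"Barrett": 85.82, "breast": 80.40, "colon": 82.86,
                        "gastric": 81.00, "HCC": 73.20, "thyroid": 88.80}
print(round(overall_accuracy(SVM_TEST_SENSITIVITY, TEST_COUNTS), 2))  # 82.74, as reported
print(round(entity_accuracy(85.82, 98.26, 11, 57), 2))  # 95.86, Barrett's cancer (SVM)
```

The same two functions reproduce the Random Forest overall result in Table 3 (81.18%) and the Support Vector Machine overall result in Table 7 (71.25%) when fed the corresponding sensitivities and test-set counts.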
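The misclassification analysis described in Materials and Methods, whose results are discussed below, proceeds in four steps. It counts how often each test sample was misclassified, divides the Random Forest counts by the 100 repeats, combines them with the Support Vector Machine counts, and correlates the result with grading or tumor depth by Spearman's rank correlation. The Python sketch below rests on stated assumptions. The source only speaks of a weighted average after division by 100, so combining the two frequencies with equal weights is our choice (a constant overall factor would not change a rank correlation, but unequal weights would). Grades are coded G1 = 1, G2 = 2, G3 = 3, and the function names are hypothetical. Samples that were never misclassified must keep a count of 0 so that they enter the correlation. The sketch computes only Spearman's rho; a p-value would come from, for example, R's `cor.test(..., method = "spearman")`.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

GRADE_CODE = {"G1": 1, "G2": 2, "G3": 3}  # ordinal coding of tumor grading (assumption)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks (undefined if a variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def misclassification_counts(events: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """events: one (sample_id, misclassified) pair per test-set prediction; correct-only samples keep 0."""
    counts: Dict[str, int] = {}
    for sample_id, wrong in events:
        counts[sample_id] = counts.get(sample_id, 0) + int(wrong)
    return counts


def combined_frequency(svm: Dict[str, int], rf: Dict[str, int], rf_repeats: int = 100) -> Dict[str, float]:
    """Divide RF counts by the number of RF repeats, then average with the SVM counts (equal weights assumed)."""
    ids = set(svm) | set(rf)
    return {s: (svm.get(s, 0) + rf.get(s, 0) / rf_repeats) / 2 for s in ids}


# Usage with hypothetical inputs (grade_of maps sample IDs to "G1"/"G2"/"G3"):
# freq = combined_frequency(svm_counts, rf_counts)
# ids = sorted(freq)
# rho = spearman_rho([freq[s] for s in ids], [GRADE_CODE[grade_of[s]] for s in ids])
```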



Table 5. Classification of Colon Cancer Primary Tumors, Colon Cancer Liver Metastases and Hepatocellular Carcinomas, Classification Results

tumor entity | classifier | training set sensitivity [%] | training set specificity [%] | training set accuracy [%] | test set sensitivity [%] | test set specificity [%] | test set accuracy [%]
colon cancer primary tumor | SVM | 95.71 | 100 | 98.38 | 85.71 | 93.09 | 90.02
colon cancer primary tumor | RF | 100 | 100 | 100 | 86.93 | 89.62 | 88.57
colon cancer liver metastasis | SVM | 100 | 92.42 | 95.08 | 91.33 | 85.50 | 87.44
colon cancer liver metastasis | RF | 100 | 100 | 100 | 81.42 | 86.99 | 85.14
hepatocellular carcinoma | SVM | 87.20 | 99.78 | 96.38 | 73.20 | 97.23 | 90.56
hepatocellular carcinoma | RF | 100 | 100 | 100 | 76.94 | 96.30 | 90.92
overall result | SVM | − | − | 94.92 | − | − | 84.11
overall result | RF | − | − | 100 | − | − | 82.32

Table 6. Classification of Six Primary Tumor Entities and Colon Cancer Liver Metastasis, Training Set and Test Set Make-up

tumor entity | total number of samples | samples in training set | samples in test set
Barrett's cancer | 33 | 22 | 11
breast cancer | 30 | 20 | 10
colon cancer | 21 | 21 | 0
gastric cancer | 43 | 29 | 14
hepat. carcinoma | 15 | 10 | 5
thyroid cancer | 29 | 19 | 10
colon cancer liver metastasis | 19 | 0 | 19
total number of samples | 190 | 121 | 69

Table 7. Classification of Six Primary Tumor Entities and Colon Cancer Liver Metastasis, Classification Results

tumor entity | classifier | training set sensitivity [%] | training set specificity [%] | training set accuracy [%] | test set sensitivity [%] | test set specificity [%] | test set accuracy [%]
Barrett's cancer | SVM | 100 | 100 | 100 | 82.73 | 97.34 | 95.01
Barrett's cancer | RF | 100 | 100 | 100 | 84.13 | 96.18 | 94.26
breast cancer | SVM | 99.60 | 99.80 | 99.77 | 78.80 | 96.20 | 93.68
breast cancer | RF | 100 | 100 | 100 | 88.05 | 97.24 | 95.90
colon cancer | SVM | 99.24 | 100 | 99.87 | 43.26 | 96.88 | 82.11
colon cancer | RF | 100 | 100 | 100 | 47.98 | 95.28 | 82.26
gastric cancer | SVM | 99.79 | 100 | 99.95 | 85.14 | 90.84 | 89.68
gastric cancer | RF | 100 | 100 | 100 | 75.29 | 92.31 | 88.86
hepat. carcinoma | SVM | 98.00 | 99.86 | 99.70 | 64.80 | 87.25 | 85.62
hepat. carcinoma | RF | 100 | 100 | 100 | 46.44 | 87.66 | 84.68
thyroid cancer | SVM | 100 | 99.86 | 99.88 | 88.00 | 97.80 | 96.38
thyroid cancer | RF | 100 | 100 | 100 | 87.19 | 96.76 | 95.37
overall result | SVM | − | − | 99.59 | − | − | 71.25
overall result | RF | − | − | 100 | − | − | 70.25

DISCUSSION

The correct identification of the tumor origin is crucial for a personalized, individually tailored treatment regimen. Over the last decades, new molecular methods, such as gene and protein expression analysis, have been established for discriminating between different tissue types or tumor entities. Gene expression analyses have provided accurate classification results.4−10 Proteomic analyses, which first used body fluids (mostly serum) and later turned to tissue samples, have likewise discriminated tumor samples from healthy progenitor samples with high accuracy.11,13,15,18,21 Furthermore, the clinically more relevant and technically more challenging problem of discriminating various tumor entities or molecularly distinct tumor subgroups has also been addressed successfully.12,14,16,19 Gene expression profiling and proteomic methods that use tissue homogenates face two problems. First, they need more sample material than can be procured in pretherapeutic diagnostics. Second, the necessary assumption of tissue homogeneity can negatively influence the results. This might explain why these methods have not been implemented in diagnostics. In recent years, first mass spectrometric tissue profiling and then MALDI imaging mass spectrometry have emerged to address these two issues. Both methods require very little tissue material (a single tissue section from an endoscopic biopsy is enough for analysis26) and retain morphology during analysis. As a result, the acquired proteomic pattern can be compared with the histological staining to check the cellular composition of the given tissue.29 For these reasons, we chose MALDI imaging for tumor classification.

In our study, the initial classification of six tumor entities located in different organ sites yielded high accuracy in both the training and the test set. The training set could be classified nearly perfectly. The classification of the test set yielded a lower but still high accuracy (82.74% for the Support Vector Machine and 81.18% for the Random Forest algorithm). Both classifiers yielded comparable results, indicating robustness. Misclassifications in this study are probably not due to tumor depth or grading. Spearman's rank correlation test found no significant correlation (significance threshold p < 0.05) between the misclassification rate and tumor depth (T) or grading (G) (Supplementary Tables 1 and 2, Supporting Information). A reasonable explanation for the misclassifications is the existence of molecular subtypes within tumor entities. In breast cancer, for example, five distinct subtypes are recognized that express different molecular features and display different clinical outcomes.30 Such molecular heterogeneity is likely to exist in all tumor entities.

So far, most proteomic studies have been concerned with discriminating normal, "healthy" tissue from tumor tissue. Among the studies that discriminated tumor entities, Villanueva et al. distinguished three tumor entities (prostate, bladder, and breast cancer) using serum samples.14 Bloom et al. generated a classifier with an overall accuracy of 82% for a patient cohort consisting of six tumor entities: breast, colon, gastric, kidney, lung, and ovarian cancer.12 While these two studies proved that tumor entities can be discriminated by classification, our study extends these results. In addition to also covering six tumor entities, it uses a training set to generate the classifier and a separate test set to validate it. This feature is highly relevant for assessing the power of the classification. Evaluating a classifier on a new, independent test cohort guards against overestimating its performance because of overfitting to the initial data set, and thus makes the classification results more reliable.31

After proving that MALDI imaging-derived proteomic patterns can be used for accurate discrimination of primary tumors, we wanted to test whether metastases can be classified successfully as well. This addresses an important challenge in diagnostics, where metastases have to be correctly classified even if the primary tumor cannot be found. For this, we introduced colon cancer liver metastases into the sample cohort and performed two classifications. In the first, we tested whether colon cancer primary tumors, colon cancer liver metastases, and hepatocellular carcinomas can be discriminated. Again, the classification of the training set was close to perfect. Interestingly, the test set classification also yielded high accuracy (84.11% for the Support Vector Machine and 82.32% for the Random Forest algorithm). This result indicates that even closely related entities, such as the primary tumor of colon cancer and its liver metastasis, can be classified efficiently. As before, the misclassification rate displayed no significant correlation with tumor depth (T) or grading (G) (Spearman's rank correlation test, significance threshold p < 0.05; Supplementary Tables 3 and 4, Supporting Information). The more likely explanation for the misclassifications is, again, the inherent molecular heterogeneity within tumors of the same entity.

In the second classification, we tested whether primary tumor samples can be used for correct classification of distant metastasis samples, in this case samples from the primary tumor and the liver metastasis of colon cancer. As expected, the classification yielded close to perfect overall accuracy in the training set for both the Support Vector Machine and the Random Forest algorithm (99.59 and 100%, respectively). In the test set, the overall accuracy was lower but still substantial (71.25 and 70.25%, respectively). The misclassification rate displayed no significant correlation with tumor depth (T) or grading (G) (Spearman's rank correlation test, significance threshold p < 0.05; Supplementary Tables 5 and 6, Supporting Information). The sensitivities were relatively low, especially for colon cancer, but the specificities were very high. Thus, this classification approach worked. Nevertheless, including distant metastases of known origin in the training of the classifier would most likely improve its sensitivity, specificity, and accuracy. This method could then become a useful additional tool for clinical tumor diagnostics.

Taken together, we were able to generate a classifier that was based on MALDI imaging-derived spectra and was well suited for accurate tumor classification. MALDI imaging might therefore open new fields in tissue sample classification. This proof-of-principle study shows for the first time that proteomic classification of solid tumor entities can be highly accurate while needing only a minimal amount of tissue. Other applications can also be envisioned, for example, the classification of lymphomas,20 tumor subtypes,16,19 or morphologically similar, non-neoplastic conditions such as chronic inflammatory diseases. Thus, MALDI imaging represents a valuable future tool in clinical diagnostics.


of unknown primary and correlation with clinical evaluation. J. Clin. Oncol. 2008, 26 (27), 4442−8.
(11) Adam, B. L.; Qu, Y.; Davis, J. W.; Ward, M. D.; Clements, M. A.; Cazares, L. H.; Semmes, O. J.; Schellhammer, P. F.; Yasui, Y.; Feng, Z.; Wright, G. L. Jr. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res. 2002, 62 (13), 3609−14.
(12) Bloom, G. C.; Eschrich, S.; Zhou, J. X.; Coppola, D.; Yeatman, T. J. Elucidation of a protein signature discriminating six common types of adenocarcinoma. Int. J. Cancer 2007, 120 (4), 769−75.
(13) Scarlett, C. J.; Smith, R. C.; Saxby, A.; Nielsen, A.; Samra, J. S.; Wilson, S. R.; Baxter, R. C. Proteomic classification of pancreatic adenocarcinoma tissue using protein chip technology. Gastroenterology 2006, 130 (6), 1670−8.
(14) Villanueva, J.; Shaffer, D. R.; Philip, J.; Chaparro, C. A.; Erdjument-Bromage, H.; Olshen, A. B.; Fleisher, M.; Lilja, H.; Brogi, E.; Boyd, J.; Sanchez-Carbayo, M.; Holland, E. C.; Cordon-Cardo, C.; Scher, H. I.; Tempst, P. Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 2006, 116 (1), 271−84.
(15) Lee, N. P.; Chen, L.; Lin, M. C.; Tsang, F. H.; Yeung, C.; Poon, R. T.; Peng, J.; Leng, X.; Beretta, L.; Sun, S.; Day, P. J.; Luk, J. M. Proteomic expression signature distinguishes cancerous and nonmalignant tissues in hepatocellular carcinoma. J. Proteome Res. 2009, 8 (3), 1293−303.
(16) Balluff, B.; Elsner, M.; Kowarsch, A.; Rauser, S.; Meding, S.; Schuhmacher, C.; Feith, M.; Herrmann, K.; Rocken, C.; Schmid, R. M.; Hofler, H.; Walch, A.; Ebert, M. P. Classification of HER2/neu status in gastric cancer using a breast-cancer derived proteome classifier. J. Proteome Res. 2010, 9 (12), 6317−22.
(17) Groseclose, M. R.; Massion, P. P.; Chaurand, P.; Caprioli, R. M. High-throughput proteomic analysis of formalin-fixed paraffin-embedded tissue microarrays using MALDI imaging mass spectrometry. Proteomics 2008, 8 (18), 3715−24.
(18) Le Faouder, J.; Laouirem, S.; Chapelle, M.; Albuquerque, M.; Belghiti, J.; Degos, F.; Paradis, V.; Camadro, J. M.; Bedossa, P. Imaging mass spectrometry provides fingerprints for distinguishing hepatocellular carcinoma from cirrhosis. J. Proteome Res. 2011, 10 (8), 3755−65.
(19) Rauser, S.; Marquardt, C.; Balluff, B.; Deininger, S. O.; Albers, C.; Belau, E.; Hartmer, R.; Suckau, D.; Specht, K.; Ebert, M. P.; Schmitt, M.; Aubele, M.; Hofler, H.; Walch, A. Classification of HER2 receptor status in breast cancer tissues by MALDI imaging mass spectrometry. J. Proteome Res. 2010, 9 (4), 1854−63.
(20) Schwamborn, K.; Krieg, R. C.; Jirak, P.; Ott, G.; Knuchel, R.; Rosenwald, A.; Wellmann, A. Application of MALDI imaging for the diagnosis of classical Hodgkin lymphoma. J. Cancer Res. Clin. Oncol. 2010, 136 (11), 1651−5.
(21) Schwartz, S. A.; Weil, R. J.; Johnson, M. D.; Toms, S. A.; Caprioli, R. M. Protein profiling in brain tumors using mass spectrometry: feasibility of a new technique for the analysis of protein expression. Clin. Cancer Res. 2004, 10 (3), 981−7.
(22) Yanagisawa, K.; Shyr, Y.; Xu, B. J.; Massion, P. P.; Larsen, P. H.; White, B. C.; Roberts, J. R.; Edgerton, M.; Gonzalez, A.; Nadaf, S.; Moore, J. H.; Caprioli, R. M.; Carbone, D. P. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 2003, 362 (9382), 433−9.
(23) Seeley, E. H.; Caprioli, R. M. Molecular imaging of proteins in tissues by mass spectrometry. Proc. Natl. Acad. Sci. 2008, 18126−31.
(24) Seeley, E. H.; Caprioli, R. M. MALDI imaging mass spectrometry of human tissue: method challenges and clinical perspectives. Trends Biotechnol. 2011, 29 (3), 136−43.
(25) Walch, A.; Rauser, S.; Deininger, S.-O.; Höfler, H. MALDI imaging mass spectrometry for direct tissue analysis: a new frontier for molecular histology. Histochem. Cell Biol. 2008, 130 (3), 421−34.
(26) Kim, H. K.; Reyzer, M. L.; Choi, I. J.; Kim, C. G.; Kim, H. S.; Oshima, A.; Chertov, O.; Colantonio, S.; Fisher, R. J.; Allen, J. L.; Caprioli, R. M.; Green, J. E. Gastric cancer-specific protein profile identified using endoscopic biopsy samples via MALDI mass spectrometry. J. Proteome Res. 2010, 9 (8), 4123−30.

ASSOCIATED CONTENT

S Supporting Information *

Supplementary data and materials. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Axel Walch, Institute of Pathology, Helmholtz Zentrum München; Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany; e-mail: [email protected]; phone +49 (0)89 3187 2739; fax +49 (0)89 3187 3349.



ACKNOWLEDGMENTS A.W. gratefully acknowledges the financial support of the BMBF (grants no. 01EZ0803, no. 0315508A and no. 01IB10004E) and the Deutsche Forschungsgemeinschaft (SFB 824 TP B1, SFB 824 TP Z2, and WA 1656/3-1). We thank Ulrike Buchholz, Claudia-Mareike Pflüger and Andreas Voss for excellent technical assistance.



REFERENCES

(1) Pavlidis, N.; Fizazi, K. Cancer of unknown primary (CUP). Crit. Rev. Oncol. Hematol. 2005, 54 (3), 243−50. (2) Pavlidis, N.; Fizazi, K. Carcinoma of unknown primary (CUP). Crit. Rev. Oncol. Hematol. 2009, 69 (3), 271−8. (3) Bugat, R.; Bataillard, A.; Lesimple, T.; Voigt, J. J.; Culine, S.; Lortholary, A.; Merrouche, Y.; Ganem, G.; Kaminsky, M. C.; Negrier, S.; Perol, M.; Laforet, C.; Bedossa, P.; Bertrand, G.; Coindre, J. M.; Fizazi, K. Summary of the Standards, Options and Recommendations for the management of patients with carcinoma of unknown primary site (2002). Br. J. Cancer 2003, 89 (Suppl 1), S59−66. (4) Bloom, G.; Yang, I. V.; Boulware, D.; Kwong, K. Y.; Coppola, D.; Eschrich, S.; Quackenbush, J.; Yeatman, T. J. Multi-platform, multisite, microarray-based human tumor classification. Am. J. Pathol. 2004, 164 (1), 9−16. (5) Ma, X. J.; Patel, R.; Wang, X.; Salunga, R.; Murage, J.; Desai, R.; Tuggle, J. T.; Wang, W.; Chu, S.; Stecker, K.; Raja, R.; Robin, H.; Moore, M.; Baunoch, D.; Sgroi, D.; Erlander, M. Molecular classification of human cancers using a 92-gene real-time quantitative polymerase chain reaction assay. Arch. Pathol. Lab. Med. 2006, 130 (4), 465−73. (6) Ramaswamy, S.; Tamayo, P.; Rifkin, R.; Mukherjee, S.; Yeang, C. H.; Angelo, M.; Ladd, C.; Reich, M.; Latulippe, E.; Mesirov, J. P.; Poggio, T.; Gerald, W.; Loda, M.; Lander, E. S.; Golub, T. R. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. U.S.A. 2001, 98 (26), 15149−54. (7) Su, A. I.; Welsh, J. B.; Sapinoso, L. M.; Kern, S. G.; Dimitrov, P.; Lapp, H.; Schultz, P. G.; Powell, S. M.; Moskaluk, C. A.; Frierson, H. F. Jr.; Hampton, G. M. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res. 2001, 61 (20), 7388−93. (8) Talantov, D.; Baden, J.; Jatkoe, T.; Hahn, K.; Yu, J.; Rajpurohit, Y.; Jiang, Y.; Choi, C.; Ross, J. S.; Atkins, D.; Wang, Y.; Mazumder, A. 
A quantitative reverse transcriptase-polymerase chain reaction assay to identify metastatic carcinoma tissue of origin. J. Mol. Diagn. 2006, 8 (3), 320−9. (9) Tothill, R. W.; Kowalczyk, A.; Rischin, D.; Bousioutas, A.; Haviv, I.; van Laar, R. K.; Waring, P. M.; Zalcberg, J.; Ward, R.; Biankin, A. V.; Sutherland, R. L.; Henshall, S. M.; Fong, K.; Pollack, J. R.; Bowtell, D. D.; Holloway, A. J. An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Res. 2005, 65 (10), 4031−40. (10) Varadhachary, G. R.; Talantov, D.; Raber, M. N.; Meng, C.; Hess, K. R.; Jatkoe, T.; Lenzi, R.; Spigel, D. R.; Wang, Y.; Greco, F. A.; Abbruzzese, J. L.; Hainsworth, J. D. Molecular profiling of carcinoma 2002

dx.doi.org/10.1021/pr200784p | J. Proteome Res. 2012, 11, 1996−2003

Journal of Proteome Research

Technical Note

(27) Balluff, B.; Schone, C.; Hofler, H.; Walch, A. MALDI imaging mass spectrometry for direct tissue analysis: technological advancements and recent applications. Histochem. Cell Biol. 2011, 136 (3), 227−44. (28) Rauser, S.; Deininger, S. O.; Suckau, D.; Hofler, H.; Walch, A. Approaching MALDI molecular imaging for clinical proteomic research: current state and fields of application. Expert Rev. Proteomics 2010, 7 (6), 927−41. (29) Chaurand, P.; Sanders, M. E.; Jensen, R. A.; Caprioli, R. M. Proteomics in diagnostic pathology: profiling and imaging proteins directly in tissue sections. Am. J. Pathol. 2004, 165 (4), 1057−68. (30) Cianfrocca, M.; Gradishar, W. New molecular classifications of breast cancer. CA: Cancer J. Clin. 2009, 59 (5), 303−13. (31) Ransohoff, D. F. Rules of evidence for cancer molecular-marker discovery and validation. Nat. Rev. Cancer 2004, 4 (4), 309−14.

2003

dx.doi.org/10.1021/pr200784p | J. Proteome Res. 2012, 11, 1996−2003