Anal. Chem. 2010, 82, 2262–2271

Development of Mass Spectrometry-Based Shotgun Method for Proteome Analysis of 500 to 5000 Cancer Cells

Nan Wang, Mingguo Xu, Peng Wang, and Liang Li*

Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada

A shotgun proteome analysis method and its performance for protein identification from 500 to 5000 cells are described. Sample preparation, which was done in one tube, involved the use of a surfactant (NP-40) for cell lysis, followed by acetone precipitation of the proteins. The resulting protein pellet was washed with cold acetone to remove remaining surfactant, and the pellet was then solubilized in NH4HCO3. After trypsin digestion of the proteins, the digest was analyzed by the use of nanoflow liquid chromatography (LC) quadrupole time-of-flight mass spectrometry (QTOF MS). Sample injection and gradient speed in running LC QTOF MS were optimized. It was shown that this method could identify an average (n = 3) of 167 ± 21, 237 ± 30, 491 ± 63, and 619 ± 59 proteins from 500, 1000, 2500, and 5000 MCF-7 breast cancer cells, respectively. To demonstrate the potential use of this method for generating proteome profile from circulating tumor cells (CTCs) isolated from human blood, a healthy human blood sample was spiked with MCF-7 cells, and this mixture was processed and then subjected to antibody tagging of the MCF-7 cells. The tagged cells were sorted and collected using flow cytometry. The proteome profiles of small numbers of cells isolated in this way were found to be similar to those of the original MCF-7 cells, suggesting the possibility of the use of this method for cell typing of CTCs.

Proteome analysis of a small number of cells presents a significant opportunity for discovery of new biomarkers of diseases, such as cancer, and for developing new diagnosis and prognosis tools based on the use of proteome profiles.
One potential application is in the area of analyzing the proteome of primary cells procured from a tissue sample using laser capture microdissection (LCM).1,2 In a tissue sample containing both normal and transformed (e.g., cancer) cells, the number of transformed cells may be very limited.3 This is particularly true for tissue samples from patients at an early stage of cancer

* Corresponding author. E-mail: [email protected].
(1) Gutstein, H. B.; Morris, J. S. Expert Rev. Proteomics 2007, 4, 627–637.
(2) Gu, Y.; Wu, S. L.; Meyer, J. L.; Hancock, W. S.; Burg, L. J.; Linder, J.; Hanlon, D. W.; Karger, B. L. J. Proteome Res. 2007, 6, 4256–4268.
(3) Espina, V.; Wulfkuhle, J.; Calvert, V.; VanMeter, A.; Zhou, W.; Coukos, G.; Geho, D.; Petricoin, E.; Liotta, L. Nat. Protoc. 2006, 1, 586–603.


Analytical Chemistry, Vol. 82, No. 6, March 15, 2010

development.4,5 Even if a large number of tumor cells are present in a tissue section, using a state-of-the-art LCM system and automated cell type recognition from a stained tissue, only about 1000 cells may be procured within 2 h, although, in some special cases (e.g., a tissue containing >80% homogeneous tumor cells of one type), up to 5000 cells may be procured within 2 h. For most cases, procurement of 5000 individual tumor cells in about 10 h may be feasible in small scale proteome analysis. Any further increase in the number of cells required for proteome analysis would prolong the cell procurement time to an impractical level for high throughput analysis of a large number of samples. Another very exciting opportunity is the characterization of the proteome of a small number of circulating tumor cells (CTCs) in the blood of patients with cancer.6-8 These cells can be isolated by immunoaffinity techniques with antibodies specific for cell-surface proteins restricted to epithelial cells. For example, in a recent study by Nagrath et al.,9 CTCs were identified in 115 of 116 patient samples with non-small-cell lung (55 samples), metastatic prostate (19 samples), localized prostate (7 samples), pancreatic (15 samples), breast (10 samples), and colorectal cancers (10 samples). The ranges of CTCs isolated per milliliter of blood were 5 to 20 for 17 samples, 20 to 50 for 27 samples, 50 to 100 for 22 samples, and >100 for 49 samples. This work indicates that, with a blood collection of 10 mL from a patient, 71 out of 115 patients (62%) would have more than 500 CTCs that could be isolated. Analyzing the proteome of CTCs represents one of the noninvasive strategies for the molecular profiling of cancer.8 Studies have shown that the number of CTCs does not correlate with the degree of metastasis, but their molecular profiles should be more informative.7,10-13 One potential application of proteome profiling is to monitor the proteome changes of

(4) Hutter, G.; Sinha, P. Proteomics 2001, 1, 1233–1248.
(5) Ladanyi, A.; Sipos, F.; Szoke, D.; Galamb, O.; Molnar, B.; Tulassay, Z. Cytometry, Part A 2006, 69A, 947–960.
(6) Cristofanilli, M.; Budd, G. T.; Ellis, M. J.; Stopeck, A.; Matera, J.; Miller, M. C.; Reuben, J. M.; Doyle, G. V.; Allard, W. J.; Terstappen, L. W. M. M.; Hayes, D. F. N. Engl. J. Med. 2004, 351, 781–791.
(7) Klein, C. A. Eur. J. Cancer 2008, 44, 2721–2725.
(8) Sawyers, C. L. Nature 2008, 452, 548–552.
(9) Nagrath, S.; Sequist, L. V.; Maheswaran, S.; Bell, D. W.; Irimia, D.; Ulkus, L.; Smith, M. R.; Kwak, E. L.; Digumarthy, S.; Muzikansky, A.; Ryan, P.; Balis, U. J.; Tompkins, R. G.; Haber, D. A.; Toner, M. Nature 2007, 450, 1235–1239.
(10) Wong, N. S.; Kahn, H. J.; Zhang, L. Y.; Oldfield, S.; Yang, L. Y.; Marks, A.; Trudeau, M. E. Breast Cancer Res. Treat. 2006, 99, 63–69.
(11) Sotiriou, C.; Piccart, M. J. Nat. Rev. Cancer 2007, 7, 545–553.
(12) Ma, P. C.; Blaszkowsky, L.; Bharti, A.; Ladanyi, A.; Kraeft, S. K.; Bruno, A.; Skarin, A. T.; Chen, L. B.; Salgia, R. Anticancer Res. 2003, 23, 49–62.

10.1021/ac9023022 © 2010 American Chemical Society. Published on Web 01/21/2010

CTCs during cancer treatment, along with genomic analysis and specific antibody profiling work, to increase the prognostic value of CTCs. Another potential application of analyzing CTCs is based on the hypothesis that, when tumor metastasis starts from a specific organ, the cancer cells entering the bloodstream will have a proteome profile similar to that of the cancer cells found in the organ. Tissue biopsy or surgery allows procurement of tissue samples from which the cancer cells can be isolated using antibody recognition combined with LCM. The proteome profiles of CTCs and tumor cells from tissue samples may be compared to determine whether they are the same type and, if so, the metastasis site may be positively identified. It is clear that, in most cases, fewer than 5000 tumor cells procured from tissue or CTCs isolated from blood are available for proteome analysis. Analyzing the proteome of this small number of cells is a challenging task at present, and a suite of special analytical tools is required to deal with these types of samples. There are several reports of technical advances in analyzing a small number of cells or even a single cell.14-23 Most MS work for single cell analysis has been focused on analyzing special types of cells, such as neurons, where the cell size is very large compared to other common mammalian cells of less than 20 µm, and red blood cells, where hemoglobin is the single dominant protein in a cell. For example, MS has been combined with protein digestion and peptide mass mapping to identify a single amino acid mutation of hemoglobin in a single red blood cell.22 This work demonstrated the possibility of carrying out single cell sample preparation that includes cell lysis, protein extraction, and protein digestion for MS detection of the resulting peptides, which are the required steps for shotgun proteome analysis.
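The 62% estimate quoted earlier from the Nagrath et al. data can be reproduced from the reported concentration bins. The following is only a minimal sketch: the variable names are ours, the 10 mL draw is the volume stated in the text, and we assume a bin qualifies when its lower bound already reaches 500 cells in that volume.

```python
# Samples per CTC concentration bin, as quoted in the text.
# Each entry: (lower bound of the bin in CTCs per mL, number of patient samples).
bins = [(5, 17), (20, 27), (50, 22), (100, 49)]

BLOOD_VOLUME_ML = 10   # collection volume stated in the text
TARGET_CELLS = 500     # smallest cell number handled by the method

total = sum(n for _, n in bins)
# Assumption: a bin counts if its lower bound reaches the target in the drawn volume.
enough = sum(n for low, n in bins if low * BLOOD_VOLUME_ML >= TARGET_CELLS)
percent = round(100 * enough / total)

print(f"{enough} of {total} samples ({percent}%)")
```

Only the two upper bins (50 to 100 and >100 CTCs per mL) contribute, which is why the estimate depends strongly on the volume of blood collected.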
However, current mass spectrometric techniques are not sufficiently sensitive to analyze many proteins present in a single cell. Analysis of proteins from small numbers of cells, either in isolation or directly in a tissue sample, has been attempted by surface-enhanced laser desorption ionization (SELDI) and matrix-assisted laser desorption ionization (MALDI) MS.24–27 In some cases, tens or hundreds of peptide/protein masses can be detected. However, protein identification remains a challenge for these techniques. An alternative approach for analyzing small

(13) Shaffer, D. R.; Leversha, M. A.; Danila, D. C.; Lin, O.; Gonzalez-Espinoza, R.; Gu, B.; Anand, A.; Smith, K.; Maslak, P.; Doyle, G. V.; Terstappen, L. W. M. M.; Lilja, H.; Heller, G.; Fleisher, M.; Scher, H. I. Clin. Cancer Res. 2007, 13, 2023–2029.
(14) Wang, H. X.; Qian, W. J.; Mottaz, H. M.; Clauss, T. R. W.; Anderson, D. J.; Moore, R. J.; Camp, D. G.; Khan, A. H.; Sforza, D. M.; Pallavicini, M.; Smith, D. J. J. Proteome Res. 2005, 4, 2397–2403.
(15) Gutstein, H. B.; Morris, J. S.; Annangudi, S. P.; Sweedler, J. V. Mass Spectrom. Rev. 2008, 27, 316–330.
(16) Wang, W. J.; Guo, T.; Rudnick, P. A.; Song, T.; Li, J.; Zhuang, Z. P.; Zheng, W. X.; Devoe, D. L.; Lee, C. S.; Balgley, B. M. Anal. Chem. 2007, 79, 1002–1009.
(17) Umar, A.; Luider, T. M.; Foekens, J. A.; Pasa-Tolic, L. Proteomics 2007, 7, 323–329.
(18) Wang, Z.; Han, J.; Schey, K. L. J. Proteome Res. 2008, 7, 2696–2702.
(19) Rubakhin, S. S.; Greenough, W. T.; Sweedler, J. V. Anal. Chem. 2003, 75, 5374–5380.
(20) Ethier, M.; Hou, W. M.; Duewel, H. S.; Figeys, D. J. Proteome Res. 2006, 5, 2754–2759.
(21) Li, L.; Golding, R. E.; Whittal, R. M. J. Am. Chem. Soc. 1996, 118, 11662–11663.
(22) Whittal, R. M.; Keller, B. O.; Li, L. Anal. Chem. 1998, 70, 5344–5347.
(23) Boardman, A. K.; McQuaide, S. C.; Zhu, C.; Whitmore, C. D.; Lidstrom, M. E.; Dovichi, N. J. Anal. Chem. 2008, 80, 7631–7634.

numbers of cells is to use the shotgun method where proteins can be identified and potentially quantified (e.g., using isotope labeling); however, the technical challenge lies not only in MS detection but also in the sample preparation process. Many techniques commonly used to handle large numbers of cells with good outcomes cannot be directly applied to a small number of cells. For example, it has been pointed out that surfactant-based methods were ineffective for handling small amounts of complex proteomic samples due to sample loss during cleanup and sample transfer steps.14 A shotgun method using trifluoroethanol (TFE) mixed with hypotonic aqueous buffer to lyse the cells, followed by trypsin digestion and liquid chromatography (LC) electrospray ionization (ESI) linear ion trap (LTQ) tandem mass spectrometry (MS/MS) or LC-ESI Fourier-transform ion-cyclotron resonance (FTICR) MS, has been developed.14 This method allowed the identification of 83 to 133 proteins in four replicate experiments from 5000 MCF-7 breast cancer cells using the LTQ instrument. In a separate experiment, over 5000 unique features were detected in the FTICR MS analysis of the digest of 5000 cells by the accurate mass and time tag (AMT) peptide identification method, leading to the identification of 525 different peptides corresponding to 224 proteins. A recent report indicated that, using the AMT method, a total of 590 peptides and 341 proteins could be identified from 3000 tumor cells laser-captured from breast cancer tissue.17 Using a larger AMT database built from running multiple cell lines, they reported the identification of 1003 proteins from the same LC-FTICR MS run. Although the false discovery rate (FDR) in that study was reported to be 10%,17 the accuracy of peptide or protein identification for the AMT method remains to be thoroughly investigated (e.g., how many peptides or proteins could be identified with FDR of 30% in a normalized spectrum). 
Most of the high intensity fragment ions (i.e., top 5) must also belong to y-, b-, or a-ions, not internal fragment ions. Peptide matches which failed to meet these criteria were removed from the protein lists. Typically, this manual verification process eliminated about 3% of the low score matches. A protein was considered to be identified even if a single peptide match was found. The QTOF instrument generally provided high quality MS/MS data, and a single-hit match could be used for protein identification with high confidence.
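The fragment-ion criterion described above can be expressed as a simple filter. This is a hedged sketch and not the authors' verification procedure, which was manual: the function name, the peak representation, and in particular the `min_fraction` threshold are our assumptions, since the text says only that "most" of the top 5 ions must be y-, b-, or a-ions.

```python
SEQUENCE_IONS = {"y", "b", "a"}

def passes_fragment_check(peaks, top_n=5, min_fraction=0.6):
    """Return True if enough of the most intense peaks are y-, b-, or a-ions.

    peaks: list of (intensity, ion_type) pairs, where ion_type is a label such
    as 'y', 'b', 'a', 'internal', or None for an unassigned peak.
    min_fraction: hypothetical stand-in for "most"; not taken from the source.
    """
    top = sorted(peaks, key=lambda p: p[0], reverse=True)[:top_n]
    if not top:
        return False
    matched = sum(1 for _, ion in top if ion in SEQUENCE_IONS)
    return matched / len(top) >= min_fraction

# Illustrative spectra with made-up intensities.
good = [(100, "y"), (80, "b"), (60, "y"), (50, "internal"), (40, "a"), (5, None)]
bad = [(100, "internal"), (90, None), (80, "internal"), (30, "y"), (20, "b")]

print(passes_fragment_check(good), passes_fragment_check(bad))
```

A filter of this kind could only pre-screen matches; borderline spectra would still need the manual inspection the text describes.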

(29) Wang, N.; Xie, C. H.; Young, J. B.; Li, L. Anal. Chem. 2009, 81, 1049–1060.
(30) Wang, N.; Li, L. Anal. Chem. 2008, 80, 4696–4710.
(31) Elias, J. E.; Gygi, S. P. Nat. Methods 2007, 4, 207–214.
(32) Fournier, M. L.; Gilmore, J. M.; Martin-Brown, S. A.; Washburn, M. P. Chem. Rev. 2007, 107, 3654–3686.

RESULTS AND DISCUSSION

The shotgun method for proteome analysis is a relatively sensitive technique, compared to other methods, such as gel-based techniques.32 For example, about 1 µg of a cell extract digest injected into LC-ESI MS/MS can result in the identification of about 300-800 proteins, depending on the complexity of the sample.29,30 In the shotgun method, the sample workup process includes cell lysis, protein extraction, protein digestion, and injection of peptides into the LC-MS/MS system for analysis. Any one of the steps can potentially lose some proteins or peptides. In working with a large quantity of sample, this sample loss may not be very significant, so long as sample loss is not biased toward a particular group of proteins. If a bias (i.e., selective sample loss) does occur, that group of proteins will be under-represented in the final list of proteins identified. If sample loss is unbiased, as long as a sufficient amount of peptides is available for LC-MS/MS analysis, the same proteome coverage would be expected. However, in handling small numbers of cells, sample loss of any type can be detrimental to the proteome coverage. This is due to the small number of cells generating a limited amount of sample which does not meet the optimal sample amount required for peptide sequencing in LC-MS/MS. In a recent report, we have shown that the amount of sample injected is very important in determining the outcome of peptide and protein identification.29 Injection of less than the optimal amount results in fewer peptides and proteins being identified. For the nano-LC QTOF MS platform used in this work, the optimal amount of peptides for injection is about 1 µg and exceeding this amount does not result in a significant increase in peptide and protein numbers.

With the above considerations in mind, we developed the sample preparation protocol shown in Figure 1. MCF-7 cells, derived from breast cancer, were chosen in this study as they


are representative of many different types of cancerous cells in terms of size and proteome complexity. Thus, the method developed from analyzing MCF-7 cells should be applicable to other cancerous cells. In addition, MCF-7 cells were also used in the work reported by Smith and co-workers for analyzing 5000 cells,14 which makes it possible to compare the relative analytical performance. In our work, instead of taking an aliquot from a stock suspension containing a large number of cells to make a sample of a small number of cells, the cultured MCF-7 cells were sorted into tubes containing different numbers of cells using a flow cytometer. Flow cytometry is an accurate method for counting cells and can be used for sorting cells according to cell properties, such as size and shape.33 In our initial method development of searching for the optimal lysis solution, a relatively large number of cells (10⁵ cells) was used and a BCA assay was carried out for protein quantification to compare the cell lysis and protein extraction efficiencies among the different reagents tested. For the first three lysis reagents tested, i.e., 0.5 M acetate buffer, 0.05 M Tris buffer, and 0.1% SDS, it was found that both the Tris and SDS buffers yielded about 4 times more proteins than the acetate buffer did. Using NP-40 with no acetone wash (see below), about 46 ± 3 µg of proteins (n = 3) could be extracted from 10⁵ cells according to the BCA assay results, which was at least 2-fold more than that extracted using the Tris buffer. In searching for a suitable sample preparation protocol to handle small numbers of cells, we used an LC-UV technique to measure the amount of peptides produced by the individual protocols tested and compared the peptide amounts to determine which protocol yielded the highest peptide amount. We did not put too much emphasis on protein quantification for handling small numbers of cells for the following three reasons.
One is related to the accuracy of protein quantification using techniques such as the BCA assay. Samples of smaller numbers of cells cannot be quantified with good accuracy, as they do not contain sufficient amounts of proteins. Second, the ultimate goal of sample preparation for the shotgun method is to generate a maximum amount of peptides for LC/MS, and thus, gauging the amount of peptides produced from a sample is better than measuring the protein amount. Measuring the peptide amount allows us to track the integrity of the entire sample preparation process. For example, measuring the protein amount will not gauge the sample loss during the digestion process. Finally, our method of determining the peptide amount using LC-UV is noninvasive and can also be used to remove the salts present in the sample. In contrast, if protein quantification were done using techniques such as the BCA assay, the protein sample (or a large portion of the sample prepared from a few cells) would be unusable for shotgun analysis. We note that trypsin autolysis peaks were very weak in all LC/MS runs, suggesting that peptides from trypsin autolysis should not affect the LC-UV quantification of the protein sample digests.

Three surfactant-based methods using sodium dodecyl sulfate (SDS), acid labile surfactant (ALS) from Waters, or a cell lysis solution containing NP-40 detergent as well as two buffers containing Tris or trifluoroethanol (TFE) were examined for their performance in cell lysis and downstream sample workup. Surfactant-based methods are widely used for efficient cell lysis in proteome analysis work involving large numbers of cells.32,34,35 In the case of SDS, after cell lysis and trypsin digestion, SDS had to be removed by a strong cation exchange column to reduce its interference with LC-ESI MS.36 For ALS, the tryptic digest was acidified to degrade ALS and the hydrophobic products were carefully removed, prior to MS analysis. The use of Tris buffer or NP-40 cell lysis solution was straightforward: the solution was mixed with the sample with subsequent intermittent sonication, followed by trypsin digestion. TFE was used according to the reported protocol.14 Among the five protocols tested, we found that the SDS, ALS, Tris, or TFE method each yielded less than 1 µg of peptides from a sample containing 5000 cells. In contrast, using NP-40 lysis solution with acetone washing, the average amount (n = 3) of peptides from the 5000 cell sample was found to be 1.40 ± 0.12 µg. However, one major problem initially encountered using this polyethylene glycol-based detergent for cell lysis was that, after acetone precipitation of proteins from the lysate, the pellet still contained a small amount of NP-40 that caused severe interference in LC-ESI MS analysis of the cell lysate protein digest. To eliminate this interference, the pellet was carefully washed with cold acetone. This simple step was found to be very effective in reducing the NP-40 content to a level that did not cause interference in LC-ESI MS. We note that the process of protein precipitation and resolubilization as well as acetone washing can cause sample loss. However, the benefit of this approach is that we can use the NP-40 surfactant to more efficiently lyse the cells, and the surfactant can be removed after protein precipitation and washing. As Figure 1 shows, the cold acetone washed pellet was dissolved in NH4HCO3, followed by trypsin digestion.

(33) Pappas, D.; Wang, K. Anal. Chim. Acta 2007, 601, 26–35.
Overnight digestion, commonly used in handling larger amounts of samples, was found to be problematic in this work, as strong trypsin autolysis peaks were observed in LC-ESI MS/MS. This is not surprising as the concentrations of individual proteins in the extract of a few cells should be very low. Thus, the digestion time was reduced and optimized to be 4 h. The tryptic digest was desalted and quantified by LC-UV and then injected into the LC-ESI QTOF instrument for MS/MS sequencing of the peptides. It should be noted that the use of a 4.6 mm desalting column did not cause significant sample loss, as shown in our previous work with diluted peptide mixture solutions.29 The average amount of peptides produced from a cell lysate of the 2500 cell sample was 0.83 ± 0.12 µg, which is not exactly half of the amount of peptides produced from the 5000 cell sample. However, within the experimental errors, the amount of peptides produced appears to decrease proportionally as the cell number decreases. If this proportionality held true for the 1000 and 500 cell samples, then the amount of peptides produced would be less than 0.28 µg for the 1000 cell sample and 0.14 µg for the 500 cell sample. The lower limit of the LC-UV system used to measure the peptide amount is about 0.25 µg.29 We attempted to measure the peptide amounts for the 1000 and 500 cell samples, but the results were not reliable as they generated UV absorbance signals with intensities similar to that of

(34) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Yates, J. R. Nat. Biotechnol. 2003, 21, 532–538.
(35) Speers, A. E.; Wu, C. C. Chem. Rev. 2007, 107, 3687–3714.
(36) Wang, N.; MacKenzie, L.; De Souza, A. G.; Zhong, H. Y.; Goss, G.; Li, L. J. Proteome Res. 2007, 6, 263–272.

the blank. The failure to quantify the 1000 cell sample suggests that the amount of peptides produced from this sample was less than 0.25 µg. Thus, sample loss may be more severe for this sample (and the 500 cell sample), compared to the 2500 or 5000 cell sample. This is understandable, as the same protocol was applied to these samples and the same amount lost (e.g., via adsorption to the container walls) would result in a greater percentage of sample loss for the 1000 or 500 cell samples. Besides sample preparation, optimization of the LC-ESI MS runs is also critical in analyzing samples of a few cells. In our work, a trap column was used to facilitate the peptide loading to the nano-LC QTOF MS instrument. For sample injection, the minimum volume of residual sample required to be present in the sample vial is about 1 µL. In our experiment, after drying the desalted samples, each sample was reconstituted to 11 µL by adding 0.1% formic acid, from which two injections of 5 µL were carried out for one LC/MS run. After two injections, 1 µL of sample remained in the sample vial, and thus, two injections should, in theory, load about 91% of the sample to the column. Because of the small number of cells used, only one LC/MS run could be carried out from a sample. Any replicate runs discussed in this work were from the experimental replicates, i.e., individual solutions containing the same number of cells were prepared, followed by the workflow shown in either Figure 1 or 2. After sample injection, peptides are separated by a solvent gradient optimized for chromatographic resolution. However, the gradient speed can significantly affect the detectability of peptides in LC-MS/MS. If a fast gradient is used, a peptide elutes quickly to form a fast rising peak in an ion chromatogram, resulting in intense signals in both MS and MS/MS spectra. However, in this case, only a few MS and MS/MS spectra can be acquired within the peak elution time. 
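The two pieces of arithmetic used above, the fraction of sample loaded by two injections and the peptide yield extrapolated to smaller cell numbers, can be checked with a short calculation. This is a sketch under the text's own assumptions: the yield is taken to scale linearly from the 1.40 µg measured for 5000 cells, and the function names are ours.

```python
LCUV_LOWER_LIMIT_UG = 0.25  # approximate lower limit of the LC-UV measurement

def fraction_loaded(total_ul, injection_ul, n_injections):
    """Fraction of the reconstituted sample that reaches the column."""
    return injection_ul * n_injections / total_ul

def expected_peptide_ug(cells, ref_cells=5000, ref_ug=1.40):
    """Peptide amount assuming strict proportionality to cell number."""
    return ref_ug * cells / ref_cells

loaded = fraction_loaded(total_ul=11, injection_ul=5, n_injections=2)
print(f"loaded: {loaded:.0%}")

for cells in (2500, 1000, 500):
    amount = expected_peptide_ug(cells)
    measurable = amount >= LCUV_LOWER_LIMIT_UG
    print(f"{cells} cells: {amount:.2f} ug, above LC-UV limit: {measurable}")
```

Under strict proportionality the 1000 cell sample would sit just above the LC-UV limit, so the failure to quantify it is consistent with the additional sample loss discussed in the text.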
If a slow gradient is used, the same peptide would elute more slowly to form a broader peak and the mass spectral signal of the peptide would be less intense. If a sufficient amount of sample is injected, the peptide signal intensity may be adequate to generate a database-searchable MS/MS spectrum. One major advantage of the use of a slow gradient for peptide elution is that a greater number of MS and MS/MS spectra can be acquired over this broad peak. For the analysis of a complex peptide sample, coelution of different peptides cannot be avoided and one always tries to sequence as many coeluting peptides as possible; slow gradients provide this opportunity. However, if the amount of sample injected is small, the peptide signal may not be sufficiently intense to produce a database-searchable MS/MS spectrum. Thus, the gradient speed needs to be optimized according to the sample amount injected into the LC-MS/MS instrument. We have investigated how the gradient speed affects the number of peptides identified by LC-ESI MS/MS. It was found that the optimum gradient time increased as the number of cells in a sample increased. In addition, within a group of samples (e.g., the 500 cell samples), there was an optimal gradient time for detecting peptides. Too slow a gradient resulted in the identification of fewer peptides. Thus, the gradient time was adjusted according to the number of cells used for proteome analysis. Specifically, for the 500 cell samples, a 90 min gradient was used. The gradient time was increased to 150 min for the 1000 cell samples. The gradient time was 180 and 270 min for the 2500

Figure 3. Base peak chromatograms from nano-LC QTOF MS/MS analysis of the trypsin digests from cell lysates of different numbers of cells.

cell and 5000 cell samples, respectively. Figure 3 shows the representative ion chromatograms generated from the trypsin digests of whole cell lysates of 500, 1000, 2500, and 5000 cells. Table 1 summarizes the peptide and protein identification results from the 500, 1000, 2500, and 5000 cell samples. Lists of the identified proteins and peptides along with the MASCOT

Table 1. Unique Proteins and Peptides Identified from Different Numbers of Cells

number of cells    unique peptides^a       unique proteins
500                369, 386, 389           168, 187, 145
1000               574, 485, 481           271, 226, 215
2500               1036, 1531, 1358        422, 546, 504
5000               1630, 2161, 1883        552, 665, 640

a From three experimental replicates.


Table 2. Overlaps of Proteins Commonly Identified in Different Numbers of Cells

comparative cell numbers    number of overlapped proteins    %^a
500 vs 1000                 57                               80
500 vs 2500                 65                               92
500 vs 5000                 67                               94
1000 vs 2500                106                              93
1000 vs 5000                109                              96
2500 vs 5000                248                              84

a Percent ratio between the number of overlapped proteins and the number of proteins found in the sample of smaller number of cells.

Figure 4. Protein and peptide identification results under optimized sample preparation and LC-MS/MS conditions.

search results are shown in Tables S1-S4 in the Supporting Information. In each group, three replicate experiments were carried out. The numbers of peptides and proteins identified from these samples are plotted in Figure 4. As Table 1 and Figure 4 show, the numbers of both peptides and proteins increase as the cell number increases, but not in linear proportion to the cell number. An average of 1891 ± 266 peptides or 619 ± 59 proteins (n = 3) were identified from the 5000 cell sample. These numbers compare favorably with the 305, 211, 290, and 179 peptides or 113, 85, 133, and 83 proteins identified in four replicate runs of 5000 cell samples, as reported by others.14 The significant difference can be attributed to several factors, including differences in sample handling, LC-ESI MS instrumentation, and MS running conditions. In the case of 500 cell samples, 381 ± 11 peptides or 167 ± 21 proteins were identified using our method. Although the cell number decreases by 10-fold, compared to the 5000 cell sample, the numbers of peptides and proteins identified decrease by only about 5.0- and 3.7-fold, respectively. However, the peptide/protein ratio decreases from 3.05 for the 5000 cell sample to 2.28 for the 500 cell sample. These results indicate that we can identify an average of 167 proteins from 500 cells, 237 proteins from 1000 cells, 491 proteins from 2500 cells, and 619 proteins from 5000 cells. In all cases, the experiment-to-experiment reproducibility in terms of the numbers of peptides and proteins identified was good, indicating that the experimental protocol used in this study can be used to generate reproducible results from as few as 500 cells. To examine which set of proteins was consistently identified among all these samples, the proteins commonly identified in the three experimental replicate runs of the 5000, 2500, 1000, or 500 cell samples were grouped as common proteins.
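The averages, standard deviations, and fold changes quoted above can be recomputed from the replicate counts in Table 1. The sketch below assumes the reported spread is the sample standard deviation (n - 1 in the denominator); the dictionary layout and helper names are ours.

```python
from statistics import mean, stdev

# Replicate counts transcribed from Table 1 (three experimental replicates each).
table1 = {
    500: {"peptides": [369, 386, 389], "proteins": [168, 187, 145]},
    1000: {"peptides": [574, 485, 481], "proteins": [271, 226, 215]},
    2500: {"peptides": [1036, 1531, 1358], "proteins": [422, 546, 504]},
    5000: {"peptides": [1630, 2161, 1883], "proteins": [552, 665, 640]},
}

def summarize(values):
    """Mean and sample standard deviation of replicate counts."""
    return mean(values), stdev(values)

for cells, counts in table1.items():
    pep_mean, pep_sd = summarize(counts["peptides"])
    pro_mean, pro_sd = summarize(counts["proteins"])
    print(f"{cells:>5} cells: {pep_mean:.0f} ± {pep_sd:.0f} peptides, "
          f"{pro_mean:.0f} ± {pro_sd:.0f} proteins, "
          f"peptide/protein ratio {pep_mean / pro_mean:.2f}")

# Fold decrease when going from 5000 cells to 500 cells.
fold_peptides = mean(table1[5000]["peptides"]) / mean(table1[500]["peptides"])
fold_proteins = mean(table1[5000]["proteins"]) / mean(table1[500]["proteins"])
print(f"fold decrease: peptides {fold_peptides:.1f}, proteins {fold_proteins:.1f}")
```

Computing the ratios from the unrounded means can shift the last digit slightly relative to ratios formed from the rounded averages.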
These common proteins were then compared to determine the level of protein overlaps among different samples. Table 2 shows a summary of the comparison results. Among the 71 common proteins identified in the 500 cell samples, 80% overlapped with the 114 common proteins found in the 1000 cell samples. This overlapping rate increased to 92%, compared to the 297 common proteins found in the 2500 cell samples, and 94%, compared to the 380 common proteins found in the 5000 cell samples. The common proteins found in the 1000 cell samples had a similar overlapping rate (96%) when compared to the 5000 cell samples, whereas this rate decreased when the common proteins found in the 2500 cell samples were compared to those from the 5000 cell samples (84%).
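The common-protein and overlap calculations described above amount to set intersections. The following sketch uses made-up accession labels purely for illustration (the real lists are in the Supporting Information), and the function names are ours; the last step checks the 500 cell rows of Table 2 against the 71 common proteins stated in the text.

```python
def common_proteins(replicates):
    """Proteins identified in every replicate run (set intersection)."""
    return set.intersection(*(set(run) for run in replicates))

def overlap_percent(smaller, larger):
    """Percent of the smaller common set that is also in the larger one."""
    return 100 * len(smaller & larger) / len(smaller)

# Hypothetical protein labels, three replicate runs per sample.
runs_500 = [["P1", "P2", "P3"], ["P1", "P2", "P4"], ["P2", "P1", "P5"]]
runs_1000 = [["P1", "P3", "P6", "P7"], ["P1", "P6", "P7"], ["P1", "P6", "P7", "P8"]]

common_500 = common_proteins(runs_500)
common_1000 = common_proteins(runs_1000)
print(sorted(common_500), sorted(common_1000),
      overlap_percent(common_500, common_1000))

# Check of the reported percentages for the 500 cell comparisons.
COMMON_500 = 71
reported_overlaps = {"500 vs 1000": 57, "500 vs 2500": 65, "500 vs 5000": 67}
percentages = {k: round(100 * v / COMMON_500) for k, v in reported_overlaps.items()}
print(percentages)
```

Because the denominator is always the smaller sample's common set, the percentage measures how much of the low-cell-number result is recovered at higher cell numbers, not the reverse.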


The common proteins found in different samples were also categorized in two ways, according to their cellular component (Figure 5A) and molecular function (Figure 5B), after extracting the gene ontology (GO) information from ExPASy. Figure 5A shows that the largest proportions of proteins are from the cytoplasm and the nucleus. Proteins localized in the cytoskeleton, ribosome, cell membrane, and mitochondria also account for relatively high proportions. The lower percentage of cell membrane proteins detected might be due to their relatively lower abundance and/or insufficient protein solubilization in the protein purification step for the whole cell lysate. On the classification of protein molecular functionality, Figure 5B shows that binding activities, particularly protein binding and DNA/RNA binding activities, take up the largest proportions. Among the rest of the proteins, many have catalytic activity and structural molecule activity. From the two bar diagrams shown in Figure 5, no dramatic change was found in the distributions of the GO term categorizations, indicating that our sample preparation protocol has no apparent bias toward certain types of proteins identified in samples of different cell numbers. Changes in some of the individual categories are not unexpected. For example, the percentage of nuclear proteins decreased from 27% to 17% as the cell number decreased, whereas the percentage of cytoskeleton proteins increased slightly. This may be due to the fact that, when the sample amount was reduced, some of the low abundance nuclear proteins became difficult to identify in the LC/MS analysis, but the relatively high abundance cytoskeleton proteins from the 500 cell samples were still identifiable. There are 54 proteins consistently identified from all the samples, and a list of these proteins is shown in Table S5 in the Supporting Information. Of interest, tumor protein D54 was found in all runs except two runs for 500 cells.
However, even in these two runs, one peptide from this protein, with a score of 17 (lower than the threshold score of 24), could be detected on the basis of its retention time, precursor mass, and a few major fragment ions, compared to the retention time and MS/MS spectra of the same peptide confidently identified in other samples. As discussed below, for our future work, we will develop spectral matching strategies based on the use of an MS/MS spectral library of peptides, instead of a MASCOT search against a proteome, for peptide/protein identification; this approach is expected to improve the sensitivity and should be able to identify tumor protein D54 in 500 cells. As a member of the D52-like family adaptor proteins, tumor protein D54 has been found to be significantly overexpressed in various tumor tissues especially in malignant breast

Figure 5. Cluster analysis results of proteins commonly identified in different numbers of cells: distributions of the identified proteins according to (A) GO cellular component and (B) GO molecular function categories.

cancer tissues.37-39 Although the function of this overexpressed protein in cancer cells has not been well characterized, it has been predicted to play a crucial role in promoting malignant proliferation as it shares functional characteristics with tumor protein D52.40 Tumor protein D54 has been reported to have significant prognostic value in pancreatic cancer, colon cancer, and especially breast carcinoma.41,42

(37) Shehata, M.; Weidenhofer, J.; Thamotharampillai, K.; Hardy, J. R.; Byrne, J. A. Crit. Rev. Oncog. 2008, 14, 33–55.
(38) Boutros, R.; Fanayan, S.; Shehata, M.; Byrne, J. A. Biochem. Biophys. Res. Commun. 2004, 325, 1115–1121.
(39) Ramaswamy, S.; Tamayo, P.; Rifkin, R.; Mukherjee, S.; Yeang, C. H.; Angelo, M.; Ladd, C.; Reich, M.; Latulippe, E.; Mesirov, J. P.; Poggio, T.; Gerald, W.; Loda, M.; Lander, E. S.; Golub, T. R. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 15149–15154.
(40) Byrne, J. A.; Nourse, C. R.; Basset, P.; Gunning, P. Oncogene 1998, 16, 873–881.
(41) Nakamura, T.; Furukawa, Y.; Nakagawa, H.; Tsunoda, T.; Ohigashi, H.; Murata, K.; Ishikawa, O.; Ohgaki, K.; Kashimura, N.; Miyamoto, M.; Hirano, S.; Kondo, S.; Katoh, H.; Nakamura, Y.; Katagiri, T. Oncogene 2004, 23, 2385–2400.
(42) Miller, D. V.; Leontovich, A. A.; Lingle, W. L.; Suman, V. J.; Mertens, M. L.; Lillie, J.; Ingalls, K. A.; Perez, E. A.; Ingle, J. N.; Couch, F. J.; Visscher, D. W. Mod. Pathol. 2004, 17, 756–764.

While identification of one or more specific tumor biomarkers from small numbers of cells may prove to be useful for tumor diagnosis and progression monitoring, the clinical utility of proteome analysis from small numbers of cells may lie in the proteome profiling work. The ability to detect hundreds of proteins from as few as 500 cells using the current protocol opens the possibility of studying the proteome of a small number of cells, such as CTCs isolated from the blood of patients with cancer. A proteome profile may be used as a signature or fingerprint to identify a specific type of cancer cell in human blood for detection, diagnosis, and monitoring of cancer. To mimic the scenario of analyzing CTCs in blood, we used a model system where fresh human blood was spiked with MCF-7 cells, followed by isolation of these cells using density separation, antibody recognition, and flow cytometry. In this work, the erythrocytes were removed from peripheral blood leukocytes (PBL) and MCF-7 cells by the use of the Ficoll-Hypaque technique. This is a commonly used centrifugation technique for separating lymphocytes from other components in blood according to their density differences. Studies have shown

Analytical Chemistry, Vol. 82, No. 6, March 15, 2010


Figure 7. Protein and peptide identification results of the MCF-7 cells isolated from a spiked blood sample.

Figure 6. Flow cytometry results of the MCF-7 cells labeled with anti-HEA-FITC and the PBL cells. (A) 2D dot plot of MCF-7 and PBL mixtures where x-axis parameter (FSC) indicates the size of cells, and y-axis (SSC) indicates granularity of cells. (B) The fluorescence response of both cells in the suspension where x-axis parameter (HEA-FITC) indicates intensity of FITC fluorescence in log scale.

that the MCF-7 cells preferentially sediment with the PBL at the plasma and Ficoll interface, on the basis of their density.43,44 As a result, the spiked MCF-7 cells can be collected through the isolation of PBL from the interface, the buffy coat layer, after centrifugation. The buffy coat layer was then washed and resuspended in a PBS buffer. The PBL cells are physically smaller than MCF-7 cells (averaging 8 µm compared to 18 µm),43,45 and their cell contents are also less complex than those of the cancer cells. These physical differences should be adequate for a flow cytometer to differentiate the MCF-7 cells from PBL in cell sorting. However, to enhance the confidence of collecting the MCF-7 cells, FITC-conjugated mouse antihuman HEA, an antibody specific to a human epithelial marker, was used.28 In this case, the MCF-7 cells were stained with the fluorescent antibody whereas the PBL were not. Figure 6A shows the two-dimensional (2D) scatter plot of the flow cytometry analysis of the cell mixture. It clearly shows two populations, i.e., these cells could be separated on the basis of size and cell content. Population A represents the MCF-7 cells, and population B represents PBL. To further ensure that only the cancer cells, rather than debris or aggregated cells, were collected, the gate for MCF-7 cell sorting was conservatively narrowed. Figure 6B presents the log scale fluorescence histogram of all the cells in the suspension. Given that only the MCF-7 cells are

(43) Lara, O.; Tong, X. D.; Zborowski, M.; Chalmers, J. J. Exp. Hematol. 2004, 32, 891–904.
(44) Partridge, M.; Phillips, E.; Francis, R.; Li, S. R. J. Pathol. 1999, 189, 368–377.
(45) Chosy, E. J.; Nakamura, M.; Melnik, K.; Comella, K.; Lasky, L. C.; Zborowski, M.; Chalmers, J. J. Biotechnol. Bioeng. 2003, 82, 340–351.


fluorescently labeled, population D should be the MCF-7 cells. A very small percentage of nonspecific binding of the antibody to PBL was expected. However, with both gating strategies, shown in Figure 6A,B, applied simultaneously during the flow cytometry analysis, the cancer cells were confidently sorted and collected. The proteome profile of the isolated cells was then generated by the method described above and compared to those of the MCF-7 cell lines. The entire workflow for the isolation of the MCF-7 cells in blood is shown in Figure 2 and has been described in the Experimental Section. Figure 7 shows the numbers of peptides and proteins identified from different numbers of cells isolated from the blood samples. The numbers are very similar to those obtained from the samples prepared directly from the cultured cells. Moreover, the proteome profiles are very similar, judging from the common proteins obtained from the two comparative samples (see Table 3). In Table 3, the results of intra- and intersample comparison (i.e., percentage of common proteins found in two samples) are listed. For example, in the case of 500 cells, three replicate experiments were carried out for the 500 cell samples (Table 3 refers to them as A, B, and C). Likewise, three replicate experiments were done for the 500 cell samples from blood spiked with MCF-7 cells (A′, B′, and C′ in Table 3). Within the data set of A, B, and C, the average percentage of common proteins found in two samples is 57% ± 10%. For the A′, B′, and C′ samples, the average is 65% ± 11%. The average common protein percentage from the comparison of A vs A′, B vs B′, and C vs C′ is 60% ± 14%. The differences among these values are not significant. Thus, these proteome profiles are considered to be indistinguishable. This example illustrates that it is possible to generate a proteome profile from as few as 500 cells isolated from a blood sample and that the proteome profile may be used for cell typing.
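The intra- and intersample comparison can be illustrated with a minimal sketch. It assumes that the overlap for a pair of runs is the number of shared protein identifications divided by the number of proteins in each run, which would give two percentages per pair; the text does not spell out the formula, so this definition is an assumption. The function name `pairwise_overlap` and the accession lists are hypothetical and are not data from this work.

```python
def pairwise_overlap(run_a, run_b):
    """Return (% of run_a proteins also in run_b, % of run_b proteins also in run_a).

    Assumed definition: shared identifications divided by the size of each run.
    """
    a, b = set(run_a), set(run_b)
    if not a or not b:
        raise ValueError("each run must contain at least one protein")
    common = len(a & b)
    return 100.0 * common / len(a), 100.0 * common / len(b)


# Hypothetical accession lists, for illustration only.
run_a = ["P1", "P2", "P3", "P4"]
run_b = ["P2", "P3", "P4", "P5", "P6"]

overlap_a, overlap_b = pairwise_overlap(run_a, run_b)
print(f"{overlap_a:.0f}% of run A and {overlap_b:.0f}% of run B are shared")
```

Under this assumed definition, runs that identify different numbers of proteins yield two different percentages for the same pair.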
For real-world applications of this technique in generating proteome profiles of CTCs isolated from the blood of patients with cancer, based on the study by Nagrath et al.,9 the lower limit of 500 cells would render the current technique useful for 71 out of 115 patients (62%) with 10 mL blood collection per patient. This level of applicability should be useful in clinical situations, such as in longitudinal monitoring of cancer during treatment. However, if the technique could be made more sensitive to generate a similar level of proteome coverage (∼167 proteins) when 200 cells were used, it might be applied to 98 out of 115 patients (85%). We attempted to analyze 250 cells using the protocol described above, and only about 50 proteins were identified, a dramatic reduction in

Table 3. Summary of Protein Identification Results from Different Runs

                overlap (%)a
sample          500 cells    1000 cells   2500 cells   5000 cells
A and B         64, 58       59, 70       72, 78       74, 76
B and C         57, 73       60, 63       64, 77       63, 74
C and A         41, 48       56, 70       60, 78       64, 77
  average       57 ± 10      63 ± 6       72 ± 7       71 ± 6
A′ and B′       49, 75       69, 58       69, 70       72, 75
B′ and C′       73, 67       54, 61       71, 70       72, 76
C′ and A′       51, 72       59, 56       67, 66       67, 73
  average       65 ± 11      60 ± 5       69 ± 2       72 ± 3
A and A′        48, 47       51, 66       57, 74       66, 72
B and B′        49, 82       59, 53       62, 76       68, 73
C and C′        62, 73       61, 60       70, 70       77, 77
  average       60 ± 14      58 ± 5       68 ± 7       72 ± 4

a Percentage of common proteins found in two comparative runs. A, B, and C refer to the samples of three replicate experiments from the MCF-7 cells. A′, B′, C′ refer to the samples of three replicate experiments from the cells isolated from blood spiked with the MCF-7 cells.
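As a rough check on how the group averages relate to the individual entries, the sketch below recomputes the 500-cell averages from the six overlap values in each group of Table 3. It assumes that each average pools both overlap values of the three sample pairs and that the spread is a population standard deviation; neither is stated in the table, and because the listed percentages are rounded, agreement with the reported values is only expected to within about one percentage point.

```python
from statistics import mean, pstdev

# Overlap percentages for the 500-cell samples, read from Table 3
# (two values per sample pair, three pairs per group).
overlap_500 = {
    "A, B, C": [64, 58, 57, 73, 41, 48],
    "A', B', C'": [49, 75, 73, 67, 51, 72],
    "A vs A', B vs B', C vs C'": [48, 47, 49, 82, 62, 73],
}

# Averages as reported in Table 3 for 500 cells: (mean, spread).
reported_500 = {
    "A, B, C": (57, 10),
    "A', B', C'": (65, 11),
    "A vs A', B vs B', C vs C'": (60, 14),
}


def summarize(values):
    """Return (mean, population standard deviation) of the overlap values."""
    # Assumption: population SD; the estimator used in the paper is not stated.
    return mean(values), pstdev(values)


summary = {group: summarize(values) for group, values in overlap_500.items()}

for group, (avg, sd) in summary.items():
    rep_avg, rep_sd = reported_500[group]
    print(f"{group}: computed {avg:.1f} ± {sd:.1f}, reported {rep_avg} ± {rep_sd}")
```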

protein number considering the trend of gradual protein number decrease from 5000 cells to 500 cells. Careful inspection of the MS/MS spectra generated from 250 cell samples revealed that many of the spectra had some characteristic fragment ions similar to those shown in their corresponding MS/MS spectra collected from a larger number of cells. Unfortunately, the MASCOT database search scores were below the identity threshold scores. Thus, MS/MS database searching did not match any peptides to these spectra. Our future work on technical development will focus on improving the sensitivity of the current method. As pointed out earlier, the sample preparation protocol needs to be further developed to avoid sample loss in dealing with