Anal. Chem. 2004, 76, 1002-1007
Relative Quantitation of Intact Proteins of Bacterial Cell Extracts Using Coextracted Proteins as Internal Standards
Tracie L. Williams,* John H. Callahan, Steven R. Monday, Peter C. H. Feng, and Steven M. Musser
Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740-3835
A method for quantitating protein expression using LC/MS of whole proteins is described. The method is based on the fact that some proteins present in cells are abundant universal proteins whose expression levels exhibit little variation. These coextracted proteins can be used as internal standards to which the other proteins in the sample are compared. By comparing the intensity of a selected protein to that of marker proteins, or internal standards, a relative ratio is obtained. This ratio can then be used to determine the relative amount of protein expression between cellular extracts. The validity of this approach is demonstrated for a standard protein mixture, as well as for E. coli cells that were known to differentially express green fluorescent protein.

The proteome is defined as the set of all expressed proteins in a cell, tissue, or organism.1 One goal of proteomic analysis is the identification and quantification of all expressed proteins encoded in the genome and the assessment and location of any posttranslational modifications. Although the majority of the published literature on this subject has dealt with eukaryotic cells,2-5 the same tools can provide important information in the study of bacterial cells.6,7 Microbial contamination seriously affects millions of people annually and contributes to the deaths of thousands.8 Proteomic studies of bacteria can be used for identifying differences between pathogenic and nonpathogenic bacterial strains as reflected in expression of new proteins, differences in the amount of expressed proteins, protein sequence mutations, or differences in posttranslational protein modifications, such as phosphorylation and acetylation. More importantly, from a public health perspective, quantitative measurement of protein expression can be a useful tool in the analysis and identification of specific virulence factors, which can be used to differentiate pathogenic strains of bacteria from nonpathogenic strains and serve as markers for development of other analytical methods.

Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is the standard method used in comparative proteomic studies for measuring changes in protein expression levels by comparing spot intensities.9,10 However, its limited dynamic range, lack of reproducibility, and inability to resolve proteins of similar molecular weight are significant drawbacks to this approach. In addition, identification of the protein of interest in a 2D-PAGE experiment is best done using mass spectrometry, a process that requires digestion of the protein to remove it from the gel prior to mass spectrometry analysis.

One attempt to overcome the limitations of 2D-PAGE is the application of liquid chromatography/mass spectrometry (LC/MS) to the separation and analysis of cell lysates.11-21 This approach provides more accurate mass information on the intact proteins of the cell lysate. Although LC/MS overcomes many of the limitations found in 2D-PAGE, the method does not easily lend itself to providing a useful protein “profile” of the bacterial strain.

* Corresponding author: (e-mail) [email protected].
(1) Pennington, S. R.; Wilkins, M. R.; Hochstrasser, D. F.; Dunn, M. J. Trends Cell Biol. 1997, 7, 168-173.
(2) Bini, L.; Magi, B.; Marzocchi, B.; Arcuri, F.; Tripodi, S.; Cintorino, M.; Sanchez, J.-C.; Frutiger, S.; Hughes, G.; Pallini, V.; Hochstrasser, D. F.; Tosi, P. Electrophoresis 1997, 18, 2832-2841.
(3) Page, M.; Amess, B.; Townsend, R. R.; Parekh, R.; Herath, A.; Brusten, L.; Zvelebil, M.; Stein, R. C.; Waterfield, M. D.; Davies, S. C.; O’Hare, M. J. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 12589-12594.
(4) Hathout, Y.; Riordan, K.; Gehrmann, M.; Fenselau, C. J. Proteome Res. 2002, 1, 435-442.
(5) Yan, F.; Balanehru, S.; Nakeff, A.; Barder, T. J.; Parus, S. J.; Lubman, D. M. Anal. Chem. 2003, 75, 2299-2308.
(6) Ogorzalek Loo, R. R.; Cavalcoli, J. D.; VanBogelen, R. A.; Mitchell, C.; Loo, J. A.; Moldover, B.; Andrews, P. C. Anal. Chem. 2001, 73, 4063-4070.
(7) Rosen, R.; Ron, E. Z. Mass Spectrom. Rev. 2002, 21, 244-265.
(8) Altekruse, S. F.; Cohen, M. L.; Swerdlow, D. L. Emerging Infect. Dis. 1997, 3, 285-293.
(9) Issaq, H. J. Electrophoresis 2001, 22, 3629-3638.
(10) Rabilloud, T. Proteomics 2002, 2, 3-10.
(11) Xiang, F.; Anderson, G. A.; Veenstra, T. D.; Lipton, M. S.; Smith, R. D. Anal. Chem. 2000, 72, 2475-2481.
(12) Liang, X.; Zheng, K.; Qian, M. G.; Lubman, D. M. Rapid Commun. Mass Spectrom. 1996, 10, 1219-1226.
(13) Opitek, G. J.; Lewis, K. C.; Jorgenson, J. W.; Anderegg, R. J. Anal. Chem. 1997, 69, 1518-1524.
(14) Chen, Y.; Wall, D.; Lubman, D. M. Rapid Commun. Mass Spectrom. 1998, 12, 1994-2003.
(15) Chong, B. E.; Lubman, D. M.; Miller, F. R.; Rosenspire, A. J. Rapid Commun. Mass Spectrom. 1999, 13, 1808-1812.
(16) Wall, D. B.; Lubman, D. M.; Flynn, S. J. Anal. Chem. 1999, 71, 3894-3900.
(17) Chong, B. E.; Kim, J.; Lubman, D. M.; Tiedje, J. M.; Kathariou, S. J. Chromatogr., B 2000, 167-177.
(18) Chong, B. E.; Hamler, R. L.; Lubman, D. M.; Ethier, S. P.; Rosenspire, A. J.; Miller, F. R. Anal. Chem. 2001, 1219-1227.
(19) Wall, D. B.; Kachman, M. T.; Gong, S.; Hinderer, R.; Parus, S.; Misek, D. E.; Hanash, S. M.; Lubman, D. M. Anal. Chem. 2000, 72, 1099-1111.
(20) Chong, B. E.; Yan, F.; Lubman, D. M.; Miller, F. R. Rapid Commun. Mass Spectrom. 2001, 15, 291-296.
(21) Kachman, M. T.; Wang, H.; Schwartz, D. R.; Cho, K. R.; Lubman, D. M. Anal. Chem. 2002, 74, 1779-1791.

1002 Analytical Chemistry, Vol. 76, No. 4, February 15, 2004
10.1021/ac034820g Not subject to U.S. Copyright. Publ. 2004 Am. Chem. Soc.
Published on Web 01/10/2004
We recently described an automated procedure for generating bacterial protein profiles from the LC/MS chromatogram of bacterial cell lysates.22 The method translates the chromatographic and multiply charged protein information into one comprehensive mass versus intensity spectrum. While it was clear that this method could be used to differentiate bacterial strains based on differences in the molecular weights of proteins, it was not obvious whether differences in protein intensity could be correlated with levels of expression.

Since accurately measuring protein expression is fundamentally important to understanding cell function, a number of mass spectrometry-based methods have recently been introduced for proteomic quantitation. Among them, proteolytic 18O labeling,23-25 isotope-coded affinity tag,26,27 global internal standard technology,28 and absolute quantification29 analysis have recently gained popularity as a means for high-throughput quantitative proteome analysis. These techniques employ tryptic digestion of complex protein mixtures followed by quantification by differential isotope labeling of the resulting tryptic peptides.

This paper presents an alternative, complementary technique for the quantitative measurement of protein expression using coextracted proteins as internal standards. Unlike other LC/MS-based methods for measuring protein expression, prior digestion of the proteins is not required. The method was first demonstrated using a model system of well-defined proteins. It was then applied to Escherichia coli cells that were known to differentially express green fluorescent protein (GFP). This system was used to demonstrate the feasibility of differentiating pathogenic and nonpathogenic strains of bacteria based not only on significant differences in protein mass but also on variations in levels of protein expression.

(22) Williams, T. L.; Leopold, P. E.; Musser, S. M. Anal. Chem. 2002, 74, 5807-5813.
(23) Yao, X.; Freas, A.; Ramirez, J.; Demirev, P. A.; Fenselau, C. Anal. Chem. 2001, 73, 2836-2842.
(24) Yao, X.; Afonso, C.; Fenselau, C. J. Proteome Res. 2003, 2, 147-152.
(25) Wang, Y. K.; Quinn, D. F.; Ma, Z.; Fu, E. W. J. Chromatogr., B 2002, 782, 291-306.
(26) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(27) Wang, S.; Regnier, F. E. J. Chromatogr., A 2001, 924, 345-357.
(28) Chakraborty, A.; Regnier, F. E. J. Chromatogr., A 2002, 949, 173-184.
(29) Gerber, S. A.; Rush, J.; Stemman, O.; Kirschner, M. W.; Gygi, S. P. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 6940-6945.

EXPERIMENTAL SECTION

Acetonitrile, ethanol, acetone, and HPLC grade water were purchased from J. T. Baker (Phillipsburg, NJ). Insulin (bovine pancreas), cytochrome c (horse heart), apomyoglobin (horse skeletal muscle), carbonic anhydrase (bovine erythrocytes), Trisma base, and formic acid were purchased from Sigma-Aldrich Chemical Co. (St. Louis, MO). All were used without further purification. Insulin, cytochrome c, apomyoglobin, and carbonic anhydrase stock solutions were made in 10 mM Trisma base (pH 7.4). The final concentration of each was 74 nmol/mL.

An E. coli ATCC 43890 (Stx1-producing O157:H7) parental strain, an ATCC 43890 GFP-expressing strain, and an E. coli DH5α (pKK223-3:GFP) strain were used for these experiments. The parental ATCC 43890 strain is unaltered and does not express GFP, while its GFP-expressing counterpart was transformed such that it contained the GFP and aphA3 (kanamycin resistance) open reading frames (ORFs) within the low copy number virulence plasmid (pO157) of this strain. The E. coli DH5α (pKK223-3:GFP) strain carried the GFP ORF inserted into the high copy number plasmid pKK223-3 (Pharmacia). Thus, GFP was produced in the cells in varying amounts.

For analysis, confluent cultures of each strain were prepared on trypticase soy agar (TSA) plates. E. coli ATCC 43890 was grown on TSA containing no antibiotics, while E. coli ATCC 43890 (GFP/aphA3) and E. coli DH5α (pKK223-3:GFP) were grown on TSA containing kanamycin (50 µg/mL) and TSA containing ampicillin (100 µg/mL), respectively. All plates were incubated overnight at 37 °C. Following incubation, bacteria were collected for subsequent processing. Upon harvesting, the cells were placed in 1 mL of 70% ethanol and stored at 4 °C until needed.

For experiments, a slurry of the cells in ethanol was made and a 200-µL aliquot was taken. The whole cells were centrifuged to form a loosely packed cell pellet, and the ethanol was removed. One milliliter of acetone was added to dissolve cell wall lipids. After vortexing for 5 min, the cells were centrifuged to a pellet and the acetone was removed. To extract proteins from the cells, 1 mL of 100 mM Trisma base (pH 7.4) was added to the cell pellet. The microtubes containing the cells and extraction solution were placed in an ultrasonic water bath (Sonicor, Copiague, NY) for 30 min for gentle sonication in the extraction solvent. The cells were then centrifuged to a pellet, and the solution extract was removed. The extraction solution was analyzed without further purification.

A TBS-380 Mini-Fluorometer with a minicell adapter (Turner BioSystems, Inc., Sunnyvale, CA) was used to quantitate GFP in the cells. The fluorometer was custom fitted (by the manufacturer) with a filter that permits detection of GFP. Excitation occurred at 365-395 nm, and light emitted at 515-575 nm was detected.
Recombinant GFPuv (Clontech Laboratories, Inc., Palo Alto, CA) was used as a calibration standard.

An Agilent 1100 HPLC system (Palo Alto, CA) fitted with a 20 cm × 1.0 mm inner diameter LC column, packed in-house with Poros 10 R1 packing (Applied Biosystems, Framingham, MA), was used to separate the proteins of the whole cell bacterial extract. A sample was injected onto the column, and the separation was carried out at a flow of 200 µL/min with a shallow gradient (10-50% B in 50 min). Solvent A contained 5% formic acid in water, and solvent B was 5% formic acid in acetonitrile. A Micromass (Manchester, U.K.) QTOF II was used to acquire the data in full-scan continuum mode with an m/z range from 100 to 2000. Data were processed using MassLynx v.4.0 software, and the multiply charged protein spectrum was deconvoluted into a molecular weight spectrum using Micromass’s MaxEnt 1 program.

Automated analysis of the data files was performed with ProteinTrawler (formerly known as Retana; ref 22), custom software written for this purpose by BioAnalyte, Inc. (Portland, ME). The function of this program is to automate data processing subroutines within the data processing program and to produce a combined time and intensity text output file. Briefly, the program sums all data within a specified time interval, uses MaxEnt 1 to deconvolute the multiply charged ions, centers the result, performs a threshold selection, and reports the mass, intensity, and retention time of the protein in a text file. It continues this process across sequential portions of the chromatogram. All aspects of the subroutines, including retention times, mass windows, number of MaxEnt 1 iterations, and spectra to combine, can be controlled through ProteinTrawler.

Upon completion of the ProteinTrawler program, the text file contains a cumulative list of all the masses that were observed upon deconvolution of the individual summed spectra. This text file records mass, intensity, and retention time. The retention time information is held in the text file for the user to reference if a protein is singled out or deemed significant for further study, and it thereby facilitates the isolation and purification process. It can also be used to verify that two proteins of the same mass are actually unrelated, as indicated by their different retention times. The mass and intensity list is converted into a file that MassLynx is able to read via the Databridge program. Alternatively, a graphing program such as Grapher can read the text file. In this paper, Grapher v.3 (Golden Software, Inc., Golden, CO) was used to manipulate and display the data.

RESULTS AND DISCUSSION

There are a number of variables that must be controlled in order to conduct a quantitative experiment of this type. Sample preparation (including cell lysis and protein extraction procedures), chromatographic methods, and ProteinTrawler postprocessing parameters must remain consistent in all samples that are to be compared. Also, MaxEnt 1 must be allowed to proceed to convergence for the output to have quantitative significance. In most instances, convergence is achieved by 20 iterations; however, we set this parameter to 30 iterations in order to ensure convergence.

A mixture of protein standards was used to evaluate the precision and accuracy of the method. Insulin, cytochrome c, myoglobin, and carbonic anhydrase were mixed in 1:1:1:1, 1:1:2:1, 1:1:5:1, 1:1:10:1, and 1:1:20:1 molar ratios (stock solutions were each 74 nmol/mL). The chromatograms of three mixtures are shown in Figure 1. The peak areas of the different proteins differ even though equal molar amounts were injected onto the column. However, the peak area ratio of proteins whose molar ratio is unchanged remains constant in all chromatograms. In this example, carbonic anhydrase has a peak area 4 times larger than cytochrome c and 12 times larger than insulin.

Figure 1. Total ion chromatograms of 1:1:1:1, 1:1:2:1, and 1:1:5:1 mixtures of insulin, cytochrome c, myoglobin, and carbonic anhydrase.

Comparing peak areas of chromatograms is generally not acceptable due to peak area contributions from coeluting proteins. This problem is evident as the amount of myoglobin is increased to the point that its chromatographic peak overlaps that of carbonic anhydrase, making it difficult to determine the peak area of the two proteins. However, when the multiply charged spectra are deconvoluted with MaxEnt 1, each coeluting protein is resolved. When the chromatographic and multiply charged protein information is translated into one comprehensive mass versus intensity spectrum by ProteinTrawler, the reported “intensity” is actually the peak area obtained from the MaxEnt 1 program. To maintain continuity with data produced by mass spectrometers, we choose to report this value as intensity rather than confuse the subject by labeling the y-axis as a peak area.
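To illustrate why coeluting proteins can still be separated in the mass domain, the following is a minimal sketch of the charge-state arithmetic that underlies any deconvolution of multiply charged spectra. It is not MaxEnt 1, which is a maximum-entropy method; it uses the simpler textbook calculation from two adjacent charge states. The protein masses below are approximate literature values assumed for illustration and do not come from this paper, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz_of(mass, z):
    """m/z of a protein of neutral mass `mass` (Da) carrying z protons."""
    return (mass + z * PROTON) / z

def charge_series(mass, mz_min=100.0, mz_max=2000.0, z_max=60):
    """(z, m/z) pairs that fall inside the acquired m/z window (100-2000 here)."""
    return [(z, mz_of(mass, z)) for z in range(1, z_max + 1)
            if mz_min <= mz_of(mass, z) <= mz_max]

def mass_from_adjacent(mz_hi, mz_lo):
    """Recover charge and neutral mass from two adjacent charge-state peaks.

    mz_hi is the peak with charge z; mz_lo is its neighbor with charge z + 1
    (which therefore appears at lower m/z).
    """
    z = round((mz_lo - PROTON) / (mz_hi - mz_lo))
    return z, z * (mz_hi - PROTON)

# Assumed, approximate average masses (not taken from the paper).
MYOGLOBIN = 16951.5
CARBONIC_ANHYDRASE = 29024.0

myo = dict(charge_series(MYOGLOBIN))
ca = dict(charge_series(CARBONIC_ANHYDRASE))

# The two proteins give different peak ladders, so their signals can be
# assigned separately even when they elute at the same time.
z, mass = mass_from_adjacent(myo[10], myo[11])
print("myoglobin charge states in window:", min(myo), "to", max(myo))
print("carbonic anhydrase charge states in window:", min(ca), "to", max(ca))
print("recovered from adjacent myoglobin peaks: z =", z, "mass =", round(mass, 1))
```

Because each protein contributes its own ladder of m/z values, summing a deconvoluted ladder yields one mass-domain peak per protein, which is the quantity reported here as "intensity".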
Figure 2. Increments of myoglobin plotted versus the normalized intensity ratio of myoglobin with respect to carbonic anhydrase. The slope of the regression line is 1.02 with an R² value of 0.9885. Standard deviation around individual determinations did not exceed 10%.
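As a worked illustration of the internal-standard calculation behind Figure 2, the sketch below takes the mean myoglobin/carbonic anhydrase intensity ratios that the Results text reports for the five standard mixtures (0.90, 2.1, 4.7, 8.5, and 20.7), converts them to fold changes relative to the 1:1:1:1 sample, and fits an ordinary least-squares line. The function names are hypothetical, and because only the reported means are used (not the replicate data), the fit is not expected to reproduce the caption's regression statistics exactly.

```python
# Molar amount of myoglobin relative to the other proteins in each mixture.
molar_ratio = [1, 2, 5, 10, 20]
# Mean myoglobin / carbonic anhydrase intensity ratios reported in the text.
intensity_ratio = [0.90, 2.1, 4.7, 8.5, 20.7]

def fold_changes(ratios, baseline_index=0):
    """Express each marker-normalized ratio relative to a baseline sample."""
    base = ratios[baseline_index]
    return [r / base for r in ratios]

def least_squares(x, y):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)

folds = fold_changes(intensity_ratio)
slope, intercept, r_squared = least_squares(molar_ratio, intensity_ratio)

for m, f in zip(molar_ratio, folds):
    print(f"1:1:{m}:1 sample -> {f:.1f}x the myoglobin of the 1:1:1:1 sample")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")
```

Dividing by the marker protein first, and only then comparing samples, is what makes the result independent of how much total material was injected in each run.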
Five replicate chromatograms of each of the five protein mixtures were produced and processed using ProteinTrawler in 30-s intervals from 20 to 40 min. In this experiment, in which the amount of protein injected is known and well-controlled, it could be argued that the intensity of myoglobin could be directly compared to the intensity of any of the unchanged proteins within a single sample. However, this would be problematic in a real sample, such as a cell lysate, in which the extraction efficiency and solubility of individual proteins are unknown. Instead, to account for such experimental variables, one protein should be identified as a “marker” protein (i.e., the internal standard) and the ratio of the differentially expressed protein to the marker protein noted. This ratio can then be compared between samples to determine expression differences. In this case, carbonic anhydrase was singled out as the marker protein. In the 1:1:1:1 mixture, myoglobin/carbonic anhydrase had an intensity ratio of 0.90. This intensity ratio increased to 2.1, 4.7, 8.5, and 20.7 in the 1:1:2:1, 1:1:5:1, 1:1:10:1, and 1:1:20:1 samples, respectively. These ratios can now be directly compared between samples to determine that, relative to the 1:1:1:1 sample, there is 2.3 times the amount of myoglobin in the 1:1:2:1 sample, 5.2 times the amount in the 1:1:5:1 sample, and so on, as expected. When plotted as molar ratio of myoglobin versus myoglobin/carbonic anhydrase intensity ratio (Figure 2), the linearity of the experiment indicates that this method could be used to compare the samples to one another for relative quantitation purposes.

We extended this method to the quantitative analysis of GFP in E. coli cells. The protein profile is obviously affected by the method used for extracting proteins from the cells.
An extraction solution in which organic solvent is combined with acid (formic acid or trifluoroacetic acid) allows more proteins to be extracted from the cells and, consequently, observed by LC/MS (data not shown). However, GFP was not soluble in organic solvents and denatured under acidic conditions. Since the intent of this work is to focus on GFP expression rather than the number of different proteins that could be extracted from the cells, the extraction was performed using a 100 mM Trisma base (pH 7.4) solution.

Fluorescence detection was used to determine the concentration of GFP extracted from the cells. Using recombinant GFPuv (rGFPuv) as a standard, it was determined that the E. coli 43890 GFP extraction contained 23.0 µg/mL GFP and the E. coli PKK223 GFP extraction contained 68.6 µg/mL, nearly 3 times the amount of GFP in the extraction solution. Since fluorescence detection requires an active chromophore (i.e., the protein must be in its native state), we suspect that extraction treatment and time prior to analysis may result in a lower than expected measurement of the amount of GFP in the cell lysate. Also, this reading reflects the number of cells analyzed: if the two samples do not contain exactly the same number of cells, then comparing the fluorescent signal of the two samples has no real meaning. Fortunately, mass spectrometry is not influenced by the state of the protein, and by using this method to quantitate protein expression, issues such as number of cells analyzed and extraction method are irrelevant, provided that all samples intended for comparison are treated in the same manner.

The total ion chromatograms from the LC/MS experiments are shown in Figure 3. The top chromatogram is the E. coli ATCC 43890 strain transformed such that it contained the GFP open reading frames within the low copy number virulence plasmid (pO157) of this strain. The two largest peaks are of proteins that have masses of 18.2 and 28.5 kDa, respectively. These two proteins are present in all samples and have been labeled as marker proteins.
They were isolated, digested with trypsin, and identified by MS/MS as hyperosmotically inducible periplasmic protein (Accession number NP_290989) and D-ribose periplasmic binding protein (Accession number NP_290390), respectively. Processing of the chromatograms with ProteinTrawler in 30-s intervals from 13 to 45 min generated the protein profiles shown in Figure 4. The 28.5-kDa protein has a peak intensity ∼3.0 times larger than that of the 18.2-kDa protein in the two samples that contain GFP (43890 GFP and PKK223 GFP). Since these ratios are consistent, it can be concluded that their expression level is unchanged in the two E. coli strains and that they are suitable for use as internal standards. The ratio of GFP to either of these two marker proteins can be used to determine expression differences between the two samples. Spectra could likewise be normalized using the more intense marker protein (in this case, the 28.5-kDa protein) and the intensity of the GFP in the plasmid sample directly compared to the single-copy GFP sample. In this manner, it was determined that PKK223 GFP had 16 times more GFP than 43890 GFP.

To confirm that protein expression in a complex sample containing numerous proteins can be quantitated, carbonic anhydrase was spiked into the 43890 GFP extract. Since the concentration of the individual proteins in the extract was not known, the amount of carbonic anhydrase to add required experimental determination. The 74 nmol/mL stock solution used for the previous experiment was diluted to 7.4 nmol/mL, and 2 µL was added to a 25-µL aliquot of the cell extract. All 27 µL (14.8 pmol of carbonic anhydrase) was injected onto the HPLC column. A
Table 1. Intensity Data Obtained from Spiking E. coli 43890 GFP with Known Amounts of Carbonic Anhydrase (a)

                                                         baseline    8×         16×
carbonic anhydrase added to cell extract (pmol)          14.8        118.4      236.8
carbonic anhydrase (intensity)                           11 241      239 900    539 810
28.5-kDa marker protein (intensity)                      21 464      55 955     63 753
carbonic anhydrase/marker protein (intensity ratio)      0.524       4.29       8.47
ratio of intensity ratios to “baseline” sample           1           8.19       16.2

(a) By comparing the intensity ratios of carbonic anhydrase to the 28.5-kDa marker protein, the relative amount of carbonic anhydrase in a complex protein sample can be determined.
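The derived rows of Table 1 follow directly from the raw intensities, and the short sketch below recomputes them as a check. It also reproduces the 14.8 pmol baseline spike from the stated dilution (2 µL of a 7.4 nmol/mL solution). The dictionary layout and function names are illustrative assumptions; only the numbers come from the table and text.

```python
def pmol_added(conc_nmol_per_ml, volume_ul):
    """Amount in pmol; 1 nmol/mL is numerically equal to 1 pmol/µL."""
    return conc_nmol_per_ml * volume_ul

# Raw intensities from Table 1 (carbonic anhydrase and the 28.5-kDa marker).
samples = {
    "baseline": {"ca": 11241, "marker": 21464},
    "8x": {"ca": 239900, "marker": 55955},
    "16x": {"ca": 539810, "marker": 63753},
}

def normalized_ratios(data, reference="baseline"):
    """Return {sample: (CA/marker ratio, ratio relative to the reference)}."""
    ratios = {name: s["ca"] / s["marker"] for name, s in data.items()}
    return {name: (r, r / ratios[reference]) for name, r in ratios.items()}

results = normalized_ratios(samples)

print(f"baseline spike: {pmol_added(7.4, 2):.1f} pmol")
for name, (ratio, relative) in results.items():
    print(f"{name:>8}: CA/marker = {ratio:.3g}, relative to baseline = {relative:.3g}")
```

Note that the marker intensity itself nearly triples between the baseline and 16× runs (21 464 to 63 753), which is why the raw carbonic anhydrase intensities cannot be compared directly and the marker-normalized ratio is used instead.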
Figure 3. Total ion chromatograms of a cellular lysate of E. coli in which GFP is differentially expressed.
small peak corresponding to the carbonic anhydrase eluted just after the 28.5-kDa marker protein (data not shown). This sample mixture is herein referred to as the “baseline” sample. Aliquots of the cell extract to which 118.4 pmol (8×) and 236.8 pmol (16×) of carbonic anhydrase had been added were analyzed by LC/MS followed by postprocessing with ProteinTrawler. The intensities of carbonic anhydrase and the 28.5-kDa marker protein in each of the spiked samples are listed in Table 1. Clearly, absolute intensities cannot be used for direct comparison of samples. However, using the 28.5-kDa marker protein to normalize the spectra allows sample-to-sample comparison of the relative intensity ratio of carbonic anhydrase to marker protein. In this manner, it was determined that the linearity of the experiment and the resulting quantitative results are maintained in a complex protein sample.

CONCLUSIONS
Figure 4. Protein profile of the E. coli cells obtained from postprocessing of the total ion chromatogram with ProteinTrawler.
We have previously shown that an automated postprocessing method of LC/MS can be used for comparing cell lysates via their protein mass differences. This method has been extended to look for differences in protein amounts with the intention of using it to monitor protein expression in cells. One of the more striking outcomes of this investigation was the repeatability of the relative peak areas of the proteins marked for use as internal standards. While this was inferred from the experiments with the model system of proteins, the ratios found in the bacterial cell extracts also showed a high degree of repeatability, with ratios not varying by more than 10% in the two types of engineered bacteria. The high degree of correlation between the proteins used as internal standards adds an additional level of quality assurance and confidence in the quantitative results. In many ways, using the coextracted proteins as a means of standardizing and then quantitating proteins in a cellular extract is comparable to the approach used in image analysis of 2D gels. The advantage of using mass spectrometry in this case is that it is faster, unaffected by changes of retention time, and provides more accurate mass information on the proteins of interest.

We have demonstrated the validity of using coextracted proteins as internal standards for the quantitation of proteins in a model system, as well as in a bacterial cell extract. Based on our previous method for producing protein profiles of cell extracts, quantitation is accomplished by normalizing the signals to those of coextracted marker proteins, making it possible to accurately measure the relative levels of protein expression in cellular extracts. We believe this method provides a simple, complementary method for assessing protein expression in biological systems.

Received for review July 18, 2003. Accepted December 3, 2003.
AC034820G