Anal. Chem. 2001, 73, 746-750

Characterization of the Protein Subset Desorbed by MALDI from Whole Bacterial Cells

Victor Ryzhov† and Catherine Fenselau*

Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742

This study characterizes various features of the proteins that are detected in MALDI mass spectra when whole bacterial cells are analyzed, in an effort to understand why some proteins are successfully detected and many others are not. Forty peaks observed in the mass range 4000-20 000 Da in the spectra of Escherichia coli K-12 and 11775 are tentatively assigned to proteins in a protein database, and these proteins are characterized by cell location, copy number, pI, and hydropathicity. Those detected originate in the cytosol and generally share the traits of high abundance within the cell, strong basicity, and medium hydrophilicity.

Matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS) is widely used for rapid identification of microorganisms.1-5 The speed of MALDI-MS analysis makes this technique especially attractive for many important applications involving bacteria, such as detection of biological warfare agents, control of food poisoning, and blood screening. Protein biomarkers from intact bacterial cells have been shown to be readily accessible with MALDI, and spectral fingerprints have been proposed to identify microorganisms or distinguish between different strains.3,6-8 A phyloproteomics approach based on searching genome/proteome databases has also been proposed.9,10

While there is continuing interest in using MALDI-MS for characterization and identification of microorganisms, relatively little is known about the origin and nature of the protein biomarkers in MALDI spectra of whole bacterial cells. Arnold and Reilly11 characterized the protein fraction of isolated Escherichia coli ribosomes by MALDI-TOF MS. They were able to detect virtually all of the 58 ribosomal proteins, some of them posttranslationally modified. Ribosomal proteins account for more than 20% of total cell protein,12 and the molecular masses of many of the proteins in the extract match those of biomarkers observed in MALDI spectra of whole E. coli cells. Dai et al.13 performed an HPLC separation of E. coli cell extracts and characterized the major components of three fractions by tryptic digestion and protein database search. The search resulted in matches to two cold-shock proteins and one DNA-binding protein.

Isolation strategies could be extended to identify the protein biomarkers observed in MALDI mass spectra of whole cells. However, they are very time-consuming and labor-intensive. In the meantime, the genomes of E. coli K12 and nearly 30 other microorganisms have been completely sequenced.14 Thus, the vast majority of the protein biomarkers in mass spectra of these bacteria are expected to be present in the genome/protein database. Consequently, the molecular masses of proteins desorbed by MALDI from whole cells may be matched against those predicted by the organism proteome, to tentatively identify many protein biomarkers. In addition to the molecular weight, the SwissProt/TrEMBL protein database15 contains information on each protein's sequence and its pI. This information can also be useful in characterizing biomarkers.

The genome of E. coli K12 predicts more than 2000 proteins with molecular masses between 4000 and 20 000 Da. Only a small subset of these is observed in MALDI mass spectra. It is important for the design of improved sensors to understand the nature of the proteins that are favored by MALDI. For this study we chose E. coli, because it is the best-studied bacterium. Numerous MALDI-TOF spectra of E. coli from our laboratory9 as well as from other laboratories3,7,11,13,16,17 aid in addressing the reproducibility question and making interstrain comparisons. Both the sequenced K12 strain and a related 11775 strain were studied.

† Present address: Department of Chemistry and Biochemistry, Northern Illinois University, DeKalb, IL 60115.

(1) Claydon, M. A.; Davey, S. N.; Edwards-Jones, V.; Gordon, D. B. Nat. Biotechnol. 1996, 14, 1584-1586.
(2) Krishnamurthy, T.; Ross, P. L. Rapid Commun. Mass Spectrom. 1996, 10, 1992-1996.
(3) Arnold, R.; Reilly, J. Rapid Commun. Mass Spectrom. 1998, 12, 630-636.
(4) Hathout, Y.; Demirev, P. A.; Ho, Y. P.; Bundy, J. L.; Ryzhov, V.; Sapp, L.; Stutler, J.; Jackman, J.; Fenselau, C. Appl. Environ. Microbiol. 1999, 65, 4313-4319.
(5) Holland, R. D.; Wilkes, J. G.; Rafii, F.; Sutherland, J. B.; Persons, C. C.; Voorhees, K. J.; Lay, J. O., Jr. Rapid Commun. Mass Spectrom. 1996, 10, 1227-1232.
(6) Krishnamurthy, T.; Ross, P. L.; Rajamani, U. Rapid Commun. Mass Spectrom. 1996, 10, 883-888.
(7) Wang, Z.; Russon, L.; Li, L.; Roser, D. C.; Long, S. R. Rapid Commun. Mass Spectrom. 1998, 12, 456-464.
(8) Jarman, K. H.; Cebula, S. T.; Saenz, A. J.; Petersen, C. E.; Valentine, N. B.; Kingsley, M. T.; Wahl, K. L. Anal. Chem. 2000, 72, 1217-1223.
(9) Demirev, P. A.; Ho, Y. P.; Ryzhov, V.; Fenselau, C. Anal. Chem. 1999, 71, 2732-2738.
(10) Pineda, F.; Linn, J.; Fenselau, C.; Demirev, P. A. Anal. Chem., submitted.
(11) Arnold, R. J.; Reilly, J. P. Anal. Biochem. 1999, 269, 105-112.
(12) Bremer, H.; Dennis, P. P. In Escherichia coli and Salmonella; Neidhardt, F. D., Ed.; ASM Press: Washington, DC, 1996; Vol. 1, pp 167-182.
(13) Dai, Y.; Li, L.; Roser, D. C.; Long, S. R. Rapid Commun. Mass Spectrom. 1999, 13, 73-78.
(14) http://www.ncbi.nlm.nih.gov/Entrez/Genome/org.html.
(15) http://www.expasy.ch/srs5/.

10.1021/ac0008791 CCC: $20.00

Analytical Chemistry, Vol. 73, No. 4, February 15, 2001

© 2001 American Chemical Society Published on Web 01/18/2001

Figure 1. MALDI mass spectra of whole cells of E. coli strain K-12 in sinapinic acid.
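The introduction's idea of matching observed masses against masses predicted from the proteome rests on computing an average molecular mass from each protein sequence. A minimal Python sketch of that calculation follows. The residue masses are standard average values; the function names and the toy sequences are hypothetical illustrations and are not taken from the paper or from any database.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water accounts for the free N- and C-termini


def average_mass(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified protein sequence."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def in_mass_range(sequence: str, low: float = 4000.0, high: float = 20000.0) -> bool:
    """True if the predicted mass lies in the 4000-20 000 Da range studied here."""
    return low <= average_mass(sequence) <= high


# Toy sequences for illustration only (not real E. coli proteins).
toy_proteome = {
    "toy_small": "MKR",
    "toy_medium": "MKRLA" * 10,
}
selected = [name for name, seq in toy_proteome.items() if in_mass_range(seq)]
print(selected)
```

Posttranslational modifications or loss of the N-terminal methionine shift the predicted mass, so a real comparison would need to consider those variants as well.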

EXPERIMENTAL SECTION

Microorganisms. E. coli strains ATCC 11775 and 25404 (K12) were purchased from ATCC, Rockville, MD. These bacteria were grown in 8 g/L nutrient broth (Difco Labs, Detroit, MI). After harvesting, the material was centrifuged at 10 000g for 10 min and the pellet was washed with water three times. The bacterial cells were lyophilized and stored at -20 °C.

Mass Spectrometry. MALDI mass spectra were obtained on a Kompact MALDI 4 (Kratos Analytical Instruments, Chestnut Ridge, NY) time-of-flight instrument in the linear mode at 20-kV accelerating voltage with a 0.3-µs delay time. Laser fluence was typically 10 mJ/cm². Each spectrum was an average from 50 laser shots. Whole bacterial cells were suspended (5 mg/mL) in acetonitrile/0.1% trifluoroacetic acid (TFA; 70:30, v/v), and 0.2 µL was deposited in a well of a Kratos sample slide. Then 0.2 µL of matrix solution (50 mM sinapinic acid (SA) or 50 mM α-cyano-4-hydroxycinnamic acid (CHCA) in the same solvent mixture) was added to the sample. Internal (bovine insulin, bovine ubiquitin, equine cytochrome c) and external mass calibrations were used to provide mass accuracy of 1 part in 3000. The calibrants were purchased from Sigma Chemical Co. (St. Louis, MO) and added in small amounts to the matrix solution (5-20 µM final concentration) to give signals comparable in intensity with the bacterial biomarkers. The matrix compounds and TFA were purchased from Aldrich Chemical Co., Inc. (Milwaukee, WI).

Database Search. Searches were conducted in the SwissProt/TrEMBL database (Expasy, Swiss Institute of Bioinformatics) using the Sequence Retrieval System module at http://www.expasy.ch/srs5/. The only parameters used in the search were the organism (E. coli) and the mass (experimental mass allowing for a mass window corresponding to the experimental mass accuracy of 1 part in 3000). Details of the database search procedure are described elsewhere.9 When no database protein was found in the mass window, the search was conducted with a mass corresponding to N-terminal methionine loss.
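The search logic described above (a window of 1 part in 3000 around the experimental mass, with a second pass that allows for N-terminal methionine loss) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `toy_database`, the function names, and the back-calculated full-length KOC1 mass are hypothetical, the other masses are copied from Table 1, and the actual searches were run through the SRS web interface rather than code like this. Whether to subtract the proton mass from an observed (M + H)+ value before searching is left to the caller.

```python
MET_RESIDUE_MASS = 131.19  # average mass of one methionine residue, Da


def mass_window(mass, parts=3000.0):
    """Lower and upper bounds for a mass accuracy of 1 part in `parts`."""
    tolerance = mass / parts
    return mass - tolerance, mass + tolerance


def search(observed_mass, database, parts=3000.0):
    """Return (name, predicted mass, note) candidates for one observed mass."""
    low, high = mass_window(observed_mass, parts)
    hits = [(name, mass, "intact")
            for name, mass in database.items() if low <= mass <= high]
    if hits:
        return hits
    # No intact match: assume the N-terminal methionine was removed, so the
    # full-length database entry is heavier by one methionine residue.
    return [(name, round(mass - MET_RESIDUE_MASS, 2), "Met loss")
            for name, mass in database.items()
            if low <= mass - MET_RESIDUE_MASS <= high]


# Toy database: masses from Table 1, except KOC1 (see comment).
toy_database = {
    "RL29": 7273.0, "CSPA": 7272.0, "CSPC": 7271.0,
    "DBHB": 9225.0, "KEC2": 9227.0, "DBHA": 9535.0,
    "KOC1": 9195.2,  # hypothetical full-length mass (9064 + Met), an assumption
}

for peak in (7273, 9224, 9064):
    print(peak, search(peak, toy_database))
```

At 9000 Da the window is about ±3 Da, which is why a single peak can return several candidates in this sketch.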

Hydropathicity values were obtained for each protein from the NiceProt View section of SwissProt/TrEMBL, following the links to “ProtParam” (in Tools) and “Submit”. Proteins for Figure 4A were selected as the first and last 200 proteins in the alphabetical list of all E. coli proteins in the mass range of 4-20 kDa.

RESULTS AND DISCUSSION

A MALDI mass spectrum of a whole-cell suspension of E. coli strain K-12 with sinapinic acid as the matrix is shown in Figure 1. Forty peaks were selected for study in the mass range 4000-20 000 Da. These peaks are highly reproducible if the same experimental protocol is followed. Results of the database search using the experimental masses as search parameters are summarized in Table 1. The table also incorporates five additional peaks found in MALDI spectra of E. coli K-12 intact cells using CHCA as the matrix (Figure 2). The MALDI spectra of E. coli 11775 were very similar (data not shown), displaying 30 out of 32 peaks shown in Figure 1.

Table 1. SwissProt/TrEMBL Database Proteins Matching Biomarker Masses in MALDI MS of E. coli K-12

MH+ (exp)   M (match)   description         Hb        pI
4364*c      4364        RL36                -0.611    10.7
            4364        Q53518               0.083     4.2
5096*       5096        RS22                -1.269    11.0
            5098        LPPY                 0.141    11.4
5380        5380        RL34                -1.111    13.0
5598*       5597        Q51954              -0.686     9.4
6255*       6254        Q51622               0.362    11.3
            6254        RL33 methylated     -0.854    10.3
6315*       6315        RL32                -1.025    11.0
6410        6411        RL30                -0.172    11.0
6505        6507        RMF                 -1.136    10.9
6696        6694        P77370               0.098    11.0
6856        6856        CSRA                -0.257     8.2
            6855        Q47655               0.324     9.2
            6855        YCAR                -0.127     4.9
7159        7158        RL35                -0.656    11.8
7273        7273        RL29                -0.657    10.0
            7272        CSPA                -0.283     5.6
            7271        CSPC                -0.221     6.8
7333        7333        TRBK                 0.082     8.9
            7332        CSPE                -0.244     8.1
            7332        P77098              -0.386    11.8
7476        7474        KEB2 loss of Met    -0.200     8.4
7869        7871        RL31                -0.649     9.5
8324        8324        MOMLP               -0.314     9.3
            8324        YPGA                -0.436     3.2
            8325        YJBJ                -1.526     5.4
8370        8369        RS21                -1.133    11.2
            8371        FEOA                 0.053     9.4
8545        8544        DAAF                -0.552     9.3
            8542        TRAY                -0.587     8.1
8896        8897        RS18 acetylated     -0.811    10.6
8875        8875        RL28                -0.683    11.4
9064        9065        P76358              -0.158     4.5
            9064        KOC1 M-loss         -0.104     6.2
9191        9190        RS16                -0.329    10.5
            9193        YFGI                -0.127     6.3
9224        9225        DBHB                -0.042     9.7
            9227        KEC2                -0.934     6.6
9270        9272        CHPS                -0.277     4.7
9537        9535        DBHA                -0.228     9.6
9553*       9553        RL26/RS20           -0.735    11.2
9575        9573        RS17                -0.319     9.6
            9572        ILVM                -0.024     8.9
9737        9737        AFAF                -0.596     9.2
10300       10299       RS19                -0.647    10.5
            10300       ACYP                -0.393     8.7
            10303       YJCB                 0.782     5.5
10650       10651       IHFB                -0.784     9.3
10694       10693       RL25                -0.468     9.6
            10689       CILG                 0.028     4.6
11187       11185       RL24                -0.403    10.2
            11186       Q47546              -0.325     5.3
            11192       Q47627              -0.307     5.3
11214       11214       REL1                -0.555     9.9
11450       11449       RS14                -0.822    11.2
            11451       Q9Z498               0.616     9.1
            11454       Q46693              -0.890     9.8
11736       11736       RS10                -0.352     9.7
            11737       YFJZ                -0.235     6.0
            11735       P76525              -0.030     4.4
11772       11772       Y182                -0.380     9.7
12226       12226       RL22                -0.349    10.2
            12224       TRPR                -0.446     5.4
14726       14725       RS9                 -0.707    10.9
15419       15416       MARA                -0.919     9.4
            15419       URF4                 0.340     6.3
15707       15704       RS6                 -0.827     4.9
            15706       ARDR                -0.699     6.6
16019       16018       RL13                -0.540     9.9
            16016       UP03                 0.022     5.6
18773       18773       RL6                 -0.239     9.7

a Intensity in MALDI spectrum: H, high; M, medium; no entry, lower intensity. The intensity column (Ia) lost its row alignment in this copy; its entries, in source order, are H M M H M H M M M H (peaks 4364-9191) and M M M H M M M M (peaks 9224-18773). b Total hydropathicity (Bull-Breese). c Present only when CHCA is used as a matrix.

Figure 2. MALDI mass spectra of whole cells of E. coli strain K-12 in α-cyano-4-hydroxycinnamic acid.

All of the database proteins matching the experimentally observed masses correspond to proteins from the inside of the bacterial cell. The only possible exception is the (M + H)+ peak at 8325 Da. One candidate for its assignment is the major outer-membrane lipoprotein precursor (8324 Da). Arnold et al.16 assigned this peak to the 8.3 kDa protein coded by the DINF-QOR intergenic region (YJBJ protein) based on its pI value, characterized by liquid-phase isoelectric focusing of extracts. The thin cell walls of vegetative bacteria are lysed by the organic solvents and acidic conditions used in the analysis, releasing cytoplasmic proteins for detection in MALDI spectra.

When bacterial cells were suspended in pure water (with no TFA or organic solvent), MALDI analysis detected only three major peaks, corresponding to biomarker masses 5752, 7707, and 9064 Da. Of these three peaks, two (5752 and 7707 Da) could not be matched to a protein in the database (with or without N-terminal methionine loss). Furthermore, these two peaks are not observed when the suspension is treated with TFA and/or organic solvent. If they are located on the outside of the cell, the genome may not always account for them18 since they can be plasmid encoded. They may also be secondary metabolite products.19 The third peak, at 9064 Da, could be an internal P76358 protein or an internal KOC1 protein with an N-terminal methionine loss. Holland et al.20 assigned this peak to a deletion-induced protein A (HdeB) and a peak at 9739 to the deletion-induced protein HdeA; however, these proteins were not present in the SwissProt/TrEMBL database at the time of our search.

Ribosomal Proteins. About half of the peaks in the MALDI spectra were matched by mass to ribosomal proteins. These have very high abundance since almost half of the mass of growing cells corresponds to ribosomes.12 Some of the peaks are assigned as posttranslationally modified ribosomal proteins, in agreement with the work of Arnold and Reilly.11 All of the ribosomal proteins matching biomarker masses observed in MALDI spectra of whole cells are very basic (see pI column in Table 1). Basic proteins are well known to be protonated more readily during the MALDI process.21 In addition to matches corresponding to ribosomal subunit proteins, we observe a peak at 6506 Da that matches within 1 Da the ribosome modulation factor protein (RMF).

DNA-Binding Proteins HU. Dai and co-workers13 identified three proteins in fractionated bacterial extracts from E. coli strain 11775. One of the identified proteins was the 9535 Da α-subunit of a DNA-binding protein HU (DBHA), one of the most abundant proteins in E. coli.22 The β-subunit of this protein (DBHB) has a molecular mass of 9225 Da. We observe peaks within 2 Da of both of these masses in MALDI spectra from both of the E. coli strains studied. While for the 9537 peak there are no candidates other than DBHA in the protein database, there are two potential matches for the 9224 peak, DBHB and KEC2. Since both chains of the DNA-binding protein must be present in equal amounts and their sequences are similar, one might expect to detect both of them in MALDI analysis. Therefore, it is more likely that the 9224 peak belongs to DBHB rather than to KEC2.

Cold-Shock Proteins. Another group of proteins potentially found in MALDI spectra of E. coli belongs to the family of abundant23 cold-shock proteins. They include the cold-shock protein A (CSPA) at 7272 Da, cold-shock protein C (CSPC) at 7271 Da, and the cold-shock-like protein E (CSPE) at 7332 Da. There are also other potential matches in the protein database corresponding to these masses. For example, the ribosomal protein RL29 has a mass of 7273 Da. For the 7332 peak, there are two other possible matches in addition to CSPE (see Table 1). With the relatively low mass resolution of the MALDI-TOF instrument used, it is not possible for us to conclude whether more than one protein contributes to the peak.

pI of the Proteins. Table 1 lists pI values for every possible match in the database. It can be seen that most of the proteins are very basic. The pI distribution of all E. coli proteins from 4 to 20 kDa is bimodal with maxima at about 5.5 and 9.5 (see Figure 3A). The pI distribution of the potential matches to the experimental peaks is shown in Figure 3B. While it retains the bimodal character, proteins at the basic end clearly dominate. The fact that most of the proteins detected by MALDI in positive ion mode are from the basic end is in agreement with observations that the conditions used for positive ion MALDI favor ionization of basic proteins and peptides.21 The acidic conditions of the bacterial suspension (0.1% TFA has a pH of ∼2) also favor detection of more basic proteins.

Hydropathicity Index. This information is summarized for each entry in Table 1. The values show how (on average) hydrophilic or hydrophobic the protein is. The hydropathicity index H is more negative when a protein is more hydrophilic. The average indexes should be used with caution since they do not reflect the secondary structure of the protein. Figure 4A shows the hydropathicity distribution of 400 E. coli proteins with molecular masses below 20 000 Da: the first 200 and the last 200 entries in an alphabetical list of proteins. Figure 4B summarizes the hydropathicity of the tentatively assigned MALDI biomarkers. Most of the proteins detected, especially in the lower mass range, are moderately hydrophilic. The average H value of the proteins detected (∼-0.5) is below the average H value for all E. coli proteins (-0.2). The presence of many hydrophilic proteins in MALDI spectra contrasts with spectra produced by fast atom bombardment, an ionization technique that favors more hydrophobic proteins and peptides.24 Hydrophilic proteins are more soluble in the water-based solvent used here, which makes it easier for them to cocrystallize with the matrix.

(16) Arnold, R. J.; Karty, J. A.; Ellington, A. D.; Reilly, J. P. Anal. Chem. 1999, 71, 1990-1996.
(17) Chong, B. E.; Wall, D. B.; Lubman, D. M.; Flynn, S. J. Rapid Commun. Mass Spectrom. 1997, 11, 1900-1908.
(18) Evason, D. J.; Claydon, M. A.; Gordon, D. B. Rapid Commun. Mass Spectrom. 2000, 14, 669-672.
(19) Leenders, F.; Stein, T. H.; Kablitz, B.; Franke, P.; Vater, J. Rapid Commun. Mass Spectrom. 1999, 13, 943-949.
(20) Holland, R. D.; Duffy, C. R.; Rafii, F.; Sutherland, J. B.; Heinze, T. M.; Holder, C. L.; Voorhees, K. J.; Lay, J. O., Jr. Anal. Chem. 1999, 71, 3226-3230.
(21) Krause, E.; Wenschuh, H.; Jungblut, P. R. Anal. Chem. 1999, 71, 4160-4165.
(22) Kohno, K.; Wada, M.; Kano, Y.; Imamoto, F. J. Mol. Biol. 1990, 213, 27-36.
(23) Goldstein, J.; Pollitt, N. S.; Inouye, M. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 283-287.
(24) Naylor, S.; Findeis, A. F.; Gibson, B. W.; Williams, D. H. J. Am. Chem. Soc. 1986, 108, 6359-6363.
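The average hydropathicity index discussed under Hydropathicity Index is a per-residue mean over the sequence. The Python sketch below illustrates the arithmetic only. It uses the Kyte-Doolittle scale as a stand-in, which is an assumption: the Table 1 footnote names the Bull-Breese scale, whose values are not reproduced here, so numbers from this sketch should not be expected to match the H column. The function name and the test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (stand-in scale; see the note above).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathicity(sequence, scale=KYTE_DOOLITTLE):
    """Mean per-residue hydropathy; more negative means more hydrophilic."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(scale[aa] for aa in sequence) / len(sequence)


# Hypothetical short sequences: a basic, hydrophilic stretch and a
# hydrophobic stretch.
print(mean_hydropathicity("KKRR"))
print(mean_hydropathicity("ILVA"))
```

Because the index is an average over all residues, it says nothing about how hydrophobic residues are distributed along the chain, which is the caution raised in the text.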

Figure 3. Distribution of pI values of E. coli proteins: (A) all E. coli proteins from 4 to 20 kDa; (B) tentatively assigned MALDI biomarkers.
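The basic-end dominance shown in Figure 3B can be checked roughly from the pI column of Table 1. The Python sketch below bins those values into one-unit pI bins and counts how many exceed 7. Note the assumptions: the list contains every candidate match in Table 1 (several peaks have more than one candidate), so the counts describe candidates rather than confirmed proteins, and the bin width and function name are arbitrary choices for illustration.

```python
import math

# pI values of all candidate matches, in Table 1 order.
PI_CANDIDATES = [
    10.7, 4.2, 11.0, 11.4, 13.0, 9.4, 11.3, 10.3, 11.0, 11.0, 10.9, 11.0,
    8.2, 9.2, 4.9, 11.8, 10.0, 5.6, 6.8, 8.9, 8.1, 11.8, 8.4, 9.5, 9.3,
    3.2, 5.4, 11.2, 9.4, 9.3, 8.1, 10.6, 11.4, 4.5, 6.2, 10.5, 6.3,
    9.7, 6.6, 4.7, 9.6, 11.2, 9.6, 8.9, 9.2, 10.5, 8.7, 5.5, 9.3, 9.6,
    4.6, 10.2, 5.3, 5.3, 9.9, 11.2, 9.1, 9.8, 9.7, 6.0, 4.4, 9.7, 10.2,
    5.4, 10.9, 9.4, 6.3, 4.9, 6.6, 9.9, 5.6, 9.7,
]


def pi_histogram(values, bin_width=1.0):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for value in values:
        start = math.floor(value / bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


basic = sum(1 for value in PI_CANDIDATES if value > 7.0)
print(f"{basic} of {len(PI_CANDIDATES)} candidates have pI > 7")
for start, count in pi_histogram(PI_CANDIDATES).items():
    print(f"{start:4.0f}-{start + 1:<4.0f} {'#' * count}")
```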


Figure 4. Distribution of hydropathicity values of E. coli proteins: (A) 400 E. coli proteins from 4 to 20 kDa; (B) tentatively assigned MALDI biomarkers.

MALDI Matrix Effects. One important factor affecting the MALDI spectra of bacterial cells is the matrix.24 We recorded MALDI spectra of two strains of E. coli in two matrixes, SA and CHCA. Table 1 shows that five of the six lower mass peaks are detected only using CHCA as a matrix (these peaks are starred in Table 1). Four of these five peaks are tentatively assigned as ribosomal proteins. The lower-mass end of MALDI spectra of whole E. coli cells in both matrixes contains many doubly charged ions. The use of CHCA as matrix did not provide peaks above m/z 11 000 in the instrument used here. This may be due in part to the fact that CHCA induces substantially more fragmentation than SA.25

CONCLUSIONS

Although more than 2000 proteins with molecular masses below 20 000 Da are predicted from the sequenced genome of E. coli K12, only between 30 and 40 were desorbed and detected in this MALDI experiment. Those detected originate in the cytosol and generally share the traits of high abundance, strong basicity, and medium hydrophilicity. The understanding that these are the characteristics of biomarkers that are readily detected may direct efforts in the development of more sensitive and more reliable biosensors based on MALDI mass spectrometry.


ACKNOWLEDGMENT

The authors thank Danying Zhu and Kim Dudley for culturing the bacteria used in these studies. Plamen Demirev is gratefully acknowledged for providing the pI plot of total E. coli proteins and for valuable discussions. Scott Robinson of the University of Delaware is also acknowledged for his help with pI plots. This work was supported by contracts from the Applied Physics Laboratory of the Johns Hopkins University and the Defense Advanced Research Projects Agency.

Received for review July 31, 2000. Accepted December 4, 2000. AC0008791

(25) Ho, Y. P.; Fenselau, C. Anal. Chem. 1998, 70, 4890-4895.