Anal. Chem. 1996, 68, 1822-1828
Protein Identification by Capillary Zone Electrophoresis/Microelectrospray Ionization-Tandem Mass Spectrometry at the Subfemtomole Level
Daniel Figeys, Inge van Oostveen, Axel Ducret, and Ruedi Aebersold*
Department of Molecular Biotechnology, University of Washington, Box 357730, Seattle, Washington 98195-7730
A method for the identification of proteins by their amino acid sequence at the low-femtomole to subfemtomole sensitivity level is described. It is based on an integrated system consisting of a capillary zone electrophoresis (CZE) instrument coupled to an electrospray ionization triple quadrupole tandem mass spectrometer (ESI-MS/MS) via a microspray interface. The method consists of proteolytic fragmentation of a protein, peptide separation by CZE, analysis of separated peptides by ESI-MS/MS, and identification of the protein by correlation of the collision-induced dissociation (CID) patterns of selected peptides with the CID patterns predicted from all the isobaric peptides in a sequence database. Using standard peptides applied to a 20-µm-i.d. capillary, we demonstrate an ESI-MS limit of detection of less than 300 amol and CID spectra suitable for searching sequence databases obtained with 600 amol of sample applied to the capillary. Successful protein identification by the method was demonstrated by applying 50 and 38 fmol of a tryptic digest of the proteins β-lactoglobulin and bovine serum albumin, respectively, to the system.
The identification of proteins by their amino acid sequence is an essential step in many projects in biological research. Traditionally, this has been achieved by partial “de novo” sequencing of isolated proteins. Alternatively, proteins can be identified if the information contained in the amino acid sequence is correlated with a sequence in a sequence database. Now, as the size of sequence databases is rapidly increasing and genome sequences of selected species have been completed (1) or are approaching completion (2, 3), protein identification by correlation with sequence databases is becoming the preferred method. Several groups have developed algorithms which use specific parameters of proteins to search sequence databases (reviewed in ref 4). Search parameters include the amino acid composition (5, 6), the mass of peptides generated by specific proteolysis (7-11) either alone or in combination with secondary criteria such as sequence tags (12), the presence of specific functional groups (13), or the number of exchangeable hydrogens (14), and CID spectra of selected peptides (15, 16). With the exception of protein identification by amino acid composition, all of the above methods rely on data generated by mass spectrometry and/or tandem mass spectrometry of peptide(s) derived from the protein under investigation.

Electrospray ionization mass spectrometry (ESI-MS) is a popular method for the analysis of peptide mixtures because it is easily interfaced with common high-resolution separation methods such as high-performance liquid chromatography (17, 18) or capillary electrophoresis (19-30). While ESI-MS is also easily interfaced with continuous infusion systems, connection on-line to a peptide separation system is preferable in cases in which complex or contaminated samples are encountered. Separation of peptides from contaminants is particularly important for obtaining conclusive results from peptide analyses at high sensitivity.

The separation of peptide mixtures by capillary zone electrophoresis (CZE) offers a number of advantages over LC. Analytes are separated according to their mass-to-charge ratio, with a higher number of theoretical plates, in a shorter time, and at a constant buffer composition. Furthermore, even very short and polar peptides are separated by CZE and only a small fraction of the total sample volume is consumed. CZE/ESI-MS has been used successfully to analyze peptides and protein digests (19-30), most commonly using a CZE/MS interface which relies on a sheath flow to provide the solvent flow necessary for achieving stable ionization conditions. Unfortunately, the sheath flow also dilutes the analyte, thus reducing the sensitivity of the method and requiring high analyte concentrations in the sample. The development of microspray ESI-MS interfaces (31-35) has in part alleviated this limitation. The electroosmotic flow generated by CZE is directly compatible with the flow rate typically used in microspray interfaces. Therefore, no sheath liquid is required and analytes are directly sprayed and ionized from the tip of the capillary without further dilution. Also, a larger fraction of the sample enters the mass spectrometer compared to conventional ion sources, and ions desolvate and decluster more efficiently. Finally, the ionization efficiency appears to be increased compared to conventional ESI. This was demonstrated by Smith et al. (35-38), who showed an improvement in both the absolute limit of detection and the concentration limit of detection for peptides by using the microspray approach.

The objective of this study was to develop a method for the identification of proteins at high sensitivity by correlation of data generated by CZE/microspray-MS/MS with sequence databases. Proteins were identified by searching the OWL composite protein sequence database (39) and a protein database of bovine sequences using the SEQUEST program (15, 16). We present results from the analysis of peptide standards and protein digests and discuss the strengths and limitations of the approach.

(1) Fleischmann, R. D.; et al. Science 1995, 269, 468-70.
(2) Williams, S. Science 1995, 268, 1560-61.
(3) Collins, F.; Galas, D. Science 1993, 262, 43-6.
(4) Patterson, S. C.; Aebersold, R. Electrophoresis 1995, 16, 1791-814.
(5) Cordwell, S. J.; Wilkins, M. R.; Cerpa-Poljak, A.; Gooley, A. A.; Duncan, M.; Williams, K. L.; Humphery-Smith, I. Electrophoresis 1995, 16, 438-43.
(6) Shaw, G. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5138-42.
(7) Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011-15.
(8) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Biochem. Biophys. Res. Commun. 1993, 195, 58-64.
(9) Yates, J. R., III; Speicher, S.; Griffin, P. R.; Hunkapiller, T. Anal. Biochem. 1993, 214, 397-408.
(10) Pappin, D. J. C.; Hojrup, P.; Bleasby, A. J. Curr. Biol. 1993, 3, 327-32.
(11) Cottrell, J. S. Pept. Res. 1994, 7, 115-24.
(12) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-9.
(13) Pappin, D. J. C.; Rahman, D.; Hansen, H. F.; Bartlet-Jones, M.; Jeffery, W.; Bleasby, A. J. Mass Spectrometry in the Biological Sciences; Humana Press: Totowa, NJ, 1995; pp 135-50.
(14) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Protein Sci. 1994, 3, 1347-50.
(15) Yates, J. R., III; Eng, J. K.; McCormack, A. L.; Schieltz, D. Anal. Chem. 1995, 67, 1426-36.
(16) Eng, J.; McCormack, A. L.; Yates, J. R., III J. Am. Soc. Mass. Spectrom. 1994, 5, 976-89.
(17) Loo, J. A.; Edmonds, C. G.; Smith, R. D. Anal. Chem. 1991, 63, 2488-99.
(18) Tomer, K. B.; Moseley, M. A.; Deterding, L. J.; Parker, C. E. Mass Spectrom. Rev. 1994, 13, 431-57.
(19) Licklider, L.; Kuhr, W. G.; Lacey, M. P.; Keough, T.; Purdon, M. P.; Takigiku, R. Anal. Chem. 1995, 67, 4170-77.
(20) Banks, J. F., Jr. J. Chromatogr., A 1995, 712, 245-52.
(21) Locke, S. J.; Thibault, P. Anal. Chem. 1994, 66, 3436-46.
(22) Pleasance, S.; Thibault, P.; Kelly, J. F. J. Chromatogr. 1992, 591, 325-39.
(23) Pleasance, S.; Ayer, S. W.; Laycock, M. V.; Thibault, P. Rapid Commun. Mass. Spectrom. 1992, 6, 14-24.
(24) Moseley, M. A.; Deterding, L. J.; Tomer, K. B.; Jorgenson, J. W. Anal. Chem. 1991, 63, 109-14.
(25) Johansson, I. M.; Huang, E. C.; Henion, J. D.; Zweigenbaum, J. J. Chromatogr. 1991, 554, 311-27.
(26) Niessen, W. M. A.; Tjaden, U. R.; van der Greef, J. J. Chromatogr. 1993, 636, 3-19.
(27) Takada, Y.; Nakayama, K.; Yoshida, M.; Sakairi, M. Anal. Sci. 1994, 10, 713-7.
(28) Varghese, J.; Cole, R. B. J. Chromatogr., A 1993, 652, 369-76.
(29) Amankwa, L. N.; Harder, K.; Jirik, K.; Aebersold, R. Protein Sci. 1995, 4, 113-25.
(30) Tomlinson, A. J.; Braddock, W. D.; Benson, L. M.; Oda, R. P.; Naylor, S. J. Chromatogr., B 1995, 669, 67-73.
(31) Wilm, M. S.; Mann, M. Int. J. Mass Spectrom. Ion Processes 1994, 136, 167-80.
(32) Wilm, M.; Mann, M. Anal. Chem. 1996, 1, 1-8.
(33) Emmett, M. R.; Caprioli, R. M. J. Am. Soc. Mass. Spectrom. 1994, 5, 605-13.
(34) Andren, P. E.; Emmett, M. R.; Caprioli, R. M. J. Am. Soc. Mass. Spectrom. 1994, 5, 867-9.
(35) Hofstadler, S. A.; Swanek, F. D.; Gale, D. C.; Ewing, A. G.; Smith, R. D. Anal. Chem. 1995, 67, 1477-80.
(36) Wahl, J.; Hofstadler, S. A.; Smith, R. D. Anal. Chem. 1995, 67, 462-5.
(37) Wahl, J.; Gale, D. C.; Smith, R. D. J. Chromatogr., A 1994, 659, 217-22.
(38) Wahl, J. H.; Smith, R. D. J. Capillary Electrophor. 1994, 1, 62-71.
(39) Bleasby, A. J.; Wootton, J. C. Protein Eng. 1990, 3, 3, 153-59.

S0003-2700(96)00191-6 CCC: $12.00 © 1996 American Chemical Society
1822 Analytical Chemistry, Vol. 68, No. 11, June 1, 1996

EXPERIMENTAL SECTION
Materials. 3-Aminopropylsilane, (3-mercaptopropyl)trimethoxysilane, 2-propanol, and toluene were purchased from Aldrich (Milwaukee, WI). Acetic acid, hydrofluoric acid, and methanol were from J. T. Baker (Phillipsburg, NJ). The peptide standards, proteins, and ammonium bicarbonate were obtained from Sigma (St. Louis, MO). The gold epoxy was bought from Epoxy Technology Inc. (Billerica, MA). Ultrapure carrier grade helium was from Air Products (Allentown, PA). The capillary tubing was from Polymicro Technologies (Phoenix, AZ). The water was distilled and deionized (18 MΩ) using a Milli-Q system from Millipore (Bedford, MA).
Coating of the Capillary. The coating procedure for the inner wall of the capillaries was adapted from refs 40 and 41. The capillary walls were modified by passing the following solutions through in sequence: sodium hydroxide (1 M) for 30 min, water for 10 min, hydrochloric acid (3 M) for 30 min, and water for 30 min. Washing of the capillary with a base followed by washing with an acid ensured that the maximum number of silanol groups was available for reaction. The capillaries were then dried at 110 °C for 19 h by maintaining a flow of helium (ultrapure). To derivatize the surface, a 10% (w/v) solution of (3-aminopropyl)trimethoxysilane in toluene was passed through the capillaries for 2 h. Following removal of the reagent, the capillaries were heated at 103 °C for 18 h with a flow of helium and finally rinsed with electrophoresis buffer before use.
To generate the lead for the application of the spraying potential, the polyimide coating of the capillaries was burned off 1-2 cm from one end and washed off with acetone. This end was then etched for 13 min in hydrofluoric acid (48-51%) with helium flowing through the capillary to make sure that the hydrofluoric acid did not etch the inner walls. The etched end of the capillary was then prepared for application of a layer of gold using a modified version of the procedure previously described (42, 43). The ends to be coated with gold were dipped for 1 min into a boiling mixture of 1 mL of water, 31.5 mL of 2-propanol, and 1.12 g of (3-mercaptopropyl)trimethoxysilane and then baked for 10 min at 102 °C. This procedure was repeated three times. Gold was sputtered on the pretreated end of the capillaries for 2 min at 18 mA using an SPI sputter coater (Structure Probe, West Chester, PA). Capillaries prepared according to this procedure were used for CZE/microspray-MS/MS for several weeks without failure of the gold coating.
Tryptic Digest. β-Lactoglobulin and BSA were digested in solution at a concentration of 20 pmol/µL.
A 2-µL aliquot of trypsin was mixed with 3 µL of 100 mM ammonium bicarbonate pH 7.8/acetonitrile (90:10 v/v) and added to 5 µL (1 µg/µL) of the protein solution in the same solvent for a total volume of 10 µL. The mixture was incubated at 37 °C for 2-3 h; 1.5 µL of 10% TFA was then added to the solution to stop the reaction.

CZE/MS. CZE was performed in 20-µm- and 50-µm-i.d. × 184-µm-o.d. (3-aminopropyl)silane-coated capillaries. The gold-coated end, called the microsprayer end, was installed in a homemade microsprayer mount toward the orifice of the mass spectrometer. The uncoated end of the capillary, called the injection end, was kept in a buffer reservoir. The capillary was filled with acetic acid (10.5 mM, pH 3.2) containing 10% (v/v) methanol. At this pH, the amino groups of the covalently attached (3-aminopropyl)silanes are protonated. The presence of positive charges on the capillary wall helped to reduce protein and peptide adsorption. A negative high voltage was applied to the injection end (see figures for values). The microspray end of the capillary was inserted into a 190-µm-i.d. stainless steel tube of 3-cm length from which the etched, gold-coated end protruded by ∼1 cm. Gold epoxy was used to make contact between the stainless steel tube and the gold coat at the microsprayer end. The positive high-voltage power supply was connected through a series of five 5-MΩ high-voltage resistors to the stainless steel tube. The resistors were used to limit the maximum amount of current that the mass spectrometer high-voltage power supply could generate to less than 10 µA. This precaution was necessary to limit the occurrence of arcing and damage to the boards of the mass spectrometer. Using an XYZ translation stage, the tip of the microspray end of the capillary was placed ∼1-5 mm away from the capillary entrance of the mass spectrometer. A positive high voltage of ∼+1.9 kV relative to the MS capillary entrance was applied to the microspray end in order to produce a stable electrospray with a measured flow of 200 nL/min (total potential differential over the capillary was 17 kV). Care was taken to ensure that the CZE reservoir and the microspray end were at the same height in order to minimize any gravity flow.

Mass spectrometry was performed on a model TSQ 7000 triple quadrupole mass spectrometer (Finnigan MAT, San Jose, CA) as previously described (44). The mass spectrometer was first used in the full-scan mode using the third quadrupole to detect the analytes coming out of the capillary. All the experiments were performed with 1 amu resolution. The second quadrupole was filled with 3.5-4.0 mTorr of argon, but the energy supplied to the ions was too low to induce CID. The mass spectrometer was automatically switched to the daughter ion scan mode when the ion current for a particular peptide ion reached a preset threshold of 10 000 counts. The ion of interest was then selected by the first quadrupole. The energy in the second quadrupole was increased in order to fragment the precursor ions by CID. The masses of the product ions were recorded by scanning the third quadrupole. Four to five full scans at different energies were obtained for each selected ion before the mass spectrometer automatically returned to the full-scan mode.

Sequence Database Search Strategy. Sequence databases were searched using the SEQUEST search program (15, 16). For each of the generated collision-induced dissociation (CID) spectra, SEQUEST generates a table containing the best peptide matches extracted from the protein database, a correlation coefficient, and a ∆ correlation factor. The correlation coefficient indicates the quality of the match between the experimental CID spectrum and the predicted CID spectra of any isobaric peptide from the sequence database. The ∆ correlation factor indicates the difference between the top scored peptide and the next best match. From experience we know that a correlation coefficient of 2 or higher indicates a highly significant match, while a ∆ correlation factor higher than 0.1 indicates a good distinction between the top and the next best match. In cases in which a homogeneous protein was analyzed, the CID spectrum of each peptide was initially analyzed and matched individually to the sequence database. Then, the matches of all the analyzed peptides derived from one protein were collectively scored for the identification of the protein. To obtain the composite score, we summed the individual scores from each peptide by using a simple algorithm that assigned an arbitrarily chosen score of 10 for a first position peptide identification, 8 for second position, 6 for third, 4 for fourth, and 2 for fifth position.

(40) Thorsteinsdottir, M.; Isaksson, R.; Westerlund, D. Electrophoresis 1995, 16, 557-63.
(41) Bruin, G. J. M.; Huisden, R.; Kraak, J. C.; Poppe, H. J. Chromatogr. 1989, 480, 339-49.
(42) Goss, C. A.; Charych, D. H.; Majda, M. Anal. Chem. 1991, 63, 1, 85-88.
(43) Kriger, M. S.; Cook, K. D.; Ramsey, R. S. Anal. Chem. 1995, 67, 385-89.
(44) Ducret, A.; Bruun, F.; Bures, E. J.; Marhaug, G.; Husby, G.; Aebersold, R. Electrophoresis, in press.

RESULTS
Limit of Detection. Using a calibrated sample of a single peptide, we initially determined the detection sensitivity of the system equipped with a 20-µm-i.d. capillary and operated in the CZE/MS and CZE/MS/MS mode, respectively. Figure 1 shows the injection of 300 amol of leucine-enkephalin. The applied amount was calculated from the peptide concentration and the injected volume. The signal-to-noise ratio was ∼5, indicating that the limit of detection of the system in the CZE/MS mode was close to 300 amol. This experiment was repeated at least three times using different solution concentrations and different times of injection. The results were always similar. In order to evaluate the ability of the system to screen a sequence database using the CID spectrum of subfemtomole amounts of a peptide, we added the sequence of the peptide bradykinin to a sequence database of all known bovine sequences. Decreasing amounts of the peptide were injected into a 20-µm-i.d. capillary, and CID spectra were generated as the peak migrated out of the capillary. These CID spectra were used to search the sequence database using SEQUEST. The peptide was positively identified with a correlation coefficient of 2.7 and a ∆ correlation factor of 0.17 with as little as 600 amol of bradykinin injected.

Figure 1. Injection and CZE/MS of 300 amol of Leu-enkephalin on a 20-µm-i.d. × 155 µm o.d. × 84 cm long capillary with -25 kV at the injection end and +1.9 kV at the microspray end. The injection was performed at -4 kV for 10 s from a 0.21 pmol/µL solution of Leu-enkephalin. Top panel is the signal at m/z = 556.5 Da versus time, and bottom panel is the base peak versus time.

Analysis of a Peptide Mixture. We next performed CZE/MS and CZE/MS/MS experiments with a peptide mixture to demonstrate the separation capability of CZE under these conditions and the ability to automatically generate CID spectra of sufficient quality to search sequence databases. Figure 2A displays the base peak (representing the most abundant ion in a peak) of the separation of a mixture of eight peptides ranging in length from 5 to 14 amino acids by CZE. All the components were resolved and detected in less than 9 min with peak widths of 10 s or less. The amount of each peptide injected was 33 pg (∼40 fmol). Figure 2B details the CID spectrum obtained of methionine-enkephalin and the interpretation of the fragment ions. It is evident that good fragmentation was obtained with ions generated by the microspray ion source. Fragmentation patterns of similar quality were obtained for every component in the mixture (data not shown). Collectively, these data show that under the conditions used in this experiment, the CZE/microspray-MS/MS system achieved a good separation of peptide samples and a high detection sensitivity and that the CID spectra obtained were of a quality suitable for de novo sequence interpretation or database searches.
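As an illustration of the fragment-ion interpretation referred to for methionine-enkephalin (Tyr-Gly-Gly-Phe-Met), the following minimal sketch computes the theoretical singly charged ions: B ions (N-terminal), A ions (B less CO), and Y″ ions (C-terminal). It assumes standard monoisotopic residue masses; the function names `fragment_ions` and `precursor_mh` are hypothetical and are not part of the original work. Because the instrument was operated at 1 amu resolution, observed values (for example 574.1 for the 1+ precursor) can differ slightly from these calculated masses.

```python
# Monoisotopic residue masses (Da) for the residues of methionine-enkephalin.
RESIDUE_MASS = {"Y": 163.06333, "G": 57.02146, "F": 147.06841, "M": 131.04049}
WATER = 18.01056   # H2O
PROTON = 1.00728   # mass of the charge-carrying proton
CO = 27.99491      # carbon monoxide, lost from a B ion to give an A ion


def precursor_mh(sequence):
    """Singly protonated precursor mass (MH+) of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def fragment_ions(sequence):
    """Return dicts of singly charged B, A, and Y'' ion m/z values.

    Keys are the number of residues contained in the fragment.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}
    a = {i: mz - CO for i, mz in b.items()}
    y = {i: sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, a, y


mh = precursor_mh("YGGFM")
b_ions, a_ions, y_ions = fragment_ions("YGGFM")

if __name__ == "__main__":
    print(f"MH+ = {mh:.2f}")
    for i in sorted(b_ions):
        print(f"B{i} = {b_ions[i]:.2f}  A{i} = {a_ions[i]:.2f}  Y''{i} = {y_ions[i]:.2f}")
```

A useful consistency check is that each complementary pair satisfies B(i) + Y″(n - i) = MH+ + one proton mass.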
Figure 2. (A, top) Separation of a mixture of peptides on a 50-µm-i.d. × 155 µm o.d. × 71 cm long capillary with -15 kV at the injection end and +2.3 kV at the microspray end. The injection was 10 s at -10 kV at the injection end and ground at the microspray end. The sample was 1.67 ng/µL of methionine-enkephalin (574.1, 1+), leucine-enkephalin (556.2, 1+), oxytocin (504.1, 2+), bombesin (810.7, 2+), bradykinin 1-5 (287.3, 2+), luteinizing hormone releasing hormone (591.9, 2+), [Arg8]vasopressin (542.8, 2+), and bradykinin (530.8, 2+); values in parentheses are the observed m/z and charge state. (B, bottom) CID spectrum of methionine-enkephalin with the theoretical fragmentation. The Y′′ ions are the C-terminal ions from a peptide bond breakage, the B ions are the N-terminal ions from a peptide bond breakage, A ions are the B ions less CO, and the * indicates the loss of NH3.

Analysis of Protein Digests. We next examined whether this system was capable of identifying proteins by their amino acid sequence at high sensitivity. The first protein we studied was bovine β-lactoglobulin. The protein was digested with trypsin, and 50 fmol of the resulting peptide mixture was applied to a 50-µm-i.d. capillary and separated. Peptides eluting from the capillary were detected by microspray ESI-MS, selected ions were subjected to CID, and the resulting product ion spectra were used to search sequence databases. Figure 3 shows the base peak representation of the tandem mass spectra. All the detectable peptides eluted in less than 16 min. Rapid scanning in both the MS and MS/MS mode allowed the detection and acquisition of MS/MS spectra for peptides coeluting in the same peak. The switch to the selection of a second ion eluting in a peak or the switch to acquire a second set of MS/MS spectra within the same peak is indicated by a discontinuity or shoulder on the peak displayed. A total of 12 different CID spectra were generated. SEQUEST was used to search the OWL composite protein sequence database (39) and a database containing all bovine sequences extracted from the OWL database with the uninterpreted MS/MS spectrum of each selected ion. Table 1 summarizes the data obtained from this experiment. Seven of the CID spectra resulted in matches with the bovine β-lactoglobulin sequence with significant correlation coefficients and ∆ correlation factors. Six of the spectra were generated from peptide ions in the 1+ or 2+ state and one from a peptide ion in the 3+ state (results in table). All of the matches had a ∆ correlation factor higher than 0.1 except for the peptide ion with m/z = 573.7 Da, for which the value was 0.04. The low ∆ correlation factor suggested that peptides different from the one matched to bovine β-lactoglobulin were matched with a similar correlation coefficient. Closer inspection of the results indicated that the sequence database contained four peptide sequences which differed by a leucine to isoleucine exchange or by a glutamic acid to lysine or glutamine substitution, respectively, from the β-lactoglobulin peptide Ile-Ile-Ala-Glu-Lys (residues 87-91). These substitutions result either in isobaric peptides or in peptides with a mass difference of 1 Da, which were not differentiated in the mass spectrometer under the conditions used. Of the CID spectra that were not matched to β-lactoglobulin, only one had a correlation coefficient higher than 2.0. Since we deliberately set a low threshold to switch the mass spectrometer from MS mode to MS/MS mode, some of the CID spectra could have been generated from background noise. The results obtained by CZE/MS/MS are comparable to those we previously obtained with the same sample using HPLC/MS/MS, except that two of the matches that were found by CZE/MS/MS were not identified by the LC/MS/MS experiment and that only 50 fmol of the digest was used in this study.

The cumulative score for the identification of β-lactoglobulin using a sequence database containing all the bovine sequences was calculated as described in the Experimental Section and is shown in Table 1B. Seven CID spectra matched most strongly to β-lactoglobulin, whereas no other protein in the sequence database correlated with more than one top-ranked individual CID spectrum. The identification of β-lactoglobulin is therefore highly significant. The same data were used to identify β-lactoglobulin using the OWL database (Table 1C).

Figure 3. Separation of a β-lactoglobulin tryptic digest on a 50-µm-i.d. × 155 µm o.d. × 103 cm long capillary with -20 kV at the injection end and +2.1 kV at the microspray end. The injection was 10 s of a 4 pmol/µL solution at -10 kV at the injection end and ground at the microspray end, resulting in a total amount of sample applied of 38 fmol. The ions from Table 1A are indicated on the spectrum in the charge state in which they were observed in the mass spectrometer. Each peak that correlates as resulting from β-lactoglobulin tryptic digest (Table 1A) was labeled with the observed mass value and the charge.

Table 1a

(A)
MH1+      corr coeff   ∆ corr factor   sequence                position
673.3     2.77         0.28            GLDIQK                  25-30
2314.7    3.3          0.3             VYVEELKPTPEGDLEILLQK    57-76
573.3     2.30         0.04            IIAEK                   87-91
903.5     1.68         0.10            TKIPAVFK                92-99
916.5     3.09         0.12            IDALNENK                100-107
1066.2    2.84         0.38            VLVLDTDYK               108-116
1194.4    2.52         0.475           VLVLDTDYKK              108-117

(B)
                                              number in rank
protein                        total score    1    2    3    4    5
β-lactoglobulin                70             7    0    0    0    0
U01206                         18             1    0    1    0    1
prothrombin precursor          16             0    2    0    0    0
outer capsid protein V4        10             1    0    0    0    0
actin-like protein (actin-2)   10             1    0    0    0    0

(C)
                                                            number in rank
protein                                        total score  1    2    3    4    5
β-lactoglobulin                                56           5    0    1    0    0
CEF33H1                                        16           1    0    1    0    0
S48515                                         10           1    0    0    0    0
maturation protein (IVA2)                      10           1    0    0    0    0
ferrichrome-iron receptor precursor (E. coli)  10           1    0    0    0    0

a (A, top section) Results of the search of the bovine sequence database with CID spectra from peptides of the β-lactoglobulin tryptic digest. All the masses are presented in their M1+. The values in italic are for the M3+ database search. The sequence position of the peptides within the β-lactoglobulin sequence (178 amino acids) is indicated. (B, middle section) Total score and breakdown for each rank for the protein identified from the CID spectra and the bovine sequence database. (C, bottom section) Total score and breakdown for each rank for the protein identified from the CID spectra and the OWL composite sequence database.
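The significance criteria and the composite scoring scheme from the Experimental Section (10, 8, 6, 4, and 2 points for first through fifth position) can be written out as a short sketch. The function names `assess_match` and `composite_score` and the dictionary layout are illustrative assumptions, not the authors' software; the rank counts are those reported in Tables 1B and 1C.

```python
# Points awarded per rank position, as stated in the Experimental Section.
RANK_POINTS = {1: 10, 2: 8, 3: 6, 4: 4, 5: 2}


def assess_match(corr_coeff, delta_corr):
    """Apply the two rules of thumb from the text to one SEQUEST match.

    Returns (highly_significant, well_distinguished):
    correlation coefficient of 2 or higher, and delta factor above 0.1.
    """
    return corr_coeff >= 2.0, delta_corr > 0.1


def composite_score(rank_counts):
    """Sum rank points over all peptides matched to one protein.

    rank_counts maps rank (1-5) to the number of peptides at that rank.
    """
    return sum(RANK_POINTS[rank] * count for rank, count in rank_counts.items())


# Rank counts from Table 1B (bovine database) and Table 1C (OWL database).
table_1b = {
    "beta-lactoglobulin": {1: 7},
    "U01206": {1: 1, 3: 1, 5: 1},
    "prothrombin precursor": {2: 2},
}
table_1c = {
    "beta-lactoglobulin": {1: 5, 3: 1},
    "CEF33H1": {1: 1, 3: 1},
}

if __name__ == "__main__":
    for name, counts in table_1b.items():
        print(name, composite_score(counts))
```

With these inputs the sketch reproduces the tabulated totals (for example 70 and 56 for β-lactoglobulin in the bovine and OWL searches). Applied to the Ile-Ile-Ala-Glu-Lys match (2.30, 0.04), `assess_match` flags a significant correlation but a poor distinction from the next best match, as discussed in the text.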
Even in this bigger database, β-lactoglobulin came out with the highest cumulative score. The peptide ions with masses of m/z = 903.5, 916.5, 1066.2, 1194.4, and 2314.7 Da, respectively, scored in the first position, whereas no other protein sequence in the database was correlated with more than one top match. Furthermore, the β-lactoglobulin fragment at m/z = 673.3 Da scored in third rank, but the difference in scores among the top three matches for this peptide was not statistically significant. Similarly, the sequence for the ion at m/z = 573.3 Da was ranked in second position together with 13 identical sequences from other proteins and isobaric peptides differing in isoleucine to leucine which were present in 67 proteins. Because this peptide shared its ranking position with peptides potentially derived from so many proteins, its score was not considered by the program. The arbitrary scoring system used here was sufficient to quickly and unambiguously identify β-lactoglobulin using the CID spectra automatically generated by the CZE/MS/MS experiment. The results obtained by searching the OWL sequence database suggested that the significance of the protein identification could be further increased if a weighted scoring matrix were implemented. Such a weighted scoring matrix would take into account that peptides matched with high correlation coefficients are qualitatively better matches than those with low correlation coefficients. It would also consider the fact that the chance of observing an identical or different but isobaric peptide from different proteins decreases with increasing peptide length.

To test our method with a more complex peptide sample, we subjected 38 fmol of a BSA tryptic digest to CZE/MS/MS (Figure 4). The resulting CID spectra were used to identify the protein by searching the OWL sequence database or a sequence database containing all known bovine sequences. A total of 32 different CID spectra were generated during the experiment. Using the bovine sequence database, 11 of these peptides showed a highly significant correlation coefficient with bovine serum albumin (BSA) or its precursor, and one showed the best match with BSA, albeit with a correlation coefficient of lower significance (peptide 598-607, correlation coefficient 1.64) (Table 2A). The remaining 20 CID spectra generated in the experiment showed correlation coefficients below 2.0. Nineteen of those spectra showed correlation coefficients lower than 1.8. We found that most of these spectra did not correlate with a protein in the sequence database but represented background signals which were subjected to CID. This was caused by the threshold selected to switch the instrument from the MS to the MS/MS mode, which was set low enough so that a fluctuation in the background signal or a change in matrix composition could trigger the MS/MS mode. These results indicate that the presence of peptides from the digest can be discerned from the background-generated CID spectra, suggesting that the level of the threshold is not critical and that results are not significantly affected by contaminants eluting from the column. Using the same composite scoring system described above for β-lactoglobulin, BSA was clearly identified by searching both the bovine sequence database (Table 2B) and the OWL database (Table 2C). With both databases, BSA showed the highest composite score as well as the highest number of first-rank matches for individual peptides. As expected, there were differences in the results obtained from searching the two databases that reflect the different database sizes and the fact that the OWL database contains homologous sequences from different species.

Figure 4. Separation of a BSA tryptic digest on a 50-µm-i.d. × 155 µm o.d. × 70 cm long capillary. For the first 4 min the injection end was at -15 kV and after at -5 kV with the microspray end at +1.9 kV. The injection was 10 s of a 1.8 pmol/µL solution at -10 kV at the injection end and ground at the microspray end, resulting in a total sample applied of 38 fmol. The ions from Table 2A are indicated on the spectrum in the charge state in which they were observed in the mass spectrometer.
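The voltage program reported for Figure 4 (injection end at -15 kV for the first 4 min, then -5 kV, with the microspray end at +1.9 kV on a 70-cm capillary) implies two average field strengths, which the sketch below works out. It assumes that the potential drop over the capillary is the difference between the microspray and injection voltages and that migration velocity scales linearly with field strength; `field_strength_v_per_cm` is a hypothetical helper name.

```python
def field_strength_v_per_cm(injection_kv, spray_kv, length_cm):
    """Average electric field over the capillary in V/cm.

    Assumes the full potential difference between the two capillary
    ends drops uniformly along the capillary length.
    """
    drop_v = abs(spray_kv - injection_kv) * 1000.0
    return drop_v / length_cm


LENGTH_CM = 70.0   # capillary length from the Figure 4 caption
SPRAY_KV = 1.9     # microspray end potential

e_initial = field_strength_v_per_cm(-15.0, SPRAY_KV, LENGTH_CM)  # first 4 min
e_stepped = field_strength_v_per_cm(-5.0, SPRAY_KV, LENGTH_CM)   # after the step

# If velocity is proportional to field strength, peaks emerging after the
# step are stretched in time by roughly this factor.
slowdown = e_initial / e_stepped

if __name__ == "__main__":
    print(f"initial field: {e_initial:.1f} V/cm")
    print(f"stepped field: {e_stepped:.1f} V/cm")
    print(f"approximate slowdown factor: {slowdown:.2f}")
```

Under these assumptions the field drops from about 241 V/cm to about 99 V/cm, so analytes leaving the capillary after the step would migrate roughly 2.4 times more slowly, leaving correspondingly more time for MS/MS acquisition per peak.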
The observed differences did not, however, affect the identification of BSA as the protein under investigation. Searching the OWL database, the m/z ) 1003.2 Da ion was assigned to BSA in the third position and the peptide at m/z ) 818.9 was not matched to
Table 2a MH1+
corr coeff
∆ corr factor
sequence
position
975.9 1164.3 886.9 923.1 1440.7 1306.5 1480.7 1640.9 1143.4 1015.2 818.9 1003.2
2.31 3.53 2.17 2.56 5.36 4.42 4.77 3.47 2.89 2.83 2.22 1.64
0.34 0.38 0.40 0.21 0.52 0.55 0.66 0.51 0.06 0.35 0.18 0.31
DLGEEHFK LVNELTEFAK DDSPDLPK AEFVEVTK RHPEYAVSVLLR HLVDEPQNLIK LGEYGFQNALIVR KVPQVSTPTLVEVSR KQTALVELLK QTALVELLK ATEEQLK LVVSTQTALA
37 f 44 66 f 75 131 f 138 249 f 256 360 f 371 402 f 412 421 f 433 437 f 451 548 f 557 549 f 557 562 f 568 598 f 607 number in
protein
total score
1
2
3
4
5
bovine serum albumin bovads tryptophanyl-TRNA synthetase cylicin I CGMP-gated cation channel protein
120 18 16 16 14
12 1 1 0 1
0 1 0 2 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 2
number in protein
total score
1
2
3
4
5
bovine serum albumin rat serum albumin precursor A47391 S35270 PFU14189
96 10 10 10 10
9 1 1 1 1
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
a (A, top section) Results of the search of the bovine sequence database with CID spectra form peptides of a BSA tryptic digest. All the masses are presented in their M1+. The values in italic are for the M3+ database search. The position of the peptides within the BSA sequence (607 amino acids) is given. (B, middle section) Total score and breakdown for each rank for the protein identified from the CID spectra and the bovine sequence database. (C, bottom section) Total score and breakdown for each rank for the proteins identified from the CID spectra and the OWL composite sequece database.
BSA. The m/z = 975.0 Da ion was assigned to the rat serum albumin precursor in the first position. Since the only difference in this peptide sequence between rat and bovine serum albumin is an exchange from glutamine to glutamic acid, which results in a mass difference of 1 Da, under the experimental parameters used in this experiment the peptides could not be differentiated. Collectively, these results show that with this method proteins can be conclusively identified at the low-femtomole level, even in cases in which a relatively complex peptide mixture is applied and large sequence databases are searched.
DISCUSSION
We have described a method based on CZE/MS/MS using a microspray ion source for the rapid identification of proteins by their amino acid sequence at the low-femtomole sensitivity level. The absolute limits of detection (LODs) obtained with the system equipped with a 20-µm capillary were 300 amol for operation in the MS mode and 600 amol for operation in the MS/MS mode. These sensitivities are a marked improvement over results obtained by CZE/electrospray-MS using conventional ion sources. This LOD can be further improved by decreasing the size of the capillary to 5 or 10 µm. The concentration limit of detection achieved with this method was less impressive. The data in Figure 1 were obtained at a concentration of 250 fmol/µL. We
have successfully performed experiments at 50 fmol/µL (result not shown). This concentration limit of detection will have to be improved if the objective is the identification of low-femtomole amounts of proteins. We are currently exploring the use of preconcentration techniques to improve the concentration limit of detection (45-47).
At less than 20 min separation time, CZE/MS/MS proved to be a rapid technique for the analysis of protein digests. Low-femtomole amounts of sample (Figure 3 at 50 fmol, and Figure 4 at 38 fmol) were sufficient for the unambiguous identification of proteins by their amino acid sequence. In both cases, not all of the expected tryptic peptides were observed. It appears that for some peptides the signal was below the threshold value set to trigger the MS/MS mode of the mass spectrometer. In the case of a more complex peptide mixture as represented by the BSA digest, some peptides failed to produce CID spectra, even though the signal intensity exceeded the threshold value. This was due to parameters set in the data acquisition program. The program was set so that only the two most intense ions in every MS spectrum were selected for CID. Therefore, if a signal of lower intensity comigrated with two principal ions, no CID spectra were generated for that ion. This problem will be corrected in a future version of our data acquisition program. Also, because of the narrow peak width obtained by CZE, it is possible that there is not enough time to generate CID spectra for two coeluting peptides. This problem can be reduced by stepping the electric field (see Figure 4 for example). In this case, high voltage is applied across the capillary until the analytes migrate out of the capillary, at which time the electric field strength is reduced. The reduced electric field strength decreases both the electrophoretic migration velocity and the electroosmotic flow. Therefore, more CID spectra can be generated within a single peak.
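The selection rule described above (a threshold that triggers the MS/MS mode, followed by CID on only the two most intense ions of each MS spectrum) can be illustrated with a minimal sketch. This is not the data acquisition program used in this work; the function name, the data layout, and all m/z and intensity values are hypothetical and chosen only to show why a weaker comigrating ion yields no CID spectrum.

```python
def select_precursors(spectrum, threshold, top_n=2):
    """Return the m/z values that would be selected for CID from one MS scan.

    spectrum  : list of (mz, intensity) pairs for a single MS scan
    threshold : minimum intensity required to trigger the MS/MS mode
    top_n     : maximum number of ions selected per scan (two in the text)
    """
    # Only ions above the trigger threshold are eligible for CID.
    eligible = [(mz, inten) for mz, inten in spectrum if inten > threshold]
    # Keep the most intense eligible ions, up to top_n of them.
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [mz for mz, _ in eligible[:top_n]]


# Hypothetical scan: three comigrating ions, all above the threshold.
scan = [(500.0, 4.0e4), (600.0, 9.0e4), (700.0, 6.5e4)]
threshold = 1.0e4

chosen = select_precursors(scan, threshold)
# Ions that exceeded the threshold but were still passed over for CID.
missed = [mz for mz, inten in scan if inten > threshold and mz not in chosen]
```

In this example the ion at m/z 500.0 exceeds the threshold but is not fragmented, because two more intense ions are present in the same scan. Raising `top_n` would recover it, at the cost of more MS/MS time per scan within an already narrow CZE peak.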
However, decreasing the flow of analyte also decreases the intensity of the signal generated.
In experiments performed in this laboratory using RP-HPLC/ESI-MS/MS with the same protein digests (result not shown), the SEQUEST program matched a higher number of peptides to the protein under investigation than the method described in this study. Using the HPLC technique, the whole sample (20-50 µL) was applied to the column. The column concentrates the analytes by a factor of 5-10. In the present technique based on CZE, only a small fraction of the sample was injected and no concentrating techniques were used. The difference in the number of matches between the CZE- and HPLC-based techniques was mainly due to the poorer concentration limit of detection using CZE. We are confident that using a precolumn concentration technique the CZE-based approach will generate results that exceed the ones obtained by HPLC. However, it is important to stress the complementarity of the two approaches. It appears that some of the ions identified by CZE were not identified by HPLC. This might be due to solvent and matrix peaks in HPLC experiments, which overshadow some peptides, in particular very early eluting species, or to the fact that very hydrophobic peptides are difficult to recover from RP-HPLC columns. Using the method described here, both techniques could be used in parallel without any need for additional samples.
(45) Swartz, M. E.; Merion, M. J. J. Chromatogr. 1993, 632, 209-13.
(46) Beattie, J. H.; Self, R.; Richards, M. P. Electrophoresis 1995, 16, 322-28.
(47) Strausbauch, M. A.; Landers, J. P.; Wettstein, P. J. Anal. Chem. 1996, 68, 306-14.
The data generated in this study were useful to clarify a sequence ambiguity in the BSA sequence in GenBank. The database
indicates that the sequence for BSA around residue 40 is DLGEEQFK, whereas the sequence in the same region for BSA precursor is DLGEEHFK. The peptide fragment at m/z = 975.0 from BSA (Table 2A) showed the best match to the sequence DLGEEHFK, which corresponds to the BSA precursor. We did not observe a match to the BSA sequence DLGEEQFK. Therefore, our data clearly identify the amino acid residue in position 42 of BSA as H and exclude Q in that position.
Frequently, functionally related proteins share stretches of conserved sequence, particularly in regions of the protein that are functionally essential. The method described here appears to be suitable to identify related proteins in sequence families or homologous proteins from different species. This is illustrated by the triply charged peptide ion at m/z = 771.5 which was derived by tryptic digestion of bovine β-lactoglobulin. This peptide matched with the highest score to bovine β-lactoglobulin and goat β-lactoglobulin. The only difference between the peptides from the two
species is an exchange of aspartic acid (residue mass 115.09) in the bovine fragment with asparagine (residue mass 114.10) in the goat fragment.
ACKNOWLEDGMENT
This work was supported by the National Science Foundation Science and Technology Center for Molecular Biotechnology and the Canadian Genetics Disease Network of Centres of Excellence. The authors are grateful to J. Eng and J. Yates (University of Washington) for making advanced versions of the SEQUEST program available and to R. Figeys for editorial help. D.F. acknowledges a postdoctoral fellowship from NSERC (Canada).
Received for review February 28, 1996. Accepted April 12, 1996. Abstract published in Advance ACS Abstracts, May 1, 1996.
AC960191H