Anal. Chem. 1998, 70, 3557-3565

Method To Compare Collision-Induced Dissociation Spectra of Peptides: Potential for Library Searching and Subtractive Analysis

John R. Yates, III,*,† Scott F. Morgan,† Christine L. Gatlin,† Patrick R. Griffin,‡ and Jimmy K. Eng†

Department of Molecular Biotechnology, University of Washington, Box 357730, Seattle, Washington 98195-7730, and Department of Chemistry, Merck and Co. Inc., P.O. Box 2000 R80-A23, Rahway, New Jersey 07065-0900

We report the development of a method to compare collision-induced dissociation (CID) spectra of peptides. This method employs a cross-correlation analysis of a CID spectrum to a reference spectrum and normalizes the cross-correlation score to the autocorrelation of the CID spectra. The query spectrum is compared by using both mass information and fragmentation patterns. Fragmentation patterns are compared to each other using a correlation function. To evaluate the specificity of the approach, a set of 2180 tandem mass spectra obtained from both triple-quadrupole tandem mass spectrometers (TSQ) and quadrupole ion trap mass spectrometers (LCQ) was created. Comparisons are performed between tandem mass spectra obtained on the same instrument type as well as between different instrument types. Accurate and reliable comparisons are demonstrated in both types of analyses. The scores obtained in the cross-comparison of TSQ and LCQ tandem mass spectra of the same peptide are found to be slightly lower than comparisons performed with spectra obtained on the same instrument type. The method appears insensitive to variations in day-to-day performance of the instrument, minor variations in fragment ion abundance, and instrumental differences inherent in the same instrument model. The use of this method of comparison is demonstrated for library searching and subtractive analysis of tandem mass spectra obtained during LC/MS/MS experiments.

The marriage of electrospray ionization and tandem mass spectrometry has led to widespread use of these techniques for the analysis of biomolecules. In particular, tandem mass spectrometry has proven to be valuable for the determination of peptide sequences.1,2 Collision-induced dissociation (CID) of peptides produces a distinctive pattern that is characteristic of and diagnostic for an amino acid sequence.
Manual de novo or computational methods can determine peptide sequences.3-5 Recent approaches have used protein and nucleotide databases to determine the amino acid sequences of peptides represented by tandem mass spectra.6-8 The latter makes use of the sequence infrastructure created by genome projects to match CID spectra of peptides to protein or gene sequences in databases.

A third approach to mass spectral data analysis, although not used for peptide CID mass spectra, is to accumulate mass spectra to form reference libraries.9 Over the years, a large reference set has been created for electron impact (EI) mass spectra of small molecules. Several different approaches have been developed for the comparison of query mass spectra to the mass spectra contained in reference libraries (see reviews in refs 9 and 10). Such comparisons are sensitive to factors such as mass accuracy and instrument tuning parameters.11 The need to standardize acquisition of electron impact mass spectra necessitated the definition of both a quality index12 and a set of standard conditions. Attempts at creating reference libraries of CID tandem mass spectra quickly recognized the need to define and standardize operating conditions.13,14 The appearance of CID spectra is sensitive to conditions such as collisional cross sections and collision energies. These conditions have been difficult to standardize to achieve uniform spectra suitable for searching reference databases. Recently, Sanders et al. demonstrated the ability to use ion trap tandem mass spectra of small organic molecules to search libraries of spectra created on an ion trap mass spectrometer.15 The creation of reference

* Corresponding author: (phone) 206 685-7388; (fax) 206 685-7301; (e-mail) [email protected].
† University of Washington.
‡ Merck and Co.
(1) Biemann, K. Methods Enzymol. 1990, 193, 455-479.
(2) Hunt, D. F.; Yates, J. R., III; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-6237.
(3) Johnson, R. S.; Biemann, K. Biomed. Environ. Mass Spectrom. 1989, 18, 945-957.
(4) Yates, J. R.; Zhou, J.; Griffin, P. R.; Hood, L. E. In Techniques in Protein Chemistry II; Villafranca, J. J., Ed.; Academic Press: San Diego, 1990; pp 477-485.
(5) Hines, W. M.; Falick, A. M.; Burlingame, A. L.; Gibson, B. W. J. Am. Soc. Mass Spectrom. 1992, 3, 326-336.
(6) Eng, J. K.; McCormack, A. L.; Yates, J. R., III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(7) Yates, J. R., III; McCormack, A. L.; Eng, J. K. Anal. Chem. 1996, 68, 534A-540A.
(8) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-4399.
(9) Martinsen, D. P.; Song, B.-H. Mass Spectrom. Rev. 1985, 4, 461-490.
(10) Martinsen, D. P. Appl. Spectrosc. 1981, 35, 255-266.
(11) Milne, G. W.; Budde, W. L.; Heller, S. R.; Martinsen, D. P.; Oldham, R. G. Org. Mass Spectrom. 1982, 17, 547-552.
(12) Speck, D. D.; Venkataraghavan, R.; McLafferty, F. W. Org. Mass Spectrom. 1978, 13, 209-213.
(13) Dillard, J. G.; Heller, S. R.; McLafferty, F. W.; Milne, G. W.; Venkataraghavan, R. Org. Mass Spectrom. 1981, 16, 48-49.
(14) Crawford, R. W.; Brand, H. R.; Wong, C. M.; Gregg, H. R.; Hoffman, P. A.; Enke, C. G. Anal. Chem. 1984, 56, 1121-1127.

S0003-2700(98)00122-X CCC: $15.00 © 1998 American Chemical Society. Published on Web 07/30/1998

Analytical Chemistry, Vol. 70, No. 17, September 1, 1998 3557

sets of CID spectra has never gained the momentum of the similar efforts that created EI spectral reference libraries. A reference set of mass spectra for interpretation of peptide sequences would encompass such a large number of spectra that it would be difficult to acquire a suitable set that would be analytically useful. Recently, Eng et al. developed an approach to search protein and nucleotide databases using a cross-correlation function to compare model tandem mass spectra reconstructed from amino acid sequences to experimental tandem mass spectra.6,16,17 This approach has the form of a mass spectral pseudolibrary search and has been successfully applied to CID mass spectra of peptides obtained under low-energy conditions on triple-quadrupole mass spectrometers, high-energy CID mass spectra from tandem double-focusing mass spectrometers, and matrix-assisted laser desorption/ionization postsource decay mass spectra.6,18,19 Fully accurate reconstruction of tandem mass spectra is currently not possible, but the method employed by Eng et al. provides a sensitive comparison nevertheless.

The results obtained by Eng et al. suggest correlation analysis may provide a suitable method to compare CID spectra of peptides.6 Correlation analysis techniques have been used to automate detection of spectral information in ICP-AES and to compare spectral similarity of infrared spectra and UV spectra.20-22 Powell and Hieftje developed a computer-based searching method for IR spectra by using the cross-correlation function to compare spectral files.23 They noted the method was relatively free from the effects of instrumental variation and chemical contamination. The major drawback to the method was the speed of the search since computers were relatively slow in 1974. More recently, Owens pioneered the use of the cross-correlation function to search electron impact mass spectra of small organic molecules through a reference library.24 In this paper, we describe a method employing cross-correlation analysis to compare tandem mass spectra of peptides and demonstrate its potential for library searching and comparison of LC/MS/MS analyses.

EXPERIMENTAL SECTION

Peptide and Protein Sources. Peptide tandem mass spectra were accumulated from a variety of sources and experiments. Some of the peptides acquired from proteins of known sequence were obtained from the following commercial sources. α-Casein, bovine (Catalog No. C-6780, Lot No. 78F-9555), α-lactalbumin, bovine (Catalog No. L-4379, Lot No. 75C-8110), cytochrome c, chicken (Catalog No. C-0761, Lot No. 11H-7030), amino acylase, porcine (Catalog No. A-7264, Lot No. 26F-9705), and ribonuclease A, bovine (Catalog No. R-5503, Lot No. 49F-8000) were obtained

(15) Josephs, J.; Sanders, M.; DiDonato, G.; Hail, M.; Kerns, E.; Volk, K.; Lee, M. The 45th ASMS Conference on Mass Spectrometry and Allied Topics, Palm Springs, CA, 1997; p 681.
(16) Yates, J. R., III; Eng, J. K.; McCormack, A. L. Anal. Chem. 1995, 67, 3202-3210.
(17) Yates, J. R., III; Eng, J. K.; McCormack, A. L.; Schieltz, D. Anal. Chem. 1995, 67, 1426-1436.
(18) Yates, J. R.; Eng, J. K.; Klausner, C.; Burlingame, A. L. J. Am. Soc. Mass Spectrom. 1996, 7, 1089-1096.
(19) Griffin, P. R.; MacCoss, M. J.; Eng, J. K.; Blevins, R. A.; Aaronson, J. S.; Yates, J. R. Rapid Commun. Mass Spectrom. 1995, 9, 1546-1551.
(20) Reid, J. C.; Wong, E. C. Appl. Spectrosc. 1966, 20, 320-325.
(21) Horlick, G. Anal. Chem. 1973, 45, 319-324.
(22) Tanabe, K.; Saeki, S. Anal. Chem. 1975, 47, 118-122.
(23) Powell, L. A.; Hieftje, G. M. Anal. Chim. Acta 1978, 100, 313-327.
(24) Owens, K. G. Appl. Spectrosc. Rev. 1992, 27, 1-49.


from Sigma Chemical Co. (St. Louis, MO). A protein mixture containing six proteins, phosphorylase b (rabbit muscle), serum albumin (bovine), ovalbumin (hen), carbonic anhydrase (bovine), trypsin inhibitor (soybean), and lysozyme (hen) (SDS-PAGE low molecular weight protein standards (Lot No. 27074)) was obtained from BioRad (Richmond, CA). Sequencing grade trypsin (Catalog No. 1047-841, Lot No. 12676220-10) was obtained from Boehringer Mannheim (Indianapolis, IN). Peptides from Saccharomyces cerevisiae, Escherichia coli, and Haemophilus influenzae were generated as previously described.6,25,26 Peptides were generated from intact proteins by digestion with the enzyme trypsin in 50 mM Tris-HCl, pH 8.6, or 50 mM ammonium bicarbonate, pH 8.6, for 4-8 h at 37 °C. Hemoglobin (mutant and normal) was digested using separate aliquots of proteins in the presence of endoproteinase Glu-C, subtilisin, chymotrypsin, and trypsin. All digestion procedures used 50 mM ammonium bicarbonate, pH 8.6. Digestion times ranged from 1 to 8 h. Peptides from the digested protein were combined into one container and analyzed together. Microcolumn High-Performance Liquid Chromatography. Peptides were analyzed as single components or as mixtures using microcolumn high-performance liquid chromatography. Microcolumns were made by using the method of Kennedy and Jorgenson employing 98-µm-i.d. fused-silica capillary tubing.27 The columns were packed with Perseptive Biosystems (Boston, MA) POROS 10 R2, a 10-µm reversed-phase packing material, to a length of 15-20 cm. Samples were injected onto the column as previously described.27 During injection, the effluent from the end of the column was collected with a 1-5-µL graduated glass capillary to measure the amount of liquid displaced from the column. Once a sufficient volume had been injected, the column was connected to the HPLC pumps. 
HPLC was performed using Applied Biosystems (Foster City, CA) 140B Microgradient pump with dual syringe pumps and a 250-µL dynamic mixer. The flow from the pump was reduced from 100 to 1 µL/min using a splitting tee and a length of restriction tubing made from fused silica. The mobile phase used for gradient elution consisted of (A) 0.5% acetic acid and (B) acetonitrile/water 80:20 (v/v) containing 0.5% acetic acid. The gradient was linear from 0 to 80% B over 30 min. The fritted end of the column was inserted directly into the electrospray needle. A sheath liquid flowed concentrically around the end of the column at a flow rate of 2-3 µL/min and was a methanol/water (70:30) mixture containing 0.1% acetic acid.

Electrospray Ionization and Tandem Mass Spectrometry. Tandem mass spectra were recorded on a Finnigan MAT (San Jose, CA) TSQ700 equipped with an electrospray ionization source as previously described and a Finnigan MAT LCQ ion trap mass spectrometer.6,16 Electrospray ionization was performed using the following conditions. The needle voltage was set at 4.6 kV. The sheath and auxiliary gases consisted of nitrogen gas (99.999%) and were set at 20 psi and 5 units, respectively. The heated

(25) Link, A. J.; Hays, L. G.; Carmack, E. B.; Yates, J. R. Electrophoresis 1997, 18, 1314-1334.
(26) Link, A. J.; Carmack, E.; Yates, J. R. Int. J. Mass Spectrom. Ion Processes 1997, 160, 303-316.
(27) Yates, J. R.; McCormack, A. L.; Hayden, J. P.; Davey, M. P. In Cell Biology: A Laboratory Handbook; Celis, J., Ed.; Academic Press: San Diego, 1994; Vol. 3, pp 380-388.
(28) Taylor, S. W.; Waite, J. H.; Ross, M. M.; Shabanowitz, J.; Hunt, D. F. J. Am. Chem. Soc. 1994, 116, 10803-10804.

capillary temperature was set at 150 °C and a potential of 70 V was placed on the capillary. Product ion tandem mass spectra were acquired “on the fly” during an LC/MS analysis by using a computer program written in the TSQ700’s instrument control language (ICL). At the start of the program, the instrument was configured to scan the second mass analyzer (Q3) from an m/z of 400 to 1400 at 1.5 s/scan. When the ion current for a particular m/z value was above a preset threshold value of ∼250 000 counts, the program switched the instrument to scan the product ion MS/MS configuration. The m/z value measured in the main beam mode was used to set the precursor ion m/z value in the first mass analyzer. The mass range was set at 50 u to 2 times the precursor ion m/z value with a scan time of 1.5 s during acquisition of product ion tandem mass spectra. Five scans were acquired, and the instrument returned to the main beam mode of scanning. All analyses were performed with the collision cell filled with argon to a pressure of 3.5 mTorr. The collision energy for product ion scans was set by dividing the m/z value of the precursor ion by a constant (-35). This process was repeated throughout the LC analysis. Peak widths in the second mass analyzer ranged from 1.5 to 2.0 u.

Tandem mass spectra were acquired on the LCQ ion trap mass spectrometer using the automated MS/MS program. Three microscans were acquired over the mass range of 400-2000. The most abundant ion above a threshold value of 5 × 10^5 was selected for MS/MS analysis using an isolation window 3 u wide. A relative excitation energy of 35% was used to dissociate the precursor ion. The ions are unit resolved across the mass range.

Computer Algorithms. All computer programs were written in the C programming language on a UNIX operating system. Output from the spectral library search program is formatted in hypertext markup language for display within a web browser.
The first step in the library search analysis requires generating a spectral library from known samples. A program was created to parse through the automated MS/MS analysis file, pull out the spectral data, and store the information into separate library files. This process is performed by determining all related MS/MS scans for a given peptide, concatenating the multiple spectra into one spectrum, and storing the spectral information in raw form as mass/intensity pairs. Annotation of each spectrum can be performed by including information about the organism, peptide sequence, charge state, peptide mass, experiment, etc., within each library file.

All query and reference spectra are normalized, smoothed, and peak-detected using the same process prior to the spectral comparison. The normalization and smoothing algorithm passes a 5-u window across each spectrum where at each 1-u data increment the overlapped peaks are multiplied by the appropriate window weight and the products summed and divided by a constant factor to give a single smoothed value. The window weights were set to 1, 4, 6, 4, 1, and the constant factor was set at 16.0 times the intensity of the most abundant ion in the spectrum. As the window travels across the whole spectrum, the smoothed spectrum is generated. The peak detection algorithm takes in the smoothed spectrum and generates a peak-detected spectrum. In the first pass, a 100-u window is passed across the spectrum, and the mean and standard deviation of the signals within the window are determined. Peaks that are a standard deviation above the mean are extracted. The window is then passed across the spectrum a second time where peaks that are two standard deviations above the mean are extracted. The peak-detected spectrum is processed by normalizing ion intensities to a uniform maximum intensity (50) across a set of windows. We found the results improved considerably using this procedure relative to no preprocessing of the tandem mass spectrum.

The spectral comparison consists of two steps. The processed input spectrum is searched through the reference set, and each reference spectrum with the appropriate precursor value (±5 u) is compared using a cross-correlation analysis. The cross-correlation score is normalized to the autocorrelation of the input spectrum. Both the cross-correlation and autocorrelation are implemented via Fourier transform analysis, i.e., transforming both the input spectrum and the reference spectrum, multiplying one transform against the complex conjugate of the other, and inverse transforming the product. A modified correlation score is calculated and stored for each candidate library spectrum by subtracting out the average of the cross-correlation from ±75 displacement from the correlation coefficient at zero value where a unit displacement corresponds to 1 u. A value of ±75 was found empirically in prior studies to give the best results for these studies.6 The autocorrelation score, comparing the unknown input spectrum to itself, is representative of the best match that can be made with the unknown input spectrum and allows normalization of the score. The modified autocorrelation score is used to rank the output results by generating a normalized score, based on the following formula:

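\[
\text{search score} = 10.0 \times \frac{CC_{\mathrm{mod}}}{AC_{\mathrm{mod}}} \times \frac{\mathrm{precursor}_{\mathrm{input}}}{\mathrm{precursor}_{\mathrm{library}}}
\]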
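The smoothing and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C implementation: spectra are assumed to be already binned at 1-u increments into plain lists, the correlation is summed directly over displacements of -75 to +75 rather than computed by Fourier transform (a zero-padded transform should give the same values at these displacements), the average is assumed to include the zero-displacement term, and all function names are hypothetical.

```python
def smooth(intensities):
    """Apply the 1-4-6-4-1 window and divide by 16.0 x the most abundant ion."""
    weights = (1, 4, 6, 4, 1)
    base_peak = max(intensities)
    if base_peak <= 0:
        return [0.0] * len(intensities)
    n = len(intensities)
    smoothed = []
    for i in range(n):
        total = 0.0
        for k, weight in enumerate(weights):
            j = i + k - 2  # window is centred on bin i
            if 0 <= j < n:
                total += weight * intensities[j]
        smoothed.append(total / (16.0 * base_peak))
    return smoothed


def correlation(a, b, lag):
    """Sum of a[i] * b[i + lag] over the bins where both spectra are defined."""
    return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))


def modified_correlation(a, b, max_lag=75):
    """Zero-displacement correlation minus the mean over -max_lag..+max_lag.

    Assumption: the mean includes the zero-displacement value itself.
    """
    lags = range(-max_lag, max_lag + 1)
    mean_value = sum(correlation(a, b, lag) for lag in lags) / len(lags)
    return correlation(a, b, 0) - mean_value


def search_score(query, reference, precursor_input, precursor_library):
    """Normalized score following the formula given in the text."""
    cc_mod = modified_correlation(query, reference)
    ac_mod = modified_correlation(query, query)
    if ac_mod == 0:
        raise ValueError("query spectrum has no signal to normalize against")
    return 10.0 * (cc_mod / ac_mod) * (precursor_input / precursor_library)
```

With identical query and reference spectra and equal precursor m/z values, this sketch returns 10.0, which is consistent with the text's description of the autocorrelation as the best match obtainable for a given input spectrum.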

In the final output, spectral matches are ranked from best to worst by this normalized score where values near 0.0 represent a poor match and values near 10.0 represent a good match. A set of 2180 CID spectra of peptides was created for these studies. There were 1170 unique peptides represented in the reference set with (M + H)+ molecular weight values ranging from 478.6 to 2976.2 u. These spectra were obtained from the proteins described above. All of the spectra were acquired under normal operating conditions, and there was no special care used to standardize operating conditions beyond normal tuning and calibration of the instruments.

RESULTS AND DISCUSSION

The objective of this research was to develop a method to compare the CID spectra of peptides. Several applications of a method to compare tandem mass spectra can be envisioned. Reference sets of CID spectra as a consequence of or for use in biological experiments could be constructed as a data analysis resource. For example, a reference set of contaminants or peptide tandem mass spectra observed in a control experiment could be built without the need to know the structure of the molecule or sequence of the peptides. Additionally, CID spectra could be acquired of model peptides for the various outcomes of an experiment to aid in data analysis. A third possibility, premised on improvements in the ability to acquire tandem mass spectra comprehensively for a liquid chromatography experiment, is to

Figure 1. Scoring results for 100 ion trap tandem mass spectra and 100 triple-quadrupole tandem mass spectra through a reference set of 2000 spectra. Score is plotted on the y-axis, and spectrum number is plotted on the x-axis.

compare CID spectra as part of a subtractive analysis technique of data acquired in LC/MS/MS analyses. Tandem mass spectra unique to an experiment could be targeted for database searching or de novo interpretation. Last, a CID spectrum contains a specific pattern that could be used as an informative marker to identify organisms or some other feature of an organism.

To develop and test computer software for the comparison of CID spectra of peptides, several criteria were set. We wanted to be able to compare triple-quadrupole (TSQ) spectra against a reference set that contained CID spectra generated by both triple-quadrupole and ion trap (LCQ) mass spectrometers. The method needed to tolerate variations in instrument performance to minimize the need to standardize operation. Last, a scoring method was required that would distinguish correct answers from false positives. A set of software tools was also developed to extract and store CID spectra in the reference format and to allow visual analysis of the results.

A format was established for the reference file that included several different parameters. The first parameter is the m/z value of the precursor ion. The m/z value of the precursor ion was selected, as opposed to peptide mass, since it is a parameter that is independent of charge state and thus not complicated by the need to deconvolute the ion and assign a mass value. The second parameter is the CID fragmentation pattern, and the last piece of information is the annotation of the experiment. Annotations are added in text form and can include any relevant information about the experiment, e.g., growth conditions, peptide sequence, etc. A computer program is used to parse the MS/MS data from a data acquisition file for inclusion in a reference set. This program will currently work on any file generated by a Finnigan MAT TSQ or LCQ mass spectrometer. In the process of parsing the data acquisition file, all related MS/MS scans (generally consecutive

Figure 2. A search using a TSQ tandem mass spectrum resulting in a positive match to a tandem mass spectrum in the reference set of >2000 spectra. The tandem mass spectrum displayed in the upper panel represents a peptide, ILELAGFLDSYIPEPER, from E. coli and was used as the input query. The resulting match to the library spectrum is shown in the bottom panel. A score of 7.27 was obtained for this match.

scans of the same precursor ion) are extracted and the scans summed together and stored. Annotation information can be included with each individual MS/MS data file, or the same annotation can be included in the MS/MS data file for all spectra obtained from a single LC/MS/MS analysis.

Once a reference set has been established in the proper format, a program, LIBQUEST, is used to search through the reference set to find the best match. Initially, all spectra in the library are filtered on the basis of the m/z value of the precursor ion in the unknown spectrum. If a reference spectrum has a precursor ion m/z value within a user-defined tolerance of the query value, that reference spectrum is further analyzed. All library spectra passing the precursor ion filter remain candidates to match the unknown spectrum. Key words can also be combined with the m/z value and tandem mass spectrum as an additional search filter.

To identify an appropriate scoring value to judge the success of a search, the scores for a random selection of 100 TSQ and 100 LCQ spectra searched against TSQ and LCQ spectral libraries (2180 spectra), respectively, were plotted. These plots are displayed in Figure 1. In each set of search inputs, a portion of the input data had the same spectrum in the library while the rest of the input data had no corresponding spectrum in the library. In general, a correct result for a search produces a score greater than 6.0.
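The precursor ion filter described above can be sketched as follows. This is an illustrative sketch only: the `LibrarySpectrum` record, the function name, and the substring-based key word match are assumptions rather than details of LIBQUEST, and the default tolerance of 5 u simply mirrors the ±5 u precursor window mentioned in the Experimental Section.

```python
from dataclasses import dataclass


@dataclass
class LibrarySpectrum:
    """Hypothetical library record: precursor m/z plus free-text annotation."""
    precursor_mz: float
    annotation: str


def filter_candidates(library, query_mz, tolerance=5.0, keyword=None):
    """Keep library spectra whose precursor m/z lies within the tolerance.

    An optional key word (assumed here to be a case-insensitive substring
    of the annotation) narrows the candidate list further.
    """
    candidates = []
    for entry in library:
        if abs(entry.precursor_mz - query_mz) > tolerance:
            continue
        if keyword is not None and keyword.lower() not in entry.annotation.lower():
            continue
        candidates.append(entry)
    return candidates
```

Only the spectra returned by such a filter would then need to be scored by cross-correlation, which keeps the number of correlation calculations per query small.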

Table 1. Results of Searches of Tandem Mass Spectra Obtained on a TSQ Located at Merck and Co., Rahway, NJ, Compared against a TSQ Spectral Library Generated at the University of Washington^a

scan no.  peptide sequence  present in library  top score
182-187   LKECCDKPLLEK      no                  0.54
196-201   YLYEIAR           yes                 7.52
204-209   EIWGVEPSR         no                  2.05
224-229   QRLPAPDEKIP       no                  1.03
248-253   GLAGVENVTELKK     yes                 7.37
256-261   HLVDEPQNLIK       yes                 6.80
264-269   GDFQFNISR         no                  2.11
285-290   HQQQFFQFR         no                  1.80
312-317   KVPQVSTPTLVEVSR   yes                 6.48
320-325   LVNELTEFAK        yes                 6.02
340-345   HLQIIYEINQR       no                  1.53
351-356   IGLNCQLAQVAER     no                  1.67
359-364   QLLTPLRDQFTR      no                  2.17
367-372   RHPEYAVSVLLR      no                  2.02
387-392   QTALVELLK         yes                 8.82
398-403   SLHTLFGDELCK      no                  1.63
406-411   VLYPNDNFFEGKELR   no                  1.53
422-427   LGEYGFQNALIVR     yes                 6.69
469-474   QIIEQLSSGFFSPK    no                  2.65
525-530   DAFLGSFLYEYSR     yes                 8.07

^a For each query peptide that had a corresponding spectrum in the library (indicated with a “yes” in the “present in library” column), the search results were positive.
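As a small worked check, the score thresholds stated in the text (greater than 6.0 for a correct result, 4.5 to 6.0 warranting further investigation, below 4.5 incorrect) can be applied to the Table 1 scores. The function and variable names below are hypothetical, and the treatment of a score of exactly 6.0 or 4.5 is an assumption, since the text does not specify the boundaries.

```python
def classify(score):
    """Map a search score onto the three ranges described in the text."""
    if score > 6.0:
        return "correct"
    if score >= 4.5:  # assumption: boundary values fall in the middle range
        return "investigate"
    return "incorrect"


# (peptide sequence, present in library, top score), transcribed from Table 1
TABLE_1 = [
    ("LKECCDKPLLEK", False, 0.54), ("YLYEIAR", True, 7.52),
    ("EIWGVEPSR", False, 2.05), ("QRLPAPDEKIP", False, 1.03),
    ("GLAGVENVTELKK", True, 7.37), ("HLVDEPQNLIK", True, 6.80),
    ("GDFQFNISR", False, 2.11), ("HQQQFFQFR", False, 1.80),
    ("KVPQVSTPTLVEVSR", True, 6.48), ("LVNELTEFAK", True, 6.02),
    ("HLQIIYEINQR", False, 1.53), ("IGLNCQLAQVAER", False, 1.67),
    ("QLLTPLRDQFTR", False, 2.17), ("RHPEYAVSVLLR", False, 2.02),
    ("QTALVELLK", True, 8.82), ("SLHTLFGDELCK", False, 1.63),
    ("VLYPNDNFFEGKELR", False, 1.53), ("LGEYGFQNALIVR", True, 6.69),
    ("QIIEQLSSGFFSPK", False, 2.65), ("DAFLGSFLYEYSR", True, 8.07),
]

# Rows where the threshold-based call disagrees with library membership
mismatches = [
    sequence
    for sequence, present, score in TABLE_1
    if (classify(score) == "correct") != present
]
```

For these 20 queries the list of mismatches is empty: every spectrum present in the library scores above 6.0, and every absent one falls below 4.5.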

Figure 3. Search results from a negative control. A copy of the tandem mass spectrum used as the query was not present in the reference set. The tandem mass spectrum displayed in the upper panel is from the peptide LVADSITSQLER and is used as the input spectrum. The search matched the input spectrum to the tandem mass spectrum of the peptide LSSESVIEQIVK. A score of 1.90 was obtained for this search.

Scores that fall in the range of 6.0-4.5 bear further investigation and scores falling below 4.5 are incorrect. The average score for an incorrect result of a TSQ search was 1.96 with a standard deviation of 1.02. For an incorrect result of an LCQ search, the average score was 1.96 with a standard deviation of 0.80. The average score for a correct result of a TSQ search was 7.53 with a standard deviation of 1.26. For a correct result of an LCQ search, the average score was 7.33 with a standard deviation of 1.32. These values are consistent for both small and large reference sets. Shown as the top spectrum in Figure 2 is a tandem mass spectrum acquired on a TSQ mass spectrometer for an E. coli peptide ILELAGFLDSYIPEPER that is known to be in the reference set. When searched against the reference set of TSQ spectra, the top library match is a tandem mass spectrum of the same peptide. This tandem mass spectrum is shown in the bottom spectrum of Figure 2. The spectra matched quite well with a score of 7.27. A search was then performed with a tandem mass spectrum that was not present in the reference set. The spectrum of this peptide, LVADSITSQLER from E. coli, and the top ranking result, peptide LSSESVIEQIVK from S. cerevisiae, are shown in Figure 3. The score for this search was 1.90. Ideally, a method to compare tandem mass spectra should be relatively independent of instrumental conditions. One would not want to rigorously standardize instrument conditions to acquire spectra for searching the reference set. To judge the ability of

the method to use tandem mass spectra acquired in a different laboratory, a comparison of spectra obtained on a TSQ instrument from another laboratory was undertaken (P. Griffin, Merck and Co.). No attempt was made to standardize instrument performance beyond normal tuning of the instrument for peptide analysis. The reference set of TSQ data was generated in the laboratory at the University of Washington over a period of two years using a TSQ700 triple-quadrupole mass spectrometer. Both the query data and the reference set are peptides derived from tryptic digestion of the low-molecular-weight protein standards used to calibrate molecular weight measurements in gel electrophoresis. The results from the search are shown in Table 1. Of the 20 query spectra, there are 8 spectra with a corresponding spectrum in the reference set. All of these 8 searches resulted in correct matches with average scores of 7.28 while the other 12 searches with no corresponding spectrum in the reference set had average scores of 2.12. By demonstrating that tandem mass spectra obtained from two different laboratories could be compared using this approach, the potential for laboratory-independent analyses is demonstrated. A more difficult challenge is instrument independence. A study was conducted to determine whether CID spectra from triple-quadrupoles could be matched to CID spectra obtained on an ion trap mass spectrometer. A major feature of TSQ tandem mass spectra of peptides is the precursor ion. In contrast, the LCQ ion excitation process results in fragmentation and/or ejection of the precursor from the ion trap so very little of the precursor ion is present in the resulting spectrum. While both CID methods employ low-energy-collision conditions, ion trap mass spectra of peptides created by trypsin digestion can exhibit more abundant b-type fragment ions. This situation is created because the excitation of ions in the ion trap is m/z specific. 
Once an ion fragments, its excitation frequency changes and no longer

Table 2. Results Obtained Searching a Reference Set Containing Tandem Mass Spectra Obtained on Both Triple-Quadrupole Mass Spectrometers and Ion Trap Mass Spectrometers^a

scan no.    peptide sequence                in library?  type of spectrum  top score

TSQ Input
632-635     LYVSHIQVNQAPK                   yes          LCQ               6.09
674-677     TYAAEIAHNISAK                   yes          LCQ               6.79
741-744     IVQIMQNPTHYK                    yes          LCQ               4.90
747-750     VLEQLSGQTPVQSK                  yes          LCQ               5.73
789-792     AQRPITGASLDLIK                  yes          LCQ               6.39
885-888     FVQGLLQNAAANAEAK                yes          LCQ               5.69
921-924     VGYTLPSHIISTSDVTR               yes          LCQ               6.11
933-936     VLNSYWVNQDSTYK                  yes          LCQ               5.44
957-960     LSSESVIEQIVK                    yes          LCQ               5.11
993-996     YFEVILVDPQHK                    yes          LCQ               7.08
999-1002    GSSSLYTLVINDAGK                 yes          LCQ               3.81
1059-1062   VAAVETLYQDMAAR                  yes          LCQ               5.38
1071-1074   YIQTEQQIEVPEGVTVSIK             yes          LCQ               4.23
1191-1194   ALPDAVTIIEPKEEEPILAPSVK         yes          LCQ               4.69
1209-1212   GNVGFVFTNEPLTEIK                no                             3.99
1257-1260   ASLNVGNVLPLGSVPEGTIVSNVEEKPGDR  yes          LCQ               3.73
1299-1302   FADGFLIHSGQPVNDFIDTATR          yes          LCQ               4.64
1329-1332   IPEIPLVVSTDLESIQK               yes          TSQ               8.41
1389-1392   TVAVDSVFEQNEMIDAIAVTK           yes          LCQ               4.92
1419-1422   GYLADDIDADSLEDIYTSAHEAIR        yes          LCQ               3.93
1473-1476   TIAETLAEELINAAK                 yes          LCQ               6.40
1587-1590   NMIIVPEMIGSVVGIYNGK             yes          LCQ               6.39
1665-1668   EFQIIDTLLPGLQDEVMNIKPVQK        yes          LCQ               3.14
1689-1692   QAINLGQVVLTPLTFALPR             yes          LCQ               5.71
1719-1722   DVAAQDFINAYASFLQR               yes          LCQ               6.10
1767-1770   GAIVGPDLAVLALVIVK               yes          LCQ               4.41
1810-1813   SYIFGGHVSQYMEELADDDEERFSELFK    yes          LCQ               5.32
1864-1867   YGILSIDDLIHEIITVGPHFK           yes          LCQ               4.93
1924-1927   HTLDIINVLTDQNPIQVVVDAITNTGPR    yes          LCQ               4.50

LCQ Input
1262-1268   LYVSHIQVNQAPK                   yes          TSQ               7.04
1270-1270   VLEQLSGQTPVQSK                  yes          TSQ               7.37
1310-1316   IVQIMQNPTHYK                    yes          TSQ               5.93
1350-1354   TYAAEIAHNISAK                   yes          TSQ               6.67
1468-1472   VLNSYWVNQDSTYK                  yes          TSQ               6.25
1496-1500   AQRPITGASLDLIK                  yes          TSQ               4.39
1518-1526   VGYTLPSHIISTSDVTR               yes          LCQ               8.58
1536-1554   FVQGLLQNAAANAEAK                yes          LCQ               7.27
1592-1592   YIQTEQQIEVPEGVTVSIK             yes          TSQ               6.32
1656-1662   YFEVILVDPQHK                    yes          TSQ               8.12
1664-1664   GSSSLYTLVINDAGK                 yes          TSQ               5.31
1688-1696   VAAVETLYQDMAAR                  yes          TSQ               4.87
1704-1712   LSSESVIEQIVK                    yes          TSQ               6.71
1726-1726   ALPDAVTIIEPKEEEPILAPSVK         yes          TSQ               6.58
1776-1780   GNVGFVFTNEPLTEIK                yes          LCQ               8.47
1786-1786   ASLNVGNVLPLGSVPEGTIVSNVEEKPGDR  yes          LCQ               6.78
1790-1796   ASLNVGNVLPLGSVPEGTIVSNVEEKPGDR  yes          LCQ               5.47
1818-1826   FADGFLIHSGQPVNDFIDTATR          yes          TSQ               5.32
1842-1842   TVAVDSVFEQNEMIDAIAVTK           yes          TSQ               6.14
1864-1864   GYLADDIDADSLEDIYTSAHEAIR        yes          TSQ               6.11
1934-1948   IPEIPLVVSTDLESIQK               yes          LCQ               9.34
2006-2008   DVAAQDFINAYASFLQR               yes          LCQ               7.61
2024-2036   VFSLDPQYLVDDLRPEFAGYSK          yes          TSQ               6.15
2072-2094   TIAETLAEELINAAK                 yes          LCQ               8.15
2096-2098   NMIIVPEMIGSVVGIYNGK             yes          TSQ               6.44
2140-2140   EFQIIDTLLPGLQDEVMNIKPVQK        yes          TSQ               6.72
2142-2164   QAINLGQVVLTPLTFALPR             yes          TSQ               5.71
2182-2204   SYIFGGHVSQYMEELADDDEERFSELFK    yes          TSQ               6.17
2292-2292   GAIVGPDLAVLALVIVK               yes          TSQ               6.79
2462-2502   YGILSIDDLIHEIITVGPHFK           yes          TSQ               4.81
2534-2562   HTLDIINVLTDQNPIQVVVDAITNTGPR    yes          TSQ               5.29

a The peptide sequence represented by the query tandem mass spectrum is listed under the column heading “peptide sequence”. Under the column heading “type of spectrum”, the type of tandem mass spectrum found as the top-ranking match is listed. The score for the top-ranking match is listed under “top score”.
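As a consistency check, the averages and standard deviations quoted in the text for Table 2 can be recomputed from the “top score” values listed in the table. The sketch below assumes the two score lists are transcribed in the order they appear in the table and that the reported standard deviation is the sample (n − 1) estimate; the paper does not state which estimator was used, and the variable names are illustrative.

```python
from statistics import mean, stdev

# "Top score" values transcribed from Table 2, in the order listed.
tsq_query_scores = [
    6.09, 6.79, 4.90, 5.73, 6.39, 5.69, 6.11, 5.44, 5.11, 7.08,
    3.81, 5.38, 4.23, 4.69, 3.99, 3.73, 4.64, 8.41, 4.92, 3.93,
    6.40, 6.39, 3.14, 5.71, 6.10, 4.41, 5.32, 4.93, 4.50,
]
lcq_query_scores = [
    7.04, 7.37, 5.93, 6.67, 6.25, 4.39, 8.58, 7.27, 6.32, 8.12,
    5.31, 4.87, 6.71, 6.58, 8.47, 6.78, 5.47, 5.32, 6.14, 6.11,
    9.34, 7.61, 6.15, 8.15, 6.44, 6.72, 5.71, 6.17, 6.79, 4.81, 5.29,
]


def summarize(scores):
    """Return (count, mean, sample standard deviation) for a list of scores."""
    return len(scores), mean(scores), stdev(scores)


for label, scores in (("TSQ queries", tsq_query_scores),
                      ("LCQ queries", lcq_query_scores)):
    n, avg, sd = summarize(scores)
    print(f"{label}: n={n}, mean={avg:.2f}, sd={sd:.2f}")
```

If the transcription is accurate, the printed values should agree with the figures reported in the text to within rounding.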

undergoes energetic collisions. In the correlation analysis, the precursor ion of TSQ tandem mass spectra will be a major contributor to the score if it matches to a fragment ion in the ion trap tandem mass spectrum. To reduce the number of false positives, the precursor ion is removed from the TSQ tandem mass spectrum prior to the search.
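The precursor-removal step can be illustrated with a short sketch. This is not the authors' implementation: the peak-list representation, the function name `remove_precursor`, and the ±2 m/z exclusion window are all assumptions made for illustration, and the peak values are invented.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def remove_precursor(peaks: List[Peak], precursor_mz: float,
                     window: float = 2.0) -> List[Peak]:
    """Drop every peak within +/- `window` m/z units of the precursor m/z.

    The window width is a hypothetical choice; the paper does not give one.
    """
    return [(mz, inten) for mz, inten in peaks
            if abs(mz - precursor_mz) > window]


# Invented TSQ-like peak list with residual precursor signal near m/z 745.4.
peaks = [(300.2, 15.0), (512.3, 40.0), (745.4, 100.0), (746.1, 35.0), (900.5, 22.0)]
cleaned = remove_precursor(peaks, precursor_mz=745.4)
print(cleaned)
```

With the residual precursor signal removed, the remaining fragment ions alone determine the correlation score.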

For this analysis, a set of TSQ and LCQ CID spectra was created at the University of Washington. The spectra were obtained under general operating conditions with no care taken to standardize conditions beyond normal tuning for MS/MS analysis of peptides. Twenty-nine spectra from one TSQ MS/MS experiment and 31 spectra from one LCQ MS/MS experiment

Table 3. Results of Searches Using Tandem Mass Spectra Obtained from [Glu]1-Fibrinopeptide EGVNDNEEGFFSAR at Different Collision Energies^a

collision energy used for MS/MS (eV)    top score
-10.0                                   5.35
-12.5                                   6.14
-15.0                                   6.92
-17.5                                   8.36
-20.0                                   9.34

a A search was conducted with each tandem mass spectrum obtained at different collision energies. The spectrum used as the query was removed from the reference set for the search. In each case, the top-ranking match was a tandem mass spectrum of [Glu]1-fibrinopeptide.

Figure 4. (top panel) A tandem mass spectrum obtained on an LCQ ion trap mass spectrometer for the peptide YFEVILVDPQHK. (bottom panel) The best match to a TSQ tandem mass spectrum. A score of 8.124 is obtained, and the peptide represented in the TSQ tandem mass spectrum is the same.

were used as the query data. Shown in Figure 4 are CID spectra for the same peptide, YFEVILVDPQHK from S. cerevisiae, acquired on an ion trap mass spectrometer and a triple-quadrupole mass spectrometer. The search with the ion trap spectrum returned the corresponding triple-quadrupole spectrum as the top match, with a score of 8.12. Results are tabulated in Table 2 for the set of 29 TSQ and 31 LCQ spectra searched against a library of spectra from both types of instruments. For all but one input spectrum, there was at least one corresponding spectrum in the reference library acquired on the other instrument, and each of these input spectra matched a spectrum of the same peptide as the top-ranked hit. The average score of the TSQ query spectra was 5.31 with a standard deviation of 1.16. The average score of the LCQ query spectra was 6.54 with a standard deviation of 1.18. These numbers show that the algorithm generates scores that are generally lower when spectra acquired on the two different instruments are compared. When duplicate spectra were present from both instruments, spectra acquired on the same instrument scored higher than spectra acquired on different instruments.

[Glu]1-fibrinopeptide B was used as a model to study the effect of collision energy on the search algorithm. Five tandem mass spectra of the same peptide, acquired using different collision energies on a TSQ mass spectrometer, were placed into the library. The tandem mass spectrum used as the query spectrum was removed from the reference set to determine how well the spectrum would match the tandem mass spectra acquired at other energy values. The intent of this study was to examine the effectiveness of the algorithm at matching spectra created with different collision energies. Shown in Table 3 are the search results for each of the five tandem mass spectra searched through a TSQ spectral library. In all five search results, the top matches are the other tandem mass spectra of [Glu]1-fibrinopeptide B contained in the library. The tandem mass spectra obtained with higher collision energies gave the highest scores against the majority of the [Glu]1-fibrinopeptide B library spectra, while the spectrum obtained with the lowest collision energy gave the lowest scores.

Mass spectral library searching algorithms for EI spectra are able to match spectra from structurally related molecules. We tested the ability of this approach to match tandem mass spectra of peptides with amino acid sequence similarity. The previous examples performed spectral comparisons by first determining whether the precursor ions of the query and library spectra are within 5 u. Matching two spectra where a larger molecular weight difference exists but there is significant sequence identity is generally precluded by this approach. Only a few amino acid changes would be detected with such a narrow mass tolerance. By widening the mass tolerance to ±100 u, the algorithm should be able to match more significant differences. Shown in Table 4 is an example of a tandem mass spectrum (representing the sequence FLSSVSTVLTSK) searched through the library using a ±100-u mass tolerance. The tandem mass spectrum matches with a good score to one tandem mass spectrum. The top-ranking hit is the tandem mass spectrum for the peptide FLASVSTVLTSK. A score of 6.370 is calculated for this match, and the second highest score is 1.93. Interestingly, the two sequences FLSSVSTVLTSK and FLASVSTVLTSK differ near the N-terminus, which should alter the m/z values of the b-ion series, a fragment ion series that does not normally predominate in spectra of peptides generated by trypsin digestion.
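The effect of the Ser/Ala difference on the two fragment ion series can be checked with a short calculation. The sketch below is illustrative only: it assumes singly charged b- and y-type ions computed from standard monoisotopic residue masses and a hypothetical matching tolerance of 0.5 m/z, neither of which is taken from the paper's scoring procedure.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def b_ions(seq):
    """Singly charged b-ion m/z values b1..b(n-1), N-terminal fragments."""
    total, out = PROTON, []
    for res in seq[:-1]:
        total += MONO[res]
        out.append(total)
    return out


def y_ions(seq):
    """Singly charged y-ion m/z values y1..y(n-1), C-terminal fragments."""
    total, out = WATER + PROTON, []
    for res in reversed(seq[1:]):
        total += MONO[res]
        out.append(total)
    return out


def count_shared(ions_a, ions_b, tol=0.5):
    """Count same-index ions that agree within `tol` (hypothetical tolerance)."""
    return sum(1 for x, y in zip(ions_a, ions_b) if abs(x - y) <= tol)


query, hit = "FLSSVSTVLTSK", "FLASVSTVLTSK"
shared_b = count_shared(b_ions(query), b_ions(hit))
shared_y = count_shared(y_ions(query), y_ions(hit))
print(f"shared b ions: {shared_b} of {len(query) - 1}")
print(f"shared y ions: {shared_y} of {len(query) - 1}")
```

Because the substitution sits at the third residue, only b1 and b2 are common to both peptides, whereas y1 through y9 are unchanged. This is consistent with a high score when the y-ion series dominates the spectrum.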
In Table 5, a second example illustrates a match between two peptides that differ at a single residue near the C-terminus. The score for the match to the LCQ tandem mass spectrum of the same sequence was 6.83, and the score for the match to the related sequence was 4.69. This method shows potential to match tandem mass spectra of related sequences. We have not yet determined the limits of the ability to match related sequences, but an obvious requirement is overlap in the fragment ions present in the spectra. A potential application of this method of spectral comparison is to compare the tandem mass spectra acquired over the course of an automated LC/MS/MS analysis to identify differences.


Table 4. Results Obtained Searching a Reference Set Containing Tandem Mass Spectra Using a ±100-u Mass Tolerance^a

no.  score  precursor  type  organism          sequence
1    6.370  627.6      TSQ   human hemoglobin  FLASVSTVLTSK
2    1.932  601.6      TSQ   human hemoglobin  unknown sequence
3    1.831  616.3      TSQ   human hemoglobin  unknown sequence
4    1.384  608.5      TSQ   human hemoglobin  unknown sequence
5    1.151  709.5      TSQ   human hemoglobin  unknown sequence
6    0.833  578.2      TSQ   human hemoglobin  unknown sequence
7    0.435  690.6      TSQ   human hemoglobin  EFTPPVQAAYQK
8    0.447  600.2      TSQ   human hemoglobin  KVLGAFSDGLAHLDNLK
9    0.365  723.7      TSQ   human hemoglobin  unknown sequence
10   0.359  613.1      TSQ   human hemoglobin  TYFPHFDLSHGSAQVK
11   0.293  654.4      TSQ   human hemoglobin  unknown sequence

a The top 12 matches are ranked by score. The peptide sequence represented by the query tandem mass spectrum was FLSSVSTVLTSK. The column headings are the following: no., rank of the match based on score; precursor, the precursor ion m/z value; type, the instrument used for tandem mass spectrometry (TSQ is a triple-quadrupole); organism, the source of the protein; sequence, the assigned sequence from the library.

Table 5. Results Obtained Searching a Reference Set Containing Tandem Mass Spectra Using a ±100-u Mass Tolerance^a

no.  score  precursor  type  organism  sequence
1    6.830  617.8      LCQ   N/A       LFGVTTLDVLR
2    6.687  617.6      LCQ   N/A       LFGVTTLDVLR
3    4.686  624.5      LCQ   N/A       LFGVTTLDIIR
4    4.347  624.4      LCQ   N/A       LFGVTTLDIIR
5    3.101  610.9      LCQ   N/A       FNSLTPEQQR
6    3.048  598.9      LCQ   N/A       IPYVGVDKDNLGDFLK
7    2.693  618.9      LCQ   N/A       SDVMSVDIDKK
8    2.454  598.7      LCQ   N/A       IPYVGVDKDNLGDFLK
9    2.333  598.6      LCQ   N/A       IPYVGVDKDNLGDFLK
10   2.143  639.0      LCQ   N/A       SDLFNVNAGIVK
11   2.137  616.8      LCQ   N/A       LFAGNATPELAK
12   1.741  680.3      LCQ   N/A       FIAETMYNELK

a The top 12 matches are ranked by score. The peptide sequence represented by the query tandem mass spectrum was LFGVTTLDVLR. The column headings are the following: no., rank of the match based on score; precursor, the precursor ion m/z value; type, the instrument used for tandem mass spectrometry (LCQ is an ion trap mass spectrometer); organism, the source of the protein; sequence, the assigned sequence from the library.
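The role of the precursor mass tolerance as a prefilter can be illustrated with the precursor values listed in Table 5. The sketch below makes several assumptions that are not stated in the paper: the window is applied directly to the listed precursor m/z values, the query precursor is taken to equal that of the top-ranked hit (617.8), and the function name `candidates` is hypothetical.

```python
# (precursor m/z, assigned sequence) pairs transcribed from Table 5.
TABLE5_LIBRARY = [
    (617.8, "LFGVTTLDVLR"), (617.6, "LFGVTTLDVLR"),
    (624.5, "LFGVTTLDIIR"), (624.4, "LFGVTTLDIIR"),
    (610.9, "FNSLTPEQQR"), (598.9, "IPYVGVDKDNLGDFLK"),
    (618.9, "SDVMSVDIDKK"), (598.7, "IPYVGVDKDNLGDFLK"),
    (598.6, "IPYVGVDKDNLGDFLK"), (639.0, "SDLFNVNAGIVK"),
    (616.8, "LFAGNATPELAK"), (680.3, "FIAETMYNELK"),
]


def candidates(query_mz, library, tolerance):
    """Keep library entries whose precursor lies within +/- tolerance of the query."""
    return [(mz, seq) for mz, seq in library if abs(mz - query_mz) <= tolerance]


QUERY_MZ = 617.8  # assumed query precursor; not given explicitly in the paper

narrow = candidates(QUERY_MZ, TABLE5_LIBRARY, tolerance=5.0)
wide = candidates(QUERY_MZ, TABLE5_LIBRARY, tolerance=100.0)

print(len(narrow), "candidates at +/-5:", sorted({seq for _, seq in narrow}))
print(len(wide), "candidates at +/-100")
```

Under these assumptions, the related sequence LFGVTTLDIIR (precursor near 624.5) falls outside the narrow window and is only scored once the tolerance is widened.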

Tandem mass spectra that differ between the two analyses can be flagged for further analysis. To demonstrate this approach, peptides obtained from normal and sickle cell human hemoglobin were analyzed by LC/MS/MS. Data for both samples were acquired on an LCQ ion trap mass spectrometer. The normal human hemoglobin spectra were used as the reference set. Figure 5A shows a plot of the tandem mass spectra (the MS ion current has been removed) acquired for normal hemoglobin. Figure 5B shows the corresponding trace of the tandem mass spectra of sickle cell hemoglobin. The two traces have little in common in appearance, and it would be difficult to identify differences from these traces alone. Figure 5C shows a subtractive analysis of the two LC/MS/MS analyses. Tandem mass spectra common to the two analyses, defined by search scores of 4.5 or greater, have been removed, leaving only the tandem mass spectra unique to the sickle cell hemoglobin analysis. The peaks marked with triangles are tandem mass spectra of peptides containing the amino acid change E → V, the known amino acid substitution. The following peptides were identified in a subsequent database search: VHLPVE, VHLPVEK, VHLPVEKS, VHLPVEKSAVTA, and VHLPVEKSAVTAL. The large number of tandem mass spectra over this region of the protein’s sequence results from the multiple proteases used to cleave hemoglobin, a procedure used to achieve >95% sequence coverage in the analysis. Peaks marked with circles are tandem mass spectra of hemoglobin peptides identified using the SEQUEST database searching software. These tandem mass spectra have no counterparts from the digestion of normal hemoglobin, although the amino acid sequences were identical. Peaks marked with an X are unidentified. The subtraction procedure was successful at identifying those tandem mass spectra unique to the mutant protein. The principal advantage of using tandem mass spectra to compare two different analyses is the much greater level of specificity inherent in the fragmentation pattern. A comparison or subtraction of two very complicated mixtures of peptides at the level of tandem mass spectra will provide a higher level of accuracy than using m/z values and retention times. We expect this approach will have many different applications.

Figure 5. (A) An LC/MS/MS analysis of normal human hemoglobin. The ion current for the tandem mass spectra is plotted. (B) An LC/MS/MS analysis of sickle cell human hemoglobin. The ion current for the tandem mass spectra is plotted. (C) Total ion current of tandem mass spectra plotted after removal of tandem mass spectra common to the mutant and normal proteins. The tandem mass spectra marked with triangles were determined by a SEQUEST database search to contain the amino acid variation. The tandem mass spectra marked with circles are hemoglobin peptides, but do not appear in the analysis of the normal hemoglobin. Those tandem mass spectra marked with an X did not match a peptide sequence in a database search.

CONCLUSIONS

This research illustrates a method for comparing CID spectra to a reference set of spectra such as a library. Such an approach could have practical application for the construction of contaminant libraries, libraries of peptide spectra obtained during quality control analysis of recombinant proteins, or sets of tandem mass spectra obtained for control experiments in biological experiments. By searching these sets of tandem mass spectra, differences could be quickly targeted with a high degree of specificity. Over time, a laboratory could accumulate a large number of spectra from a particular species used for biochemical studies to aid in additional studies. This library could be screened prior to a database search or prior to de novo interpretation of the spectra. Furthermore, the use of sophisticated computer control methods to acquire CID spectra during the course of an LC analysis allows the acquisition of relatively complete sets of spectra for proteins. This offers the possibility of constructing subtractive analysis methods to comprehensively compare changes in a protein’s structure.
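The subtraction step used for Figure 5C can be sketched as a simple filter: a spectrum from the sample run is kept only if its best score against the reference run falls below the threshold of 4.5 used in the hemoglobin example. The sketch below is a minimal illustration, not the authors' code. The function name `subtract_common` is hypothetical, and `toy_score` is an invented stand-in that counts shared nominal fragment masses; it is not the cross-correlation score described in this paper.

```python
from typing import Callable, List, Sequence, TypeVar

Spectrum = TypeVar("Spectrum")


def subtract_common(sample: Sequence[Spectrum],
                    reference: Sequence[Spectrum],
                    score_fn: Callable[[Spectrum, Spectrum], float],
                    threshold: float = 4.5) -> List[Spectrum]:
    """Return sample spectra whose best score against the reference set is below threshold."""
    unique = []
    for spectrum in sample:
        best = max((score_fn(spectrum, ref) for ref in reference),
                   default=float("-inf"))
        if best < threshold:
            unique.append(spectrum)
    return unique


def toy_score(a, b):
    """Hypothetical stand-in score: number of shared nominal fragment masses."""
    return float(len(set(a) & set(b)))


# Invented spectra, each reduced to a tuple of nominal fragment masses.
normal = [(175, 288, 401, 514, 627), (147, 260, 389, 502, 631)]
mutant = [(175, 288, 401, 514, 627), (147, 246, 359, 472, 601)]

unique_to_mutant = subtract_common(mutant, normal, toy_score)
print(unique_to_mutant)
```

Passing the scoring function as a parameter keeps the filter independent of how spectra are compared, so any spectrum-to-spectrum score with a meaningful threshold could be substituted.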

ACKNOWLEDGMENT

Support was derived from the National Science Foundation, Science and Technology Center Cooperative agreement DIR 8809710, U.S. Army, and ASMS research award.

Received for review February 3, 1998. Accepted June 25, 1998. AC980122Y Analytical Chemistry, Vol. 70, No. 17, September 1, 1998
