Anal. Chem. 2004, 76, 2928-2937

Impact of Ion Trap Tandem Mass Spectra Variability on the Identification of Peptides

John D. Venable and John R. Yates, III*

Department of Cell Biology, The Scripps Research Institute, La Jolla, California 92014

Peptide identification based on tandem mass spectrometry and database searching algorithms has become one of the central technologies in proteomics. At the heart of this technology is the ability to reproducibly acquire high-quality tandem mass spectra for database interrogation. The variability in tandem mass spectra generation is often assumed to be minimal, and peptide identifications are typically based on a single tandem mass spectrum. In this paper, we characterize the variance of scores derived from replicate tandem mass spectra using several database search algorithms and demonstrate the effects of spectral variability on the correct identification of peptides. We show that the variance associated with the collection of tandem mass spectra can be substantial, leading to sizable errors in search algorithm scores (∼5-25% RSD) and ultimately to incorrect assignments. Processing strategies are discussed to minimize the impact of tandem mass spectra variability on peptide identification.

Recent advances in genomics and analytical techniques have provided the infrastructure and technologies necessary to pursue large-scale analyses of proteins in complex mixtures. Because of the inherent need for speed and sensitivity in these analyses, mass spectrometry-based strategies have come to play a crucial role in proteomics. One strategy, termed “shotgun” proteomics, is based on the presumption that proteins need not be separated before analysis. In this strategy, large numbers of peptides generated from proteolytically digested protein mixtures are separated using either single (LC) or multidimensional liquid chromatography (LC/LC). Subsequently, peptides are identified by tandem mass spectrometry (MS/MS) coupled with database searching.
This approach has led to numerous publications and has become one of the most commonly used technologies for proteomics.1-4 The success of “shotgun” proteomic strategies is ultimately determined by the ability to reproducibly obtain high-quality tandem mass spectra of peptides under conditions suitable for the analysis of complex protein mixtures. This can be a difficult proposition because of a variety of factors, including limited sample size, high complexity, and the large dynamic range inherent to most biological samples. However, significant progress has been made with the use of preconcentration and the high resolving power of multidimensional separations.

One aspect of this process that has not been adequately addressed is the variability associated with the collection of tandem mass spectra. There are numerous sources of error associated with the collection and processing of tandem mass spectra including, but not limited to, fluctuations in the ESI source, errors attributed to the number of ion counting events (i.e., counting statistics), the inherently random nature of the fragmentation process, and errors attributed to centroiding. All of these sources contribute to the total error associated with the collection of tandem mass spectra and could potentially lead to error in the resulting database search algorithm scores. If unaccounted for, variability in tandem mass spectra generation could lead to false positives and ultimately reduce the number of correctly identified peptides.

This research characterizes the range of scores derived from replicate tandem mass spectra for two different peptide standards. We then explore the impact of this variance on the analysis of a complex mixture of known proteins using MudPIT. In addition, two techniques for minimizing the role of tandem mass spectra variance in the identification of peptides are discussed. First, peptide identification from an average tandem mass spectrum (i.e., obtained from signal averaging multiple spectra) is evaluated. SEQUEST cross-correlation scores derived from this strategy are then compared to those obtained using the average of independently scored replicate tandem mass spectra.

* To whom correspondence should be addressed. E-mail: jyates@scripps.edu.
(1) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-682.
(2) Washburn, M. P.; Wolters, D.; Yates, J. R. Nat. Biotechnol. 2001, 19, 242-247.
(3) Wolters, D. A.; Washburn, M. P.; Yates, J. R., III. Anal. Chem. 2001, 73, 5683-5690.
(4) Florens, L.; Washburn, M. P.; Raine, J. D.; Anthony, R. M.; Grainger, M.; Haynes, J. D.; Moch, J. K.; Muster, N.; Sacci, J. B.; Tabb, D. L.; Witney, A. A.; Wolters, D.; Wu, Y.; Gardner, M. J.; Holder, A. A.; Sinden, R. E.; Yates, J. R.; Carucci, D. J. Nature 2002, 419, 520-526.

2928 Analytical Chemistry, Vol. 76, No. 10, May 15, 2004

EXPERIMENTAL SECTION

Peptide and Protein Standard Preparations.
[Glu1]-Fibrinopeptide B (Homo sapiens) and angiotensin I (H. sapiens) were obtained from Sigma Chemical Co. (St. Louis, MO). Peptide stock solutions were prepared by dilution of standards with 5% formic acid (J. T. Baker) to a final concentration of 10 pmol/µL. A mixture containing five proteins (∼1 nmol/mL), including phosphorylase a (rabbit skeletal muscle), cytochrome c (horse), apomyoglobin (horse heart), albumin (bovine serum), and β-casein (bovine), was used for MudPIT experiments. Before digestion, the protein mixture was denatured with solid urea (final concentration of 8 M in 100 mM tris, pH 8.5), reduced with TCEP (Sigma, 3 mM final concentration for 20 min at room temperature), and alkylated with iodoacetamide (Sigma, 10 mM final concentration for 30 min at room temperature).

10.1021/ac0348219 CCC: $27.50 © 2004 American Chemical Society. Published on Web 04/20/2004

Lys-C, Trypsin Digestion. Digestion was performed using a previously described protocol.5 Briefly, the denatured, reduced, and alkylated proteins were digested overnight at 37 °C with 5 µg of endoproteinase Lys-C (Roche). After dilution to 2 M urea with 100 mM tris, pH 8.5, 1 µg of modified trypsin (Promega) was added and the mixture was incubated overnight at 37 °C. The resulting peptide mixture was acidified with formic acid (5%).

Infusion of Peptide Standards. Two different peptide standards, at a variety of concentrations, were electrosprayed into an LCQ ion trap mass spectrometer (ThermoFinnigan) using a 2.5-kV spray potential. The standards were introduced at a constant flow rate (1 µL/min) into the electrospray source using an infusion pump (Harvard Apparatus). Once a stable mass spectral signal was obtained (∼1 min), 1000 replicate tandem mass spectra were collected using a 35% normalized collision energy and a 3 m/z isolation window.

Multidimensional Liquid Chromatography Tandem Mass Spectrometry of Peptide Mixtures (MudPIT). This approach has been described in detail by several authors,1,5-7 so it is only briefly described here. A three-phase microcapillary column was constructed by slurry packing ∼7 cm of reversed-phase (RP) material (Aqua, Phenomenex) into a 100-µm fused-silica capillary, which had been previously pulled to a tip diameter of ∼5 µm using a Sutter Instruments laser puller (Sutter Manufacturing, Novato, CA). Next, 3 cm of strong cation-exchange resin (Partisphere, Whatman), followed by another 3 cm of RP material, was packed into the column. The column was then equilibrated with 5% acetonitrile/0.1% formic acid for ∼30 min before the peptide mixture was loaded using a high-pressure cell.
For analysis, the microcolumn was positioned in-line with a Surveyor quaternary HPLC pump (ThermoFinnigan) directly in front of the heated capillary opening of an LCQ-Deca ion trap mass spectrometer (ThermoFinnigan). A six-step separation procedure was used.2,5,6 The first step consisted of a 100-min linear gradient from 100% buffer A (5% acetonitrile/0.1% formic acid) to 80% buffer B (80% acetonitrile/0.1% formic acid). The next four steps consisted of salt pulses of 10, 25, 50, and 80% buffer C (250 mM ammonium acetate/5% acetonitrile/0.1% formic acid), each followed by a reversed-phase gradient from 100% A to 45% B. The last step entailed a salt pulse of 100% buffer C, followed by a 120-min gradient from 100% A to 70% B. As peptides were eluted and ionized into the mass spectrometer, data-dependent acquisition of tandem mass spectra was repeated continuously during the course of the analysis. All tandem mass spectra were collected using a normalized collision energy of 35% and an isolation window of 3 m/z. The automatic gain control feature was employed with a target value of 5e7 for both full-mass scans and MS/MS scans.

(5) McDonald, W. H.; Ohi, R.; Miyamoto, D. T.; Mitchison, T. J.; Yates, J. R. Int. J. Mass Spectrom. 2002, 219, 245-251.
(6) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., III. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7900-7905.
(7) MacCoss, M. J.; Wu, C. C.; Yates, J. R., III. Anal. Chem. 2002, 74, 5593-5599.

Tandem Mass Spectral Analysis. Tandem mass spectra were analyzed using several programs. First, ExtractMS was used to assign charge states to +1 spectra, as well as to remove poor-quality spectra. Next, the algorithm 2to3 (ref 8) was used to assign charge states to +2 and +3 spectra as well as to eliminate remaining low-quality spectra. Several different database searching algorithms, including SEQUEST, PEP_PROBE, and MASCOT, were used to interpret tandem mass spectra.9-11 SEQUEST and PEP_PROBE were parallelized on a Beowulf cluster of ∼35 computers,8 and results were filtered, sorted, and displayed using the DTASelect program.12 Searches were performed against the combined human, mouse, and rat databases (HMR) from RefSeq. When MASCOT was employed, searches were submitted to www.matrixscience.com, and the SwissProt database was selected. In addition, enzyme specificity was set to none for angiotensin I and to incomplete trypsin digestion for Glu-fibrinopeptide B.

Processing Strategies. Several different spectral processing strategies were evaluated based on their ability to minimize the effects of tandem mass spectra variability. For all three processing strategies, data-dependent acquisition employed a cycle consisting of one full-scan mass spectrum (400-1400 m/z) followed by several (depending on the processing strategy) tandem mass spectra. The cycle was repeated throughout the analysis.

Unprocessed. For each full scan of the m/z range of 400-1400, three different tandem mass spectra were collected (one “microscan” each) and centroided by the onboard computer of the LCQ ion trap mass spectrometer. No spectral averaging was performed.

Presearch Spectral Averaging. For each full scan of the m/z range, three different tandem mass spectra were collected. Each spectrum was averaged by combining three or five microscans by the onboard computer during data acquisition. The resulting averaged spectra were centroided before transfer to the data processing computer.

Postsearch Averaging of Replicate Cross-Correlation Scores (XCorrs).
For each of three precursor ions selected in the full-mass scan, three replicate tandem mass spectra (one microscan each) were collected. All spectra were centroided by the onboard computer of the LCQ ion trap mass spectrometer and stored as separate spectra. SEQUEST was modified to output every candidate peptide’s cross-correlation score (i.e., 500 XCorrs per spectrum) to a text file. After all spectra were searched using SEQUEST, the resulting cross-correlation scores for each candidate peptide were averaged over the replicates using a Perl program. The average XCorr was then used for subsequent filtering.

(8) Sadygov, R. G.; Eng, J.; Durr, E.; Saraf, A.; McDonald, H.; MacCoss, M. J.; Yates, J. R., III. J. Proteome Res. 2002, 1, 211-215.
(9) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(10) Sadygov, R. G.; Yates, J. R., III. Anal. Chem. 2003, 75, 3792-3798.
(11) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-3567.
(12) Tabb, D. L.; McDonald, W. H.; Yates, J. R., III. J. Proteome Res. 2002, 1, 21-26.

RESULTS AND DISCUSSION

Distribution of Search Scores for Replicate Tandem Mass Spectra. To characterize the variability associated with the acquisition of tandem mass spectra and to identify the role that this variance has in the identification of peptides, 1000 replicate tandem mass spectra were collected for two different peptide standards (angiotensin I [+3] and Glu-fibrinopeptide B [+2]). These standards were chosen for a variety of reasons, but primarily because they produce high-quality fragmentation patterns and are commonly used as peptide standards for tandem mass spectrometry. Each standard was infused at a constant flow rate and at several different concentrations. Spectra were then searched, and the distributions of search scores for each standard and concentration were evaluated. Figure 1 shows example tandem mass spectra of both angiotensin I and Glu-fibrinopeptide B obtained under direct infusion conditions.

Figure 1. Example tandem mass spectra obtained under the direct infusion of (a, b) 0.1 and 0.0002 pmol/µL angiotensin I and (c) 0.1 pmol/µL Glu-fibrinopeptide B. No spectral averaging was employed.

SEQUEST was employed as the primary search algorithm and was used for all data sets. In addition, the 100 fmol/µL data were searched using PEP_PROBE and MASCOT to determine whether the data obtained by SEQUEST were representative of other search programs. The collection of search scores from the 100 fmol/µL concentration data set is shown in Figure 2. From this graph, it is clear that substantial variance exists in the scores obtained for replicate tandem mass spectra. In addition, the variance associated with replicate measurements depends on the search algorithm used. In these data, the relative standard deviations for angiotensin I and Glu-fibrinopeptide B, respectively, were 10.5 and 7.3% using SEQUEST, 16.5 and 20.1% using PEP_PROBE, and 12.9 and 17.8% using MASCOT. The difference in precision between the search algorithms is not unexpected because each search algorithm processes spectra differently, and it is entirely possible that some search algorithms are more tolerant of the variability in tandem mass spectra than others. It should be noted that these data were collected under direct infusion conditions in order to minimize errors due to changing concentration and that the number of scores plotted varies with the search algorithm used because only scores for correct interpretations were plotted.

Figure 2. Distribution of database search algorithm scores for 1000 replicate tandem mass spectra of angiotensin I (black lines) and Glu-fibrinopeptide B (gray lines). The number of scores plotted differs for each search algorithm and standard used because only correct interpretations were plotted. The number of measurements varied from 730 and 998 using SEQUEST to 968 and 929 using PEP_PROBE to 86 and 995 using MASCOT for angiotensin I and Glu-fibrinopeptide B, respectively.

For angiotensin I, the percentage of collected spectra that produced correct interpretations was 73% using SEQUEST, 96.8% using PEP_PROBE, and 8.6% using MASCOT. The percentage of correct identifications was 99.8 using SEQUEST, 92.9 using PEP_PROBE, and 95.0 using MASCOT for Glu-fibrinopeptide B. Interestingly, the number of correct interpretations for angiotensin I varied substantially depending on the search algorithm employed. This was probably due to the lack of enzyme specificity for this particular peptide or the general appearance of this particular +3 spectrum. However, the purpose of the comparison was not to focus on the differences between search algorithms, but rather on the similarities. It is clear from this comparison that all three search algorithms were substantially affected by variability within replicate tandem mass spectra.
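The precision figures quoted above can be recomputed from any list of replicate scores. The following is a minimal Python sketch, not the authors' software: the function names `percent_rsd` and `percent_correct` are hypothetical, the three-element score list is made-up illustration data, and the use of the sample (n - 1) standard deviation is an assumption. Only the counts 730 and 86 (out of 1000 spectra) are taken from the Figure 2 caption.

```python
import statistics


def percent_rsd(scores):
    """Relative standard deviation (%): 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(scores) / statistics.mean(scores)


def percent_correct(n_correct, n_spectra):
    """Percentage of collected spectra that gave the correct interpretation."""
    return 100.0 * n_correct / n_spectra


# Made-up replicate scores, for illustration only (not data from the paper).
toy_scores = [2.0, 2.5, 3.0]
print(round(percent_rsd(toy_scores), 1))

# Counts of correct angiotensin I interpretations from the Figure 2 caption.
print(percent_correct(730, 1000))  # SEQUEST
print(percent_correct(86, 1000))   # MASCOT
```

With 1000 replicates, the choice between sample and population standard deviation changes the result only negligibly.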

Similar data were collected over a range of concentrations (Figure 3) and searched using only SEQUEST. The results are summarized in Table 1, which shows the averages, standard deviations, and relative standard deviations of XCorr measurements for all of the concentrations analyzed. The average XCorr values for the collection of tandem mass spectra dropped by ∼10% over the 1000-fold concentration range from 1000 to 1 fmol/µL. Below a concentration of 1 fmol/µL, the average XCorr values were substantially lower, and the standard deviations were higher than at higher concentrations. This is most likely attributable to the inability to fill the trap with an adequate number of ions within the maximum injection time (300 ms). The dependence of the average XCorr on concentration suggests that ultimately the precision of XCorr measurements will be limited by the ability to reproducibly inject a sufficient number of ions into the trap.

Figure 3. XCorr distributions of 1000 replicate tandem mass spectra of angiotensin I at three different concentrations (1000, 1, and 0.2 fmol/µL).

Table 1. Averages, Standard Deviations, and Relative Standard Deviations for the XCorrs Obtained from 1000 Replicate MS/MS Spectra of Angiotensin I and Glu-fibrinopeptide B, Searched against the HMR Database

                     unprocessed               spectral averaging (5 microscans)
concn (fmol/µL)      average ± σ    % RSD      average ± σ    % RSD

Angiotensin I
1000                 2.53 ± 0.25     9.8       2.84 ± 0.20     7.0
100                  2.48 ± 0.26    10.5       2.77 ± 0.22     7.9
1                    2.34 ± 0.27    11.5       2.71 ± 0.23     8.5
0.2                  1.38 ± 0.35    25.0       1.95 ± 0.28    14.4

Glu-fibrinopeptide B
1000                 3.41 ± 0.21     6.2       3.60 ± 0.16     4.4
100                  3.30 ± 0.24     7.3       3.54 ± 0.18     5.1
0.2                  1.63 ± 0.43    26.3       1.89 ± 0.42    22.0

This variation in search algorithm scores for replicate tandem mass spectra is likely due to small differences between spectra that lead to substantial differences in search scores.13,14 For example, Figure 4a and b shows three-dimensional views of the 10 highest scoring and 10 lowest scoring (i.e., by XCorr) spectra in the 0.1 pmol/µL angiotensin I data set. In general, the differences between spectra we observed were subtle and primarily consisted of differences in the presence or absence of low-intensity fragment ions, as well as fluctuations in fragment ion intensities and measured m/z ratios (i.e., deviations in centroided peak positions). The score distributions from all three search algorithms were broad, including those from PEP_PROBE, whose score is independent of intensity, which suggests that changes between spectra other than intensity must be significant.

One of the most striking differences between replicate spectra we observed was the variability in the presence or absence of low-abundance ions, which could possibly be attributed to errors from counting statistics or even the centroiding process. To visualize these changes, frequency distributions were plotted as a function of m/z for each predicted fragment ion for the 100 highest scoring and 100 lowest scoring (i.e., by XCorr) spectra in the 0.1 pmol/µL angiotensin I data set (Figure 4c and d). From these plots, it is clear that low-intensity fragment ions are present in replicate spectra less reproducibly than high-intensity fragments. This variability could lead to significant differences in search algorithm scores because one of the most important parameters of almost all scoring strategies is the number of matching fragment ions.

In addition, intensity fluctuations were observed that could lead to differences in search algorithm scores. To characterize the differences in fragment ion intensities between replicate spectra, we generated histograms for several different B and Y fragment ions. For each of the fragment ions studied, the distribution of intensities appeared to be Gaussian, as can be seen in Figure 5, which shows a histogram of the intensities observed for the B5 fragment ion.

To develop a better understanding of the impact of tandem mass spectral variability on the identification of peptides, the 1000 XCorr measurements for each of the top 50 candidate peptides (assessed by average XCorr) of both angiotensin I (1 pmol/µL) and Glu-fibrinopeptide B (1 pmol/µL) were plotted (Figure 6a and b). For both peptide standards, the distributions for the correct peptide interpretations were shifted to the right relative to the distributions of the incorrect peptide interpretations. There was overlap between the correct and incorrect distributions that became more severe as the concentration of the peptide standard was lowered and led to false positives and incorrect identifications.

(13) Tabb, D. L.; MacCoss, M. J.; Wu, C. C.; Anderson, S. D.; Yates, J. R., III. Anal. Chem. 2003, 75, 2470-2477.
(14) Jarman, K. H.; Daly, D. S.; Peterson, C. E.; Saenz, A. J.; Valentine, N. B.; Wahl, K. L. Rapid Commun. Mass Spectrom. 1999, 13, 1586-1594.
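The frequency distributions of Figure 4c and d amount to counting, for each predicted fragment ion, the fraction of replicate spectra in which a matching peak appears. A minimal Python sketch of that count is given below. It is an illustration under stated assumptions, not the authors' procedure: the function name `fragment_presence`, the 0.5 m/z matching tolerance, and the four toy spectra are all hypothetical.

```python
def fragment_presence(spectra, predicted_mz, tol=0.5):
    """Fraction of spectra with a peak within `tol` of each predicted m/z.

    spectra: list of spectra, each a list of (m/z, intensity) pairs.
    predicted_mz: list of predicted fragment m/z values.
    tol: matching tolerance in m/z units (assumed value).
    """
    frequencies = {}
    for mz in predicted_mz:
        hits = sum(
            any(abs(peak_mz - mz) <= tol for peak_mz, _ in spectrum)
            for spectrum in spectra
        )
        frequencies[mz] = hits / len(spectra)
    return frequencies


# Toy replicate spectra (made up): one intense fragment near m/z 500.3
# and one weak fragment near m/z 650.3 that is missing in half of them.
toy_spectra = [
    [(500.3, 900.0), (650.4, 40.0)],
    [(500.2, 880.0)],
    [(500.4, 910.0), (650.1, 35.0)],
    [(500.3, 870.0)],
]
print(fragment_presence(toy_spectra, [500.3, 650.3]))
```

In this toy case the intense fragment is matched in every replicate while the weak one is matched in only half, which mirrors the qualitative behavior described for low-intensity fragment ions.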
Table 2 documents the number of incorrectly interpreted spectra out of 1000 as well as the average ∆CN value for all correctly identified spectra. The ∆CN value represents the difference between the first- and second-ranked candidates’ XCorrs.9 Therefore, a low average ∆CN corresponds to a greater extent of overlap between the first- and second-ranked XCorr distributions than a high ∆CN. Most likely, the amount of overlap between the correct and incorrect interpretations will vary depending on the peptide.

Figure 4. Three-dimensional plots of the (a) 10 highest scoring and (b) 10 lowest scoring (sorted by XCorr) spectra in the 0.1 pmol/µL angiotensin I data set. Also shown are frequency distributions as a function of m/z for all predicted +1 and +2 B and Y fragment ions for the (c) 100 highest scoring and (d) 100 lowest scoring (by XCorr) spectra in the 0.1 pmol/µL angiotensin I data set.

Figure 5. Histogram of observed intensities of the B5 fragment ion in the 0.1 pmol/µL angiotensin I data set (gray). The predicted Gaussian distribution (black) having the same average and standard deviation is shown for reference.

In summary, it is likely that the observed distributions in search algorithm scores obtained from replicate tandem mass spectra are caused by the convolution of several sources of error. While the specific causes of the variability in these replicate tandem mass spectra have not been formally discerned, it is evident that this variability leads to substantial inconsistency in XCorr measurements and ultimately to incorrect identifications. The remainder of this paper will discuss strategies to account for and minimize the implications of tandem mass spectral variability on the identification of peptides.

Strategies To Account for Tandem Mass Spectral Variance. Presearch Processing. The SEQUEST algorithm contains several spectral processing steps, including binning and normalization routines that are used to reduce complexity, remove extraneous noise, and modify the format for the cross-correlation analysis.9 However, before spectra are submitted to SEQUEST, several spectral processing steps are usually performed, including spectral averaging to increase spectral quality (i.e., signal-to-noise enhancements) and centroiding to reduce the file size. Spectral averaging is a common presearch processing strategy and has been outlined in the Experimental Section of this article. To evaluate the effects of spectral averaging on tandem mass spectral variability, the same experiments described previously (i.e., the direct infusion of peptide standards) were repeated using spectral averaging (i.e., acquiring five microscans), and the resulting processed data were searched against the HMR database. The averages, standard deviations, and relative standard deviations from the resulting collections of XCorr measurements can be seen in the spectral averaging columns of Table 1. By comparing the spectral averaged data set (Figure 7a and b) to the unprocessed data set (Figure 6a and b), it is apparent that spectral averaging increases the average value for the collection

Figure 6. XCorr distributions for the top 50 candidate peptide interpretations (assessed by average XCorr) of 1000 replicate tandem mass spectra of both angiotensin I (a) and Glu-fibrinopeptide B (b). Data were collected using one microscan and searched against the HMR database. After the application of a 10-point moving average smooth, the XCorr distributions shown in (a) and (b) become (c) and (d), respectively.

Table 2. Number of Incorrect Interpretations (False Positives) and Average ∆CN (Correct Interpretations Only) Obtained from 1000 Replicate Tandem Mass Spectra before and after the Application of a Moving Average Smooth (3, 5, 10 Points)a

no. of false positives
concn (fmol/µL)      no smooth    3 point    5 point    10 point

Angiotensin I
1000                 239          2          0          0
100                  270          4          1          0
1                    365          27         6          0
0.2                  895          444        237        71

Glu-fibrinopeptide B
1000                 0            0          0          0
100                  2            0          0          0
0.2                  755          449        279        97

average ∆CN
concn (fmol/µL)      no smooth    3 point    5 point    10 point

Angiotensin I
1000                 0.11         0.16       0.17       0.19
100                  0.09         0.16       0.18       0.20
1                    0.10         0.15       0.18       0.21
0.2                  0.12         0.25       0.28       0.30

Glu-fibrinopeptide B
1000                 0.27         0.28       0.28       0.29
100                  0.28         0.27       0.28       0.28
0.2                  0.32         0.44       0.51       0.60

a All spectra were collected using one microscan and searched against the HMR database. No DTASelect filters were applied.

of XCorr measurements, as well as decreases the width of the XCorr distributions. Overlaps between the distributions of the correct and incorrect candidate peptides, however, are still evident for angiotensin I. Interestingly, the correct distribution appears to have shifted to a higher average XCorr value, but the distributions for incorrect interpretations have also shifted to the right. Thus, the resulting average ∆CN values for the spectral averaged data set are similar to those obtained for the unprocessed data set even though the average XCorr value increased. The number of false positives calculated from the spectral averaged data set is shown in Table 3. Spectral averaging reduced the number of false positives by almost half for most of the concentrations, with the exception of the lowest concentration, where it decreased from 895 to 650 out of 1000.

Postsearch Processing. It is apparent from Figure 6a and b that the means of the correct distributions are shifted from the means of the incorrect distributions. Accordingly, averaging replicate XCorr measurements should minimize the probability of obtaining false positives, because the average of several XCorr values should approximate the mean of the distribution. To test this presumption, we employed a Perl program to apply a moving average smooth (3, 5, and 10 points) to the XCorr distribution data sets previously mentioned (i.e., unprocessed and spectral averaged for angiotensin I and Glu-fibrinopeptide B). A moving average smooths data by replacing each data point with the average of the neighboring data points defined within the span, as is shown in eq 1, where n is the span of points to average.

$$ y(i + n - 1) = \frac{1}{n} \sum_{k=0}^{n-1} y(i + k) \tag{1} $$

The application of the moving average smooth to the existing XCorr distribution data sets models the effects of averaging different numbers of replicate spectra and allows a direct comparison between the unsmoothed and smoothed data sets. It should be noted that the number of data points is slightly smaller for the smoothed data sets (998 for 3 points, 996 for 5 points, 991 for 10 points) than for the unsmoothed data sets (1000), but the resulting distributions are not significantly affected by this small difference in the number of data points.

Figure 7. XCorr distributions for the top 50 candidate peptide interpretations (assessed by average XCorr) of 1000 replicate tandem mass spectra of both angiotensin I (a) and Glu-fibrinopeptide B (b). Data were collected using the average of five microscans and searched against the HMR database. After the application of a 10-point moving average smooth, the XCorr distributions shown in (a) and (b) become (c) and (d), respectively.

Figure 6c and d and Figure 7c and d show the unprocessed and signal-averaged data sets, respectively, after the application of a 10-point moving average smooth. The widths of the resulting XCorr distributions were narrower than those of the unsmoothed data, and the correct distribution was resolved completely from the incorrect distributions (i.e., no overlap). Also, many of the incorrect candidate peptide distributions shifted to the left, resulting in increased separation between the correct and incorrect distributions. With less overlap between the correct and incorrect distributions, the likelihood of obtaining false positives is significantly lower. The number of false positives and average ∆CN values were calculated from the smoothed data sets (Tables 2 and 3). As expected, the ∆CN values increased substantially, signifying greater separation between the first- and second-ranked interpretations’ XCorrs, and the number of false positives was reduced.

The increased ∆CN values are due to SEQUEST’s preliminary scoring algorithm (Sp), where only the top 500 Sp scoring peptide interpretations are used in the cross-correlation analysis.9 It appears that correct interpretations are much more likely than incorrect interpretations to reproducibly score high (within the top 500) in the Sp process. By contrast, the average XCorrs for incorrect interpretations are lower because of instances where their Sp scores were not in the top 500 and, therefore, were not considered in the cross-correlation analysis. Based on these analyses, we conclude that the postsearch processing strategy improves the discriminatory capability of the technique in two different ways. First, distributions for similarly scoring interpretations were separated by the increased precision of the measurements (i.e., decreasing width of the distributions). Second, the distributions for very differently scoring interpretations were separated by the Sp skewing process.

MudPIT Analysis of a Five-Protein Mixture (Comparison of Processing Strategies). A standard protein mixture containing phosphorylase a (rabbit skeletal muscle), cytochrome c (horse), apomyoglobin (horse heart), albumin (bovine serum), and β-casein (bovine) was analyzed by MudPIT using each of the aforementioned strategies (no processing, spectral averaging, and XCorr averaging). Because the different processing strategies require the data to be collected in different ways, it was necessary to perform two different MudPIT analyses of the five-protein mixture. The same collection conditions (i.e., three replicate, one-microscan tandem mass spectra for each precursor mass) were used to acquire the unprocessed and postsearch XCorr averaging data sets. A separate MudPIT experiment was performed to collect data for the spectral averaging data set. All data were collected using the parameters outlined in the Experimental Section, and the results have been summarized in Tables 4 and 5. Interestingly, the percent sequence coverage for each of the five proteins in the standard mixture was similar, regardless of the processing strategy used (Table 4). The number of peptides identified for each protein was also similar; however, the unprocessed data resulted in the largest number of peptides identified. These results suggest that none of the strategies significantly hindered the identification of peptides.
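The two postsearch operations described above, the moving average smooth of eq 1 and the averaging of each candidate's XCorr over replicate searches, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' Perl program. The function names are hypothetical, the peptide labels "A" and "B" and their scores are made up, and counting a candidate that is absent from a replicate (for example, because it fell outside the top 500 by Sp) as an XCorr of 0 is an assumption made here to mimic the Sp skewing effect.

```python
def moving_average(values, n):
    """Trailing n-point moving average (eq 1).

    Each output point is the mean of n consecutive inputs, so the
    result has len(values) - n + 1 points.
    """
    if n < 1 or n > len(values):
        raise ValueError("span n must be between 1 and len(values)")
    window_sum = sum(values[:n])
    smoothed = [window_sum / n]
    for i in range(n, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        window_sum += values[i] - values[i - n]
        smoothed.append(window_sum / n)
    return smoothed


def average_replicate_xcorrs(replicates, missing=0.0):
    """Average each candidate's XCorr over replicate searches.

    replicates: list of dicts mapping candidate peptide -> XCorr.
    missing: score assumed for a candidate absent from a replicate.
    """
    peptides = set().union(*replicates)
    return {
        p: sum(r.get(p, missing) for r in replicates) / len(replicates)
        for p in peptides
    }


print(moving_average([1, 2, 3, 4, 5], 3))
print([len(moving_average(list(range(1000)), n)) for n in (3, 5, 10)])

# Made-up example: candidate "B" is missing from the second replicate.
toy_replicates = [{"A": 2.5, "B": 2.0}, {"A": 2.7}, {"A": 2.3, "B": 2.2}]
print(average_replicate_xcorrs(toy_replicates))
```

The second print statement reproduces the data set sizes quoted in the text (998, 996, and 991 points from 1000 measurements). In the toy example, the candidate that is missing from one replicate ends up with a lower average than either of its individual scores, which is the kind of separation attributed above to the Sp step.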


Table 3. Number of Incorrect Interpretations (False Positives) and Average ∆CN (Correct Interpretations Only) Obtained from 1000 Replicate Tandem Mass Spectra before and after the Application of a Moving Average Smooth (3, 5, 10 Points)a

no. of false positives

concn (fmol/µL)         no smooth   3 point   5 point   10 point
Angiotensin I
  1000                     106         0         0         0
  100                      148         0         0         0
  1                        188         1         0         0
  0.2                      650       165        47         4
Glu-fibrinopeptide b
  1000                       0         0         0         0
  100                        0         0         0         0
  0.2                      521       200        77        15

average ∆CN

concn (fmol/µL)         no smooth   3 point   5 point   10 point
Angiotensin I
  1000                     0.11      0.15      0.16      0.17
  100                      0.09      0.14      0.16      0.17
  1                        0.10      0.15      0.17      0.19
  0.2                      0.09      0.13      0.16      0.20
Glu-fibrinopeptide b
  1000                     0.26      0.27      0.27      0.27
  100                      0.27      0.27      0.28      0.27
  0.2                      0.31      0.49      0.55      0.65

a All spectra were collected using the average of five microscans and searched against the HMR database. No DTASelect filters were applied.
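The moving average smooth summarized in Table 3 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the replicate XCorr values are synthetic (drawn from a normal distribution with an arbitrary mean and spread), and the function name is hypothetical. The sketch shows two properties mentioned in the text: a k-point window over n replicates yields n - k + 1 values (996 and 991 for the 5- and 10-point windows over 1000 replicates), and the smoothed scores have a narrower distribution than the raw scores.

```python
import random
import statistics


def moving_average(values, window):
    """Return the window-point moving average of a sequence.

    The output has len(values) - window + 1 points, because only fully
    populated windows are averaged.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Synthetic stand-in for 1000 replicate XCorr measurements (assumed values).
rng = random.Random(0)
xcorrs = [rng.gauss(3.0, 0.4) for _ in range(1000)]

for k in (5, 10):
    smoothed = moving_average(xcorrs, k)
    # 996 points for k = 5 and 991 points for k = 10
    print(k, len(smoothed), round(statistics.stdev(smoothed), 3))
```

For independent measurements, averaging k values reduces the standard deviation by roughly the square root of k, which is the narrowing that separates similarly scoring interpretations.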

Table 4. Percent Sequence Coverage and Number of Peptides for the Analysis of a Five-Protein Mixturea

% sequence coverage

                    unprocessed   presearch average   postsearch average
phosphorylase a        82.6            68.6                 77.8
cytochrome c           64.4            58.7                 63.5
apomyoglobin           69.9            54.9                 60.8
albumin                66.6            65.6                 63.3
β-casein               38.8            40.2                 35.3

no. of peptides

                    unprocessed   presearch average   postsearch average
phosphorylase a        143             106                  112
cytochrome c            16              12                   14
apomyoglobin            21              15                   19
albumin                 64              78                   60
β-casein                13              12                    9

a Default DTASelect filters applied (+1 1.8, +2 2.5, +3 3.5, ∆CN 0.08).
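The filter thresholds quoted in the footnote can be read as a simple acceptance rule, sketched below. The thresholds (+1 1.8, +2 2.5, +3 3.5, ∆CN 0.08) come from the footnote; everything else is an assumption made for illustration. In particular, ∆CN is taken here as the normalized difference between the first- and second-ranked XCorr, (XCorr1 - XCorr2)/XCorr1, which is the conventional SEQUEST definition but is not spelled out in this section. The use of "greater than or equal to" comparisons and the function names are hypothetical, and the sketch does not reproduce DTASelect itself.

```python
# Minimum XCorr by precursor charge state, from the table footnote.
XCORR_MIN = {1: 1.8, 2: 2.5, 3: 3.5}
# Minimum normalized score difference, from the table footnote.
DELTA_CN_MIN = 0.08


def delta_cn(top_xcorr, second_xcorr):
    """Normalized difference between the first- and second-ranked XCorr.

    Assumed definition: (XCorr1 - XCorr2) / XCorr1.
    """
    if top_xcorr <= 0:
        return 0.0
    return (top_xcorr - second_xcorr) / top_xcorr


def passes_filter(charge, top_xcorr, second_xcorr):
    """Return True if an identification meets both thresholds."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        # Charge states without a listed threshold are rejected here.
        return False
    return (top_xcorr >= threshold
            and delta_cn(top_xcorr, second_xcorr) >= DELTA_CN_MIN)


# Hypothetical identifications: (charge, top XCorr, second-ranked XCorr).
examples = [(2, 3.0, 2.0), (2, 2.4, 1.0), (2, 3.0, 2.9), (3, 3.6, 3.0)]
for charge, top, second in examples:
    print(charge, top, second, passes_filter(charge, top, second))
```

Under this rule, a larger gap between the first- and second-ranked XCorr raises ∆CN, so processing that separates correct from incorrect interpretations makes a correct identification more likely to pass the filter.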

Table 5. Summary of a Six-Step MudPIT Analysis of a Five-Protein Mixture, Containing Equimolar Amounts (∼1 nmol/mL) of Phosphorylase a (Rabbit Skeletal Muscle), Cytochrome c (Horse), Apomyoglobin (Horse Heart), Albumin (Bovine Serum), and β-Casein (Bovine)a

                                    unprocessed   presearch processing   postsearch processing
no. of spectra preprocessing          69,711           55,410                 69,711
no. of cross-correlation scores       69,711           18,470                 23,237
total peptide IDs                     1226b            961b                   817b
correct peptide IDs                   762              599                    602
av ∆CN                                0.28c            0.29c                  0.51c
incorrect peptide IDs                 464              362                    215
% incorrect IDs                       38               37                     26
correct protein IDs                   6d               6d                     6d
incorrect protein IDs (p1)            425              351                    210
incorrect protein IDs (p2)            26               5                      3

a Default DTASelect filters applied (+1 1.8, +2 2.5, +3 3.5, ∆CN 0.08). b Includes peptides from redundant protein IDs. c Correct interpretations only. d Trypsin was also identified.

Because the composition of the mixture was known, it was possible to unambiguously assign correctly and incorrectly identified peptides (i.e., false positives) (Table 5). For the unprocessed data, the number of different peptides assigned correctly was higher than for the other strategies; however, the number of peptide identifications that turned out to be false positives was also higher. In fact, the percentage of incorrect peptide identifications was ∼38% of the total, which led to the incorrect identification of either 425 (1 peptide required) or 26 (2 peptides required) proteins. The average ∆CN value for the peptides that were identified correctly was 0.28. Using spectral averaging as a processing strategy, the total number of peptides identified was lower, but the number of incorrectly identified peptides was also lower. The percentage of incorrect peptide identifications dropped slightly to 37%, which resulted in the incorrect identification of 351 and 5 proteins, with 1 and 2 peptides required, respectively. The average ∆CN value was 0.29, which is similar to what was obtained in the unprocessed data set. Finally, averaging the XCorr measurements obtained from independently searched replicate spectra resulted in the lowest total number of identified peptides as well as the lowest number of incorrectly identified peptides, and the percentage of incorrectly identified peptides dropped to 26% (215 of 817; Table 5). Using this method, 210 and 3 proteins (1 and 2 peptides required, respectively) were incorrectly assigned. The average ∆CN value (0.51) was higher than that obtained in the other data sets, which indicates a higher degree of separation between the first- and second-ranked peptide interpretations’ XCorrs.

To better visualize this difference in separative power between processing strategies, receiver operating characteristic (ROC) curves were generated and are shown in Figure 8 (ref 15). The two ROC curves show the percentage of true positives as a function of the percentage of false positives at a variety of different XCorr values for spectra searched with a charge equal to +2. Comparison of the normalized areas under the curves (ROC scores) shows an increase from ∼0.93 to ∼0.97 for the ROC curves generated from spectral averaging and postsearch averaging of replicate XCorr measurements, respectively.
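The ROC analysis described above can be sketched in a few lines. The example below is a generic illustration, not the procedure used to produce Figure 8: the XCorr values and correct/incorrect labels are invented, the function names are hypothetical, and the area is computed with the trapezoidal rule on rates already normalized to the range 0 to 1. Each point on the curve corresponds to one XCorr threshold, with identifications at or above the threshold counted as accepted.

```python
def roc_curve(scores, labels):
    """Return (false positive rates, true positive rates).

    scores: XCorr values, one per identification.
    labels: True for a correct identification, False for an incorrect one.
    The threshold is swept from the highest score to the lowest.
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect label")
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Accept every identification tied at this threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def roc_score(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule (0 to 1)."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


# Invented example: six +2 identifications with known correctness.
xcorr_values = [4.1, 3.6, 3.2, 2.9, 2.7, 2.4]
is_correct = [True, True, False, True, False, False]
fpr, tpr = roc_curve(xcorr_values, is_correct)
print(round(roc_score(fpr, tpr), 3))
```

A ROC score of 1 would mean that every correct identification outscores every incorrect one, and a score of 0.5 corresponds to no discrimination, so the reported change from ∼0.93 to ∼0.97 indicates that fewer incorrect identifications outrank correct ones after postsearch averaging.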
Similar data were obtained for spectra searched with a charge equal to +1 and +3 (not shown). It is clear from these curves and the increase in average ∆CN that postsearch averaging of replicate XCorr measurements results in greater discrimination between correct and incorrect peptide interpretations than spectral averaging.

Figure 8. ROC curves for the analysis of a five-protein mix using two different processing strategies: (black) postsearch averaging of replicate XCorr measurements and (gray) spectral averaging.

It should be noted that, regardless of the processing strategy used, one of the most effective ways of limiting the number of incorrectly identified peptides was to require at least two peptides for each protein match. The impact of this step is too large to ignore and should definitely be considered when attempting to limit the number of false positive identifications. As for the impact of the different processing strategies on the number of false positives, it appears that averaging replicate XCorr measurements had the largest effect, without appreciably reducing the number of correctly identified peptides. Spectral averaging also showed benefits but still resulted in a much higher percentage of incorrect identifications than averaging replicate XCorr measurements. To further reduce the number of incorrect identifications, the number of replicate spectra could simply be increased, as can be seen in Table 3.

CONCLUSIONS

In this study, we have shown that scores derived from several database searching algorithms of replicate tandem mass spectra of peptide standards vary substantially (i.e., RSD ∼4-25%). The variance associated with these scores can potentially lead to false positives and ultimately reduce the number of correctly identified peptides. Therefore, to more accurately assign peptide identifications, it is important to consider the error associated with search algorithm scores as well as derive methods for minimizing the impact of this error on peptide assignments. Two different strategies (presearch spectral averaging and postsearch averaging) were studied and compared using both peptide standards and a MudPIT analysis of a five-protein digest. Overall, postsearch averaging of replicate XCorr measurements proved superior to spectral averaging and resulted in fewer incorrect peptide assignments. Postsearch averaging of replicate XCorr measurements provides several advantages, including greater reproducibility and greater separation between the XCorrs of the correct and incorrect interpretations (∆CN). Most importantly, the overlap between the XCorr distributions for correct and incorrect peptide sequence interpretations is reduced, which results in increased accuracy of peptide identifications without a significant loss in the number of correct identifications. One potential disadvantage is the increased time required to acquire replicate spectra and the subsequent impact on the number of different precursor m/z peaks sampled for tandem mass spectral analysis. Using lower flow rates and techniques such as “peak parking” could provide more time to perform replicate measurements. However, even in this case, the time required by the search process would be greater, due to the increased number of spectra. This methodology could be used with search algorithms other than SEQUEST to increase the accuracy of peptide assignments; however, the extent of the impact was not formally studied in this paper. When implemented in a large-scale proteomic experiment, this approach should increase the accuracy of the analyses and could potentially provide increased confidence in single-peptide protein identifications.

ACKNOWLEDGMENT

The authors thank Rovshan Sadygov for modifications to SEQUEST. The authors also thank Hayes McDonald, Mike MacCoss, Christine Wu, James Wohlschlegel, and David Tabb for critical reading of the manuscript. J.D.V. is supported by a Cystic Fibrosis Foundation fellowship. J.R.Y. is supported by NIH Grant RR11823.

Received for review July 18, 2003. Accepted March 2, 2004.

AC0348219

(15) Anderson, D. C.; Li, W.; Payan, D. G.; Noble, W. S. J. Proteome Res. 2003, 2, 137-146.
