Anal. Chem. 2006, 78, 8374-8385
Mass Measurement Accuracy in Analyses of Highly Complex Mixtures Based Upon Multidimensional Recalibration
Aleksey V. Tolmachev, Matthew E. Monroe, Navdeep Jaitly, Vladislav A. Petyuk, Joshua N. Adkins, and Richard D. Smith*
Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352
Mass spectrometry combined with a range of on-line separation techniques has become a powerful tool for characterization of complex mixtures, including protein digests in proteomics studies. Accurate mass measurements can be compromised by variations that occur in the course of an on-line separation, e.g., due to excessive space charge in an ion trap, temperature changes, or other sources of instrument “drift”. We have developed a multidimensional recalibration approach that utilizes existing information on the likely mixture composition, takes into account variable conditions of mass measurements, and corrects the mass calibration for sets of individual peaks binned by, for example, the total ion count for the mass spectrum, the individual peak abundance, m/z value, and liquid chromatography separation time. The multidimensional recalibration approach uses a statistical matching of the measured masses in such measurements, often exceeding 10^5 in number, to a significant number of putative known species likely to be present in the mixture (i.e., having known accurate masses), to identify a subset of the detected species that serve as effective calibrants. The recalibration procedure involves optimization of the mass accuracy distribution (histogram) to provide a more confident distinction between true and false identifications. We report the mass accuracy improvement obtained for data acquired using a TOF and several FTICR mass spectrometers. We show that the multidimensional recalibration better compensates for systematic mass measurement errors and also significantly reduces the mass error spread; that is, both the accuracy and precision of mass measurements are improved. The mass measurement improvement is found to be virtually independent of the initial instrument calibration, allowing, for example, less frequent calibration.
We show that this recalibration can provide sub-ppm mass measurement accuracy for measurements of a complex fungal proteome tryptic digest and can improve the confidence or number of peptide identifications.
* To whom correspondence should be addressed. E-mail: [email protected].
10.1021/ac0606251 CCC: $33.50 © 2006 American Chemical Society. Published on Web 11/17/2006.
8374 Analytical Chemistry, Vol. 78, No. 24, December 15, 2006
Modern mass spectrometry (MS) instrumentation can provide exceptional mass measurement accuracy (MMA), extending to 1 ppm or better. Combined with an on-line separation technique (e.g., liquid or gas chromatography, CIEF, CZE, CITP, or ion mobility), MS has become a powerful tool for characterization of complex mixtures, such as protein digests in proteomics studies.1-3 High-accuracy mass measurements allow more confident identification of analyte species,4,5 but obtaining high accuracy generally becomes a greater challenge with increasing sample complexity, measurement throughput, and spectrum acquisition speeds. In addition to instrument mass calibration “drift”, mass measurement accuracy can be affected, for example, by the total ion current (TIC) or trapped ion population and by the distribution of ion abundances throughout the m/z range, all of which can vary widely in the course of on-line separations.6 Approaches for improving mass accuracies in the face of such variations in measurement conditions include the use of automated gain control (AGC) to attempt to hold ion populations at some preselected level7-10 and the introduction of internal standards.11-14
A number of data processing approaches for improving mass accuracies have also been reported. For example, multivariate regression fitting applied to ESI time-of-flight (TOF) MS15 uses recalibration that corrects the time drift of the TOF calibration and can also include m/z-specific terms to improve mass measurement accuracy. A statistical analysis of mass measurement errors that included estimation of systematic and random errors and an improvement of mass measurement accuracy through recalibration was demonstrated recently.16 In other recent work, mass correction involving TIC was applied17 to improve the mass accuracy of large peptide data sets. That paper describes the recalibration procedure for FTICR data sets and discusses the value of improved mass accuracy in relation to the number of identified peptides and proteins in experiments involving MS/MS-based peptide identification.
In spite of these approaches to improve MMA, the MMA achievable with on-line separations generally remains considerably below the theoretical best, and the analysis of complex mixtures would benefit from improved MMA, for example, through improved confidence of identifications or a larger number of peptide components effectively identified in proteome analyses.
We have developed a new approach that involves a multidimensional recalibration using accurate masses of putative known compounds in the mixture being analyzed, as encountered, for example, in high-throughput proteome analyses for a defined biological system (a specific microbe, human blood plasma, etc.). We earlier18 described a least-squares fitting algorithm to obtain an improved calibration for individual LC-FTICR data sets and evaluated the approach for high-throughput LC-FTICR and LC-TOF measurements of microbial proteome samples, as well as a mixture of 12 known proteins digested with trypsin and 23 known peptides (used for our routine quality control (QC) analyses19). The new multidimensional recalibration approach is more robust for applications involving analyses of very complex mixtures and where there is a substantial probability of false attributions for detected species.
(1) Aebersold, R.; Mann, M. Nature 2003, 422 (6928), 198-207.
(2) Smith, R. D.; Anderson, G. A.; Lipton, M. S.; Paša-Tolić, L.; Shen, Y.; Conrads, T. P.; Veenstra, T. D.; Udseth, H. R. Proteomics 2002, 2, 513-523.
(3) Paša-Tolić, L.; Masselon, C.; Barry, R. C.; Shen, Y.; Smith, R. D. Biotechniques 2004, 37 (4), 621-639.
(4) Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal. Chem. 1999, 71, 2871-2882.
(5) Norbeck, A. D.; Monroe, M. E.; Adkins, J. N.; Anderson, K. K.; Daly, D. S.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2005, 16 (8), 1239-49.
(6) Zhang, L. K.; Rempel, D.; Pramanik, B. N.; Gross, M. L. Mass Spectrom. Rev. 2005, 24, 286-309.
(7) Tang, K.; Tolmachev, A. V.; Nikolaev, E.; Zhang, R.; Belov, M. E.; Udseth, H. R.; Smith, R. D. Anal. Chem. 2002, 74, 5431-5437.
(8) Paša-Tolić, L.; Harkewicz, R.; Anderson, G. A.; Tolić, N.; Shen, Y. F.; Zhao, R.; Thrall, B.; Masselon, C.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2002, 13, 954-963.
(9) Belov, M. E.; Zhang, R.; Strittmatter, E. F.; Prior, D. C.; Tang, K.; Smith, R. D. Anal. Chem. 2003, 75, 4195-4205.
(10) Page, J. S.; Bogdanov, B.; Vilkov, A. N.; Prior, D. C.; Buschbach, M. A.; Tang, K.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2005, 16 (2), 244-53.
(11) Kloster, M. B.; Hannis, J. C.; Muddiman, D. C.; Farrell, N. Biochemistry 1999, 38, 14731-14737.
(12) Hannis, J. C.; Muddiman, D. C. J. Am. Soc. Mass Spectrom. 2000, 11, 876-883.
(13) Taylor, P. K.; Amster, I. J. Int. J. Mass Spectrom. 2003, 222, 351-361.
(14) Witt, M.; Fuchser, J.; Baykut, G. J. Am. Soc. Mass Spectrom. 2003, 14, 553-561.
(15) Strittmatter, E. F.; Rodriguez, N.; Smith, R. D. Anal. Chem. 2003, 75, 460-468.
(16) Yanofsky, C. M.; Bell, A. W.; Lesimple, S.; Morales, F.; Lam, T. T.; Blakney, G. T.; Marshall, A. G.; Carrillo, B.; Lekpor, K.; Boismenu, D.; Kearney, R. E. Anal. Chem. 2005, 77, 7246-7254.
(17) Haas, W.; Faherty, B. K.; Gerber, S. A.; Elias, J. E.; Beausoleil, S. A.; Bakalarski, C. E.; Li, X.; Villén, J.; Gygi, S. P. Mol. Cell. Proteomics 2006, 5, 1326-1337.
(18) Tolmachev, A. V.; Zhang, R.; Langley, C. C.; Monroe, M. E.; Qian, W.; Strittmatter, E.; Liu, T.; Shukla, A.; Udseth, H. R.; Smith, R. D. Proc. 53rd ASMS Conf., San Antonio, TX, June 2005; DVD ROM.
(19) Purvine, S.; Picone, A. F.; Kolker, E. OMICS 2004, 8, 79-92.
The approach applies separate calibrations for peaks that are binned, for example, based upon summed spectrum intensities, m/z, peak intensity, and LC separation time. We show that the multidimensional recalibration approach results in more accurate mass measurements for complex data sets relevant to proteomics having a large number of detected species (>10^5) and sets of possible known compounds (i.e., for matching) of roughly similar size. Thus, the present approach results in a general improvement in the quality and the number of identifications from accurate mass measurements and has been initially evaluated for complex mixtures of peptides used for global “bottom-up” proteome analyses.
Multidimensional Recalibration Procedure. A schematic diagram of the entire data analysis approach is shown in Figure 1. Input data include the collection of mass spectra acquired in the course of an LC separation and a table containing the set of putative compounds (and ideally known separation times) that are likely to exist in the sample (e.g., identified from previous analyses with related samples). The MMA of the raw data is characterized using a histogram of mass measurement residuals, as described in detail below. All m/z values are then subjected to the multidimensional recalibration procedure that applies a mass calibration optimized locally according to parameters that impact mass measurement accuracy (Figure 2). Finally, the resulting MMA values are again evaluated using a histogram of mass measurement residuals after correction and, as we will show, this allows other m/z values to be assigned with improved precision and accuracy.
Figure 1. Schematic diagram of the recalibration algorithm. Input data include a collection of all mass spectra acquired in the course of the separation and a database containing the set of putative compounds that likely exist in the sample (e.g., identified from previous analyses with related samples). Mass measurement accuracy of the raw data is characterized by means of a histogram of mass residuals (see text). Next, all m/z values are subjected to the recalibration procedure, with a multiregion algorithm as an option. Finally, the resulting mass accuracy is evaluated using the histogram of mass residuals after correction.
Figure 2. Diagram illustrating peak binning in the multidimensional recalibration approach. All m/z values obtained in the course of separation-MS measurements are grouped according to parameters that impact the mass measurement accuracy. In the example shown, a number of intervals is chosen for the ion intensity, separation time, and m/z. As a result, a three-dimensional collection of regions is produced, each of them having its own optimized mass calibration.
The input data consist of all peak m/z values detected in each mass spectrum of an LC-MS analysis. Each high-resolution mass spectrum is subjected to a deisotoping procedure, resulting in a set of monoisotopic masses, abundances, charge states, and LC separation times for each detected isotopic structure. A typical LC-FTICR analysis may contain >10^5 such isotopic envelopes for complex samples. Each mass spectrum is analyzed individually, regardless of the elution profile of any component.
The list of putative (possible) compounds is generally a set of peptides that have been confidently identified in related mixtures (e.g., from the same organism or tissue type) and thus consists of the theoretical masses calculated for each of the possible peptides. We refer to this set of peptides as “potential mass and time (PMT) tags” for the organism under investigation. The PMT tag databases for various organisms are generated largely by conventional approaches involving multiple analyses of peptides from tryptic digests using LC-MS/MS and peptide identification, e.g., using SEQUEST.2,3,5 Alternatively, an in silico-generated list of tryptic peptides16 can be used as the list of putative compounds. In the case of LC separations, we use normalized elution time (NET) information to improve peptide identification specificity.2,3,20,21 The sets of PMT tags considered in our studies typically contain accurate masses for 10^4 to >10^5 different peptides, depending upon the organism or tissue.
Finding an Optimal Calibration. The multidimensional recalibration algorithm we report can be applied using a variety of calibration functions and different instrument types. We begin with an example involving an FTICR mass spectrometer and the following calibration function:6,22
\[ m/z = \frac{A}{f + B} \tag{1} \]
Here f is a peak frequency obtained from a frequency domain FTICR spectrum23 and A and B are calibration coefficients. Such coefficients are routinely defined in the process of instrument calibration using the mass spectrum of a calibration mixture. A pair of calibration coefficients A0, B0 is generally selected that provides the best achievable mass accuracy for the conditions of the calibration. However, the conditions during an LC separation can deviate considerably from those used for the instrument calibration, resulting in optimal calibration coefficients As, Bs different from A0, B0. As an example, with FTICR, the application of such an “external” calibration depends upon ion population and will be accurate through a separation only when the number of ions trapped in the FTICR cell is either very small or the same for both the calibration and the measured spectrum.6,23 However, an on-line separation typically produces an ion current that varies greatly during the separation process and sometimes well beyond the range of optimal mass measurement conditions. Such ion population variation is known to cause cyclotron frequency shifts in FTICR measurements.6,22-29 Thus, a separation-MS analysis of a complex mixture can be characterized by highly variable ion intensities distributed over a wide m/z range in a time-variable fashion and can often deviate significantly from the conditions used for calibration, making the A0, B0 calibration coefficients inappropriate.29
(20) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Paša-Tolić, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039-1048.
(21) Zimmer, J. D.; Monroe, M. E.; Qian, W. J.; Smith, R. D. Mass Spectrom. Rev. 2006, 25, 450-482.
(22) Francl, T. J.; Sherman, M. G.; Hunter, R. L.; Locke, M. J.; Bowers, W. D.; McIver, R. T., Jr. Int. J. Mass Spectrom. Ion Processes 1983, 54, 189-199.
(23) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spectrom. Rev. 1998, 17, 1-35.
(24) Ledford, E. B.; Rempel, D. L.; Gross, M. L. Anal. Chem. 1984, 56, 2744-2748.
(25) Jeffries, J. B.; Barlow, S. E.; Dunn, G. H. Int. J. Mass Spectrom. Ion Processes 1983, 54, 169-187.
(26) Chen, S.; Comisarow, M. Rapid Commun. Mass Spectrom. 1991, 5, 450-455.
(27) Chen, S.; Comisarow, M. Rapid Commun. Mass Spectrom. 1992, 6, 1-3.
Our present approach determines the calibration optimal for a particular LC-MS data set using the effective internal calibration from compounds likely to be present (e.g., PMT tags) and additionally does so for peaks binned so as to allow many separate calibrations to be developed and applied for subsets of detected peaks. A challenge for this approach is that, unlike typical calibration procedures, it is generally not possible to unambiguously assign a detected species to a specific candidate, thus creating a substantial possibility of multiple false attributions. Additionally, an often significant fraction of the detected peaks will have no corresponding matches within the putative list of compounds, and vice versa. Our approach addresses these issues and involves the following three steps:
Step 1. A list of tentative matches between measured m/z values and PMT tags is compiled. This list, referred to below as the putative calibrant list, consists of pairs of a measured quantity, e.g., the cyclotron peak frequency f in the case of FTICR, and a corresponding exact (i.e., theoretical) mass m/za, such that the mass deviation is smaller than a certain tolerance Tsearch:
\[ \left| m/z_0 - m/z_a \right| < m/z_a \, T_{\mathrm{search}} \tag{2} \]
Here m/z0 corresponds to the initial instrument calibration:
\[ m/z_0 = \frac{A_0}{f + B_0} \tag{3} \]
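Equations 1-3 and the tolerance test of step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names, the use of NumPy, and the representation of a match as a (peak index, theoretical m/z, deviation in ppm) tuple are assumptions made for the example.

```python
import numpy as np

def mz_from_frequency(f, A, B):
    """Calibration function of eq 1: m/z = A / (f + B)."""
    return A / (np.asarray(f, dtype=float) + B)

def putative_calibrants(freqs, tag_mz, A0, B0, t_search_ppm=30.0):
    """Pair each measured peak with every theoretical m/z within the tolerance.

    Returns a list of (peak_index, theoretical_mz, deviation_ppm) tuples.
    A peak may match several tags, and many peaks match none.
    """
    tags = np.sort(np.asarray(tag_mz, dtype=float))
    mz0 = mz_from_frequency(freqs, A0, B0)           # eq 3, initial calibration
    tol = t_search_ppm * 1e-6
    matches = []
    for i, m in enumerate(mz0):
        # Candidate tags a with a*(1 - tol) < m < a*(1 + tol)
        lo = np.searchsorted(tags, m / (1.0 + tol), side="left")
        hi = np.searchsorted(tags, m / (1.0 - tol), side="right")
        for a in tags[lo:hi]:
            dev = (m - a) / a                        # relative mass deviation
            if abs(dev) < tol:                       # eq 2
                matches.append((i, float(a), dev * 1e6))
    return matches
```

Because the tag list is sorted once and searched by bisection, the cost grows roughly with the number of peaks times the logarithm of the number of tags, which matters when both lists approach 10^5 entries.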
The Tsearch tolerance is selected to be much larger than the expected mass accuracy error in order to ensure that most or all possible correct attributions are included. For our LC-FTICR analyses, we use a fairly loose Tsearch = 30 ppm.
Step 2. A mass accuracy histogram for the list of the tentatively matched candidate detected species is generated (Figure 3). The histogram takes the form of a table of occurrence frequencies as a function of mass deviations expressed in, for example, parts per million (ppm). The mass deviation m/z0 - m/za is incremented in bins covering a range from -Tsearch to +Tsearch. The occurrence frequency is calculated as the total number of all putative calibrants that fall in a particular mass deviation bin. The histogram provides an initial approximate determination of false and true attributions between the lists of putative calibrants and detected species. The false attributions will follow an approximately normal (Gaussian) distribution with a characteristic width of ∼100 ppm, as follows from the peptide mass distribution of possible amino acid compositions.30-33
(28) Easterling, M. L.; Mize, T. H.; Amster, I. J. Anal. Chem. 1999, 71, 624-632.
(29) Masselon, C.; Tolmachev, A. V.; Anderson, G. A.; Harkewicz, R.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2002, 13, 99-106.
(30) Mann, M. Abstracts of the 43rd ASMS Conference on Mass Spectrometry and Allied Topics; Atlanta, GA, May 21-26, 1995.
(31) Zubarev, R. A.; Håkansson, P.; Sundqvist, B. Anal. Chem. 1996, 68, 4060-4063.
(32) Gay, S.; Binz, P.; Hochstrasser, D. F.; Appel, R. D. Electrophoresis 1999, 20, 3527-3534.
(33) Wool, A.; Smilansky, Z. Proteomics 2002, 2, 1365-1373.
Figure 3. Example mass residual histogram used for mass accuracy characterization. Frequency per 0.5 ppm bin is plotted as a function of mass deviation. The random attribution level shown by the horizontal dashed line corresponds to a probability of random attribution. This value is independent of calibration choice and is proportional to the number of putative compounds times the number of experimentally observed species. The histogram area “T” corresponds to true attributions, which have the matching frequency exceeding the random attribution level. Areas F1, F2, and F3 correspond to false attributions. The width of the histogram peak at a certain level gives an estimation of the error spread dMW. The position of the histogram peak can be interpreted as the systematic error dMS. A certainty of attribution (prior to applying elution time constraints) can be estimated as a ratio of true attribution area T to the total area T + F2 that corresponds to the width dMW.
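The histogram of step 2 and the quantities read from it (dMS, dMW, and the certainty T/(T + F2)) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and estimating the random attribution level as the median bin count is an assumption of this example (it presumes that most bins contain only false attributions), not a procedure stated in the text.

```python
import numpy as np

def mass_error_histogram(dev_ppm, t_search_ppm=30.0, bin_ppm=0.5):
    """Counts of putative calibrants per mass deviation bin over [-T, +T]."""
    n_bins = int(round(2 * t_search_ppm / bin_ppm))
    edges = np.linspace(-t_search_ppm, t_search_ppm, n_bins + 1)
    counts, _ = np.histogram(dev_ppm, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts

def histogram_stats(centers, counts, level=0.10):
    """Return (dMS, dMW, certainty) from a mass accuracy histogram."""
    background = float(np.median(counts))      # assumed random attribution level
    peak = int(np.argmax(counts))
    threshold = background + level * (counts[peak] - background)
    left = right = peak
    # Extend the peak region while neighboring bins stay above the threshold
    while left > 0 and counts[left - 1] > threshold:
        left -= 1
    while right < len(counts) - 1 and counts[right + 1] > threshold:
        right += 1
    n = right - left + 1
    total = counts[left:right + 1].sum()       # area T + F2
    d_ms = float(centers[peak])                # systematic error: peak position
    d_mw = n * float(centers[1] - centers[0])  # error spread: peak width
    certainty = (total - background * n) / total   # T / (T + F2)
    return d_ms, d_mw, certainty
```

Using the peak maximum rather than the centroid for dMS follows the preference stated in the text for asymmetric distributions.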
For absolute mass deviations of ≪100 ppm, the false attributions can be approximated by a uniform distribution. The probability of correct attribution increases above the random attribution level in the MMA area where true attributions are centered, resulting in a peak; this is illustrated by the histogram region T in Figure 3. The peak of true attributions rarely fits a normal distribution and often has an asymmetric shape with a wing (or tail) due to spectra resulting from, for example, excessive ion populations that lead to positive mass errors. The mass accuracy histogram enables determination of the following characteristics. (a) A systematic mass measurement error dMS is given by the position of the histogram maximum. (It is also possible to use the centroid; however, we have found that the peak maximum gives a more stable measure since it is less influenced by the wings of asymmetric distributions.) (b) The mass measurement error variation dMW can be characterized by the width of the peak T, measured at a specific level above the random attribution frequency, e.g., 10%, as in Figure 3. (c) The certainty of attribution can be estimated as a ratio of true attribution area T to the total area T + F2 corresponding to the width dMW (or any alternative mass accuracy tolerance that is chosen). Note that this provides an estimation of the matching certainty based on mass accuracy only. The use of LC elution time constraints provides an additional >10-fold reduction in false positive assignments, depending on the uncertainty of LC NET values.2,3,21
An optimal choice of the bin size is important. Small bin sizes lead to noisy histograms that are difficult to interpret. Alternatively, bin sizes larger than dMW result in a distortion of the true attribution area. We have found that bin sizes of typically 0.2-0.5 ppm are reasonable for LC-FTICR data and expect that improvement would be obtained by a multiple-step process in which an initial larger bin size was followed by use of a smaller bin size, as described in the Appendix.
Step 3. The systematic error (dMS) can be removed by adjusting either of the two calibration coefficients. For example, a positive average mass error, as shown in Figure 3, can be corrected by decreasing the A0 coefficient or increasing the B0 coefficient. An additional goal is to reduce both the systematic error and the mass error spread by the simultaneous adjustment of both coefficients. To do this, calibration coefficients A and B are changed in small increments, and for each pair, the mass error parameters dMS and dMW are calculated, as described in step 2 above. Ultimately, a pair of coefficients A, B that minimizes the dMS and dMW errors provides a new calibration optimal for a given data set. Additional details on the algorithm implementation are provided in the Appendix.
Figure 4. Mass accuracy histograms obtained for the QC test peptide mixture with an 11-T LC-FTICR instrument. Results for the raw instrument calibration (gray) and after recalibration (black). (a) A single set of the improved calibration coefficients is applied to the entire LC-MS data set. (b) The multiregion calibration is applied, with the following parameters for the calibration regions: 4 regions for TIC, 4 for m/z, and 4 for individual peak intensity, with the total number of 3D regions equalling 64.
Figure 4a shows the recalibration histograms for the QC peptide mixture using an 11-T LC-FTICR instrument developed
at our laboratory.34,35 Parameters used for this and all subsequent recalibration histograms are summarized in Table 1. All 144 382 detected monoisotopic masses from the analysis (including species observed in multiple spectra as a peak elutes from the LC) are compared to 4208 previously identified peptide PMT tags with a tolerance Tsearch = 30 ppm, resulting in 49 690 potential calibrants spread across the set of mass spectra. The initial instrument calibration (gray curve) yields a positive mass shift of all m/z values, dMS = 20 ppm. Mass accuracy can vary between experiments for many reasons, which is why the mass accuracy distribution needs to be determined for every study and why better calibration methods are needed. In this case, we believe that one contribution was that the instrument calibration was performed several days before the LC-MS analysis; the large 20 ppm deviation in calibration is mainly attributed to magnetic field drift. After application of the recalibration process (black curve), the mass error distribution maximum is centered at 0 ppm and the mass error spread is improved from 2.7 to 1.9 ppm.
Table 1. Recalibration Parameters Used in This Work
figure | putative masses | potential calibrants [a] | dMS raw, ppm | dMW raw, ppm | Max [b] raw | dMW after recal, ppm | Max [b] after recal
4a, b | 4208 | 49 690 | 20.0 | 2.73 | 3800 | 1.93 (4a), 1.03 (4b) | 5229 (4a), 8519 (4b)
5 | 15 004 | 100 926 | 5.0 | 3.91 | 2742 | 0.83 | 7339
7 | 4208 | 19 246 | -0.6 | 1.21 | 1291 | 0.66 | 2548
8a | 2103 | 23 822 | 19.5 | 3.02 | 1789 | 1.07 | 3988
8b | 2103 | | 20 | 2.47 | 2016 | 1.28 | 3796
9 | 4208 | 124 137 | 20.0 | 45.0 | 2850 | 6.8 | 7748
[a] Matches with tolerance 30 ppm for all figures, except 15 ppm for Figure 7 and 100 ppm for Figure 9. [b] Histogram peak maximum, counts per 0.5 ppm bin, except 0.2 ppm bin for Figure 7 and 2 ppm bin for Figure 9.
Multidimensional Recalibration. We achieved the next level of mass accuracy improvement by means of a multiregion calibration (Figure 2). This approach exploits the notion that the calibration should be performed under conditions as close as possible to the measurement conditions. Since the measurement conditions can vary during the course of LC-MS measurements (e.g., as mixture composition, complexity, and average m/z change), we hypothesized that the optimal calibration will also vary. For example, an important factor for FTICR mass measurements is the total population of ions present in the trapped ion cell during detection, as discussed above.
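The single-region optimization of step 3 (the case of Figure 4a) can be illustrated with a brute-force search over small increments of A and B. The sketch below is an assumption-laden stand-in, not the algorithm of the Appendix: instead of computing dMS and dMW separately, it scores each (A, B) pair by the number of putative calibrants whose recalibrated error falls inside a narrow window around zero, which rewards both a centered and a narrow histogram peak. The function name, search ranges, step count, and window width are all hypothetical.

```python
import numpy as np

def recalibrate_grid(f, mz_true, A0, B0, rel_A=40e-6, span_B=2.0,
                     steps=81, window_ppm=1.0):
    """Search (A, B) near (A0, B0) for the best-centered, narrowest error peak.

    f        : measured frequencies of the putative calibrants
    mz_true  : matched theoretical m/z values
    rel_A    : half-range of the relative change applied to A
    span_B   : half-range of the absolute change applied to B
    """
    f = np.asarray(f, dtype=float)
    mz_true = np.asarray(mz_true, dtype=float)
    best_score, best_A, best_B = -1, A0, B0
    for A in A0 * (1.0 + np.linspace(-rel_A, rel_A, steps)):
        for B in B0 + np.linspace(-span_B, span_B, steps):
            err_ppm = (A / (f + B) - mz_true) / mz_true * 1e6
            # Proxy score: calibrants inside the window around zero error
            score = np.count_nonzero(np.abs(err_ppm) < window_ppm)
            if score > best_score:
                best_score, best_A, best_B = score, A, B
    return best_A, best_B
```

A count-in-window score is largely insensitive to the roughly uniform background of false attributions, which a plain least-squares fit over all putative calibrants would not be; a finer second pass around the best pair could follow the coarse search.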
Under idealized conditions, increased ion populations cause an increased frequency shift of all peak frequencies detected.22,24-27 This so-called global frequency shift can be accounted for in the calibration equation; e.g., in the case of calibration eq 1, this takes the form of the B coefficient being a function of the ion population. Unfortunately, this idealized scheme provides only a minor mass accuracy improvement at best. One reason for this is the difficulty associated with obtaining a direct and reliable measure of the ion population. The ion population is roughly related to the total signal, but this correlation suffers from uncontrolled variations for the different ion transient durations, in addition to m/z biases, e.g., resulting from ion kinetic energy variations and the FTICR cell trapping potential anharmonicity.36-39 Additionally, we have previously described how “local” effects in individual mass spectra can apply over specific m/z regions,29 and it is well recognized that the optimal calibration will be different when different m/z ranges are considered.6
More generally, the concept of an A coefficient that represents the magnetic field and a B coefficient that accounts for electric fields is applicable for the case of an ICR cell having an ideal harmonic distribution of the trapping potential, in addition to negligibly small space charge effects.23 However, the trapping potential anharmonicity inherent in current ICR cells produces frequency shifts dependent upon the ion spatial distribution, which in turn is a function of ion axial energy that generally is m/z-dependent. As is well known by FTICR practitioners, the A and B coefficient values obtained from a routine instrument calibration are generally influenced not just by TIC and magnetic field drift but also by instrumental settings of the ion optics preceding the FTICR cell, which may include potentials, cooling time settings, and pressure regimes of the ion transfer/accumulation/cooling/extraction/trapping processes. These factors are not directly accounted for by the simple two-term calibration function, and the use of an additional total ion signal-dependent calibration term or other similar corrections provides only limited improvement. Thus, in practice both the A and B coefficients can be affected by variability of conditions during LC separations. To address this challenge, we have developed a new multidimensional recalibration approach that uses multiple separate calibrations for a single separation-MS data set.
(34) Bruce, J. E.; Anderson, G. A.; Wen, J.; Harkewicz, R.; Smith, R. D. Anal. Chem. 1999, 71, 2595-2599.
(35) Harkewicz, R.; Belov, M. E.; Anderson, G. A.; Paša-Tolić, L.; Masselon, C. D.; Prior, D. C.; Udseth, H. R.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2002, 13, 144-154.
(36) Gabrielse, G.; Haarsma, L.; Rolston, S. L. Int. J. Mass Spectrom. Ion Processes 1989, 88 (2-3), 319-332.
(37) Guan, S. H.; Marshall, A. G. Int. J. Mass Spectrom. Ion Processes 1995, 146, 261-296.
(38) Vartanian, V. H.; Laude, D. A. Int. J. Mass Spectrom. 1998, 178 (3), 173-186.
(39) Barlow, S. E.; Tinkle, M. D. Rev. Sci. Instrum. 2002, 73, 4185-4200.
To compensate for the calibration variations due to variable ion population, all mass spectra are grouped according to the TIC values measured from the summation of peak intensities for each mass spectrum. The number of groups NTIC may vary from 1 (meaning no division into groups) to >100. Each group contains mass spectra with TICs falling inside a certain interval of TIC values, termed a TIC region. The TIC regions are defined such that all potential calibrants are distributed evenly between all of the regions. This is done by sorting all putative calibrants with respect to the TIC value of the corresponding mass spectrum and choosing equidistant intervals in the sorted list. After the mass spectra are classified into the groups, the recalibration is performed for each group individually, using the procedure described above. As a result, instead of one calibration common for the whole LC-MS data set, a number of different calibrations is obtained, each one being optimized for a narrow TIC range. We have found this approach to greatly improve the precision of measurements after recalibration (see below).
Another parameter that can influence calibration is the m/z value. The calibration precision can often be improved if a narrower mass range is used. To do this we select a number of m/z ranges for recalibration where all the potential calibrants are evenly distributed among the regions, similar to the TIC regions above. Note that when several m/z regions are used, different potential calibrant peaks from one mass spectrum may fall into different groups, and a particular mass spectrum may have several calibrations effective over different m/z regions. This approach was found to further narrow the width of the mass accuracy histogram (i.e., improve mass measurement precision) after recalibration.
Additionally, peaks can be divided by the LC separation time into a given number of ranges. This option is useful when the instrument calibration has a significant temporal variation (drift). Minor temporal variations of the calibration may occur, e.g., due to TIC variations across an LC separation or, more significantly, with TOF MS instrumentation due to temperature changes and instabilities of voltages involved in the TOF measurement cycle. Magnetic field drift of the FTICR magnet is generally not observed during an LC measurement, i.e., 1-3 h, with our FTICR instruments; however, the time binning would equally address the problem if present.
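The region definition described above (sorting the putative calibrants by a parameter and cutting the sorted list into equal parts) amounts to placing the region boundaries at quantiles of the calibrant values. A minimal sketch, with hypothetical function names, that works the same way for TIC, m/z, separation time, or peak intensity:

```python
import numpy as np

def equal_count_edges(calibrant_values, n_regions):
    """Interior region boundaries giving ~equal calibrant counts per region."""
    q = np.linspace(0.0, 1.0, n_regions + 1)
    edges = np.quantile(np.asarray(calibrant_values, dtype=float), q)
    return edges[1:-1]          # drop the outer limits; outer regions are open-ended

def region_index(values, interior_edges):
    """Region number (0 .. n_regions - 1) for each value, calibrant or not."""
    return np.searchsorted(interior_edges, values, side="right")
```

Leaving the first and last regions open-ended is a choice made for this sketch: it ensures that every detected peak, including those outside the range spanned by the calibrants, receives a region and therefore a calibration.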
A possible source of the calibration drift with FTICR instruments involving AGC is the temporal variation of the ion population, possibly due to a temperature-related drift in the gain of an electron multiplier that controls the externally accumulated ion population. An accurate FTICR calibration can also depend on how individual ion abundances are distributed along the m/z range of measurements.29 The individual peak intensity is also important for calibration of TOF mass analyzers.40,41 Thus, our approach includes multiple regions of individual ion intensities as an additional option for the multiregion recalibration. The division into groups, defined by the three parameters based on TIC, m/z ranges, and individual peak intensity, produces a three-dimensional (3D) space of calibration conditions (Figure 2); the corresponding numbers of regions are designated NTIC, N2, and NAi. Although there is no limitation on the number of parameters used, the 3D approach was chosen for the initial evaluation of the method for practical reasons. The multidimensional recalibration is performed as described above, except that a separate calibration is obtained for each region of the 3D array of peaks. The calibration coefficients (stored in a 3D matrix) are then used to correct each of the measured m/z values in the LC-MS data set using a triple index (i, j, k), attributed to each experimentally observed peak according to any three of the parameters (TIC, separation time, individual abundance value, and m/z range) in which the given peak resides. Finally, the mass measurement accuracy after recalibration is characterized using a mass accuracy histogram.

(40) Blom, K. F. Anal. Chem. 2001, 73, 715-719.
(41) Kofeler, H. C.; Gross, M. L. J. Am. Soc. Mass Spectrom. 2005, 16 (3), 406-408.

Figure 5. Mass accuracy histograms obtained for a N. crassa fungus sample using an 11-T LC-FTICR MS. Results for instrument calibration (gray) and after recalibration (black). The number of calibration regions for TIC, m/z, and peak intensity is 10 × 2 × 10 = 200. The systematic mass measurement error (i.e., histogram maximum position) is corrected from 5 to 0 ppm, and the mass error spread is improved from 3.9 to 0.8 ppm. The histogram maximum is increased >3 times, signifying a corresponding improvement in the certainty of identifications; see also Figure 6 and Table 2.

Since we are performing a statistical analysis for each group, the total number of groups N3D = NTIC × N2 × NAi should be small enough so that each group will have a statistically large number of potential calibrants, >100 per bin. Optimal values for NTIC, N2, and NAi can be adjusted for a particular system. Data sets with larger numbers of detected species and putative mass tags can be processed using a larger number of groups, each providing separate calibrations. Figure 4b shows the resulting multiregion recalibration histograms for the same LC-FTICR analysis as in Figure 4a. Automated analysis of LC-MS data involves reading the file containing all detected isotopic structures, generating the raw mass accuracy histogram, performing the multiregion recalibration, and calculating the final mass accuracy histogram. The processing takes ∼2 min using a 2-GHz desktop PC, though computation time can increase to ∼30 min for more complex data sets. Since the computation time is reasonably short, it is easy to test various combinations of the parameters NTIC, N2, and NAi. Shown in Figure 4b are the results for a total number of groups N3D = 4 × 4 × 4 = 64. The mass measurement precision of the instrument calibration, 2.7 ppm, is improved to 1.9 ppm when using a single-region recalibration (Figure 4a) and to 1.0 ppm for the 64-region recalibration (Figure 4b). Importantly, the histogram maximum is increased >2-fold, corresponding to a significant reduction (∼2-fold) in the relative number of resulting random (i.e., false) identifications. The number of regions can be further increased for more complex systems and when more species are assigned with high confidence.
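The triple-index lookup and the >100-calibrants-per-bin guideline can be illustrated with a short sketch. It assumes a simplification that is not in the text: each region stores a single systematic shift in ppm, standing in for the calibration coefficients actually fitted per region. All names, boundaries, and numbers below are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def triple_index(tic, mz, abundance, tic_edges, mz_edges, ai_edges):
    """Region indices (i, j, k) for one peak; each *_edges list holds the
    interior boundaries of that dimension."""
    return (bisect_right(tic_edges, tic),
            bisect_right(mz_edges, mz),
            bisect_right(ai_edges, abundance))

def underpopulated(calibrant_indices, min_count=100):
    """Occupied regions holding fewer potential calibrants than min_count."""
    counts = Counter(calibrant_indices)
    return {idx: n for idx, n in counts.items() if n < min_count}

def correct_mz(mz, shift_ppm):
    """Remove a systematic error of shift_ppm (simplified one-parameter
    stand-in for a per-region calibration) from a measured m/z."""
    return mz / (1.0 + shift_ppm * 1e-6)

# Hypothetical 2 x 2 x 2 layout with one region measured 5 ppm too high.
tic_edges, mz_edges, ai_edges = [1e6], [800.0], [1e4]
shifts = {(1, 0, 1): 5.0}
idx = triple_index(2e6, 500.0, 5e4, tic_edges, mz_edges, ai_edges)
corrected = correct_mz(500.0025, shifts.get(idx, 0.0))
sparse = underpopulated([(0, 0, 0)] * 150 + [(1, 0, 1)] * 20)
```

A non-empty result from `underpopulated` suggests reducing NTIC, N2, or NAi until every occupied region has enough potential calibrants for a stable histogram.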
Our typical peptide-level (bottom-up) proteomics analyses generally involve the use of PMT tag sets that provide comparable or larger numbers of useful calibrants than the QC mixture used for Figure 4. As a comparison, Figure 5 shows
Analytical Chemistry, Vol. 78, No. 24, December 15, 2006
8379
Figure 6. Example mass spectrum of the 11-T LC-FTICR analysis of a N. crassa fungal sample. Numbers designate isotopic structures matched to the putative mass list peptides extracted from the PMT tag database for N. crassa; see Table 2. Inset shows the high-resolution detail view of a typical isotopic structure. Mass measurement errors listed in Table 2 show 3-6 ppm before and

1000 for each of the two calibration coefficients or >10^6 combinations of all possible pairs. To speed the search process, we developed an iterative procedure that starts with a rough step Dppm ≈ 1 ppm and then gradually reduces the step size and the range of search so that the maximum of the accuracy histogram remains inside the search scope. In the course of iterations, the central bin width DHM is reduced proportionally to Dppm:
$$D_{\mathrm{HM}} = C_{\mathrm{bin}} \, D_{\mathrm{ppm}} \tag{a6}$$
The coefficient Cbin = 4 is used as a typical setting. The variation step Dppm is reduced by a factor of 2^0.5 at each subsequent iteration. This scaling factor was found to be sufficiently small for a stable operation and gives a convenient scaling law of powers of 2. The iterative procedure is terminated when the bin size DHM reaches a preset minimum D_HM. The D_HM value sets a desired level of the calibration refinement. If D_HM is too small,
it can produce a poor calibration because of poor statistics. D_HM values of 0.2-0.5 ppm were found to be reasonable for the present LC-FTICR data sets. After the recalibration is complete, the mass measurement accuracy is characterized using the mass accuracy histogram, as described in step 2 above. Finally, the two histograms for initial and refined calibrations are used to evaluate the improvement in mass measurement accuracy and precision,
as well as the improvement in the certainty of attributions, as illustrated in Figure 3.
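The step-size schedule of the iterative procedure can be made concrete with a short sketch. It reproduces only the bookkeeping stated above (start near 1 ppm, bin width equal to Cbin times the step, step divided by 2^0.5 per iteration, stop at the preset minimum bin width); the coefficient search performed at each step is omitted. The function name and default values are illustrative assumptions.

```python
def step_schedule(d_ppm_start=1.0, c_bin=4.0, d_hm_min=0.5):
    """Return the list of (Dppm, DHM) pairs visited by the iterations.

    Iteration n uses Dppm = d_ppm_start * 2**(-n/2) and DHM = c_bin * Dppm;
    the loop stops once DHM has reached the preset minimum d_hm_min.
    """
    if d_ppm_start <= 0 or c_bin <= 0 or d_hm_min <= 0:
        raise ValueError("all parameters must be positive")
    schedule = []
    n = 0
    while True:
        d_ppm = d_ppm_start * 2.0 ** (-0.5 * n)
        d_hm = c_bin * d_ppm
        schedule.append((d_ppm, d_hm))
        # Small relative tolerance guards against floating-point round-off.
        if d_hm <= d_hm_min * (1.0 + 1e-9):
            return schedule
        n += 1

# Hypothetical run: 1 ppm starting step, Cbin = 4, minimum bin width 0.5 ppm.
schedule = step_schedule()
```

With these settings the bin width falls from 4 ppm to 0.5 ppm in seven steps, halving on every second iteration, which is the powers-of-2 scaling mentioned above.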
Received for review April 4, 2006. Accepted September 22, 2006. AC0606251