ARTICLE pubs.acs.org/ac
Improving Proteomics Mass Accuracy by Dynamic Offline Lock Mass
Ying Zhang,†,§ Zhihui Wen,†,§ Michael P. Washburn,†,‡ and Laurence Florens*,†
† Stowers Institute for Medical Research, 1000 E. 50th Street, Kansas City, Missouri 64110, United States
‡ Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, Kansas 66160, United States
Supporting Information
ABSTRACT: Several methods to obtain low-ppm mass accuracy have been described. In particular, online or offline lock mass approaches can use background ions, produced by electrospray under ambient conditions, as calibrants. However, background ions such as protonated and ammoniated polydimethylcyclosiloxane ions have relatively weak and fluctuating intensity. To address this issue, we implemented dynamic offline lock mass (DOLM). Within every MS1 survey spectrum, DOLM dynamically selected the strongest n background ions for statistical treatments and m/z recalibration. We systematically optimized the number of calibrants and the mass profile abstraction method used to find a single m/z value to represent an ion. To assess the influence of the intensity of the analyte ions, we used tandem mass spectrometry (MS/MS) datasets obtained from MudPIT analyses of two protein samples with different dynamic ranges. DOLM outperformed both external mass calibration and offline lock mass that used predetermined calibrant ions, especially in the low-ppm range. The unique dynamic feature of DOLM was able to adapt to wide variations in calibrant intensities, leading to an averaged mass error centered at 0.03 ± 0.50 ppm for precursor ions. Such consistently tight mass accuracies meant that a precursor mass tolerance as low as 1.5 ppm could be used to search or filter post-search DOLM-recalibrated MS/MS datasets.
Received: July 28, 2011
Accepted: November 1, 2011
Published: November 1, 2011
© 2011 American Chemical Society
dx.doi.org/10.1021/ac201867h | Anal. Chem. 2011, 83, 9344–9351

Mass accuracy resulting from the high resolution provided by high-end mass spectrometry (MS) instruments has been a fundamentally important factor in proteomics research. To improve mass accuracy, mass calibration must be performed using standards whose observed and expected mass-to-charge ratios (m/z) are both known. Online mass calibration¹,² is carried out in real time, while offline calibration³⁻⁵ is implemented after MS data acquisition. With online calibration, the mass data are already calibrated and ready for use, while offline calibration usually requires additional software for postacquisition calibration. With offline calibration, the MS instrument is relieved from the ionization competition between analytes and standards¹ or from the increase of MS duty cycle caused by the additional analysis of standard ions.²

Depending on whether calibrants and analyte ions are observed within the same spectrum or between spectra, mass calibration may also be classified as internal or external. Internal mass calibration requires recalibration of every spectrum; therefore, it often demands more experimental and computational work. However, with internal mass calibration, the calibrants and analyte ions are codetected and, hence, share more systematic parameters than they would with external mass calibration. Consequently, internal mass calibration often generates better mass accuracy than external mass calibration.⁵

The selection of calibrants is critically important for mass calibration. The calibrants may be background ions produced by electrospray ionization (ESI) under ambient conditions,²,³,⁶ external ions introduced by an electron transfer dissociation chemical ionization source⁵ or by dual ESI sources,¹ or even analyte ions identified through a survey tandem mass spectrometry (MS/MS) search.⁴ The advantage of introducing external ions is that the user may choose calibrants to meet specific requirements and may raise the response intensity by changing the concentration. However, this method may only be implemented with specific ionization sources,¹,⁵ and raising the calibrants' concentration is limited by the ionization competition with analyte ions. By using analyte ions or background ions as calibrants, there is no ionization competition between calibrants and analyte ions. However, the use of analyte ions requires survey MS/MS searches and, more notably, may not be used for internal (within-spectrum) calibration, whereas background ions may be used as calibrants for internal calibration.

We have set out to optimize offline mass calibration using background ions produced by electrospray under ambient conditions. Since the online counterpart is called online lock mass,² we simply refer to offline mass calibration with well-known background ions as “offline lock mass”. Although offline lock mass has been reported previously,³ the effects of calibrant selection on mass accuracy were not studied. The selected calibrants in the previously published analyses were predetermined; therefore, we refer to this type of offline lock mass as being “fixed”.

Using background ions produced by ESI has the following striking benefits: (i) there is limited ionization competition between calibrants and analyte ions; (ii) it may be universally applied in electrospray, since it does not require specific ionization sources; and (iii) it may be used for internal (within-spectrum) calibration. However, the signal of protonated and ammoniated polydimethylcyclosiloxane (PDC) ions is relatively weak and fluctuates more between spectra than the signal of analyte ions. To address these issues, we evaluated how to select calibrants. We called the method “dynamic offline lock mass” (DOLM), because, for every single spectrum, the calibrants were not predetermined; instead, they were dynamically selected in order of intensity. To obtain the m/z values of the parent ion, we tested several mathematical methods of mass profile abstraction. We also evaluated the effects of the number of calibrants used in the calibration function on mass accuracy. The dynamic feature of our method was devised to dampen the impact of the fluctuation in calibrant intensity, which was greatly dependent on the type of sample analyzed.
■ EXPERIMENTAL PROCEDURES
MudPIT Analysis. Two sets of three technical replicate MudPIT runs⁷ were acquired from a whole cell lysate of Saccharomyces cerevisiae (ScWCL)⁸ on the zeroth, fourth, and sixth days after instrument mass calibration, in either profile mode or centroid mode (see the Supporting Information). Another two sets of three ScWCL replicates were acquired by implementing Xcalibur online lock mass, using either 1 out of 1, or 1 out of 8 PDC ions. Except for the differences in instrument control software noted above, these four ScWCL data sets were acquired under essentially identical chromatographic and mass spectrometric conditions. Three technical replicates for a human Mediator complex (HsMED)⁹ were run on day 0, day 2, and day 4 post-mass calibration, in profile mode, without online lock mass (see Table S1 in the Supporting Information).

Dynamic Offline Lock Mass as Implemented by RawDistiller. RawDistiller v. 1.0 was written in C++ with Microsoft Visual Studio 2005. To read the proprietary Thermo Scientific .raw files, the XRawfile2.dll library provided with Xcalibur 2.1.0 was used to access the IXRawfile4 COM interface. When the MS1 data were not acquired in centroid mode, RawDistiller could implement several methods to abstract mass profiles, including centroid, or the centers of Gaussian, modified Gaussian, Lorentz, or inverse polynomial fittings (see Figure S1 in the Supporting Information). The center for each fitting was used to calculate precise m/z values for the parent ions. All of the nonlinear least-squares residual minimization fittings were based on the Levenberg–Marquardt algorithm. Any required vector and matrix algebra operations were done by GSL routines.¹⁰ To calibrate m/z values in MS1 spectra, RawDistiller could select any number of ions from among four protonated PDC ions at 445.1200, 519.1388, 593.1576, and 667.1764 m/z, and four ammoniated PDC ions at 462.1466, 536.1654, 610.1842, and 684.2030 m/z (see Figure S1 in the Supporting Information).

For each spectrum, RawDistiller sorted the eight PDC ions by intensity. Since the intensity of calibrants fluctuated between spectra, the most-intense calibrants selected could differ between spectra. Alternatively, RawDistiller could also use a predetermined list of calibrant ions under fixed lock mass settings. Unless noted otherwise, for each MS1 spectrum, RawDistiller plotted the observed m/z of the PDC ions selected under a particular DOLM setting as a function of their calculated m/z, then derived the slope of the linear regression through zero. This slope was then used as the internal recalibrating factor for all ions detected within each MS1 spectrum.

Data Analysis. Using RawDistiller v. 1.0 with various DOLM settings, .raw files were extracted into .ms2 file format¹¹ (see Table S1 in the Supporting Information). In addition, .raw files were also extracted into .dta file format using DeconMSn software¹² and DtaRefinery software.⁴ We then used an in-house written script to convert these .dta files to .ms2 files before searching them with high-accuracy SEQUEST v.27 (rev.9) software.¹³ No enzyme specificity was imposed during searches; a mass tolerance of 20 ppm was set for precursor ions (unless otherwise indicated), and a mass tolerance of ±0.5 amu was set for fragment ions.

Mass Accuracy. We reported mass accuracy in terms of mass error, expressed in parts per million (ppm).¹⁴ The mass error (ε) for an ion was defined as the difference between a peptide's observed and expected protonated molecular weights ($\mathrm{MH}^{+}_{O} - \mathrm{MH}^{+}_{E}$) divided by $\mathrm{MH}^{+}_{E}$, times 1 000 000:

$$\varepsilon = \frac{\mathrm{MH}^{+}_{O} - \mathrm{MH}^{+}_{E}}{\mathrm{MH}^{+}_{E}} \times 1\,000\,000 \qquad (1)$$
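As a quick numerical check of eq 1, the following minimal Python sketch computes the mass error in ppm. The function name and the example masses are hypothetical and chosen only for illustration; they are not taken from the datasets described here.

```python
def mass_error_ppm(mh_observed: float, mh_expected: float) -> float:
    """Mass error per eq 1: (observed - expected) / expected * 1e6."""
    return (mh_observed - mh_expected) / mh_expected * 1_000_000


# Hypothetical protonated molecular weights (illustration only).
expected = 1000.0
observed = 1000.0015
print(f"{mass_error_ppm(observed, expected):.2f} ppm")
```

With these illustrative values, the observed mass lies 0.0015 above an expected mass of 1000, which corresponds to an error of about 1.5 ppm. A negative result would indicate an observed mass below the expected one.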
All mass errors reported in Tables S2–S6 (ScWCL) and Table S7 (HsMED) in the Supporting Information were calculated using NSAF v7 software¹⁵ and have been plotted as frequency distribution histograms and cumulative frequencies using OriginPro8.1 (OriginLab Corp., Northampton, MA). Statistical comparisons of mass error distributions were performed using the F-TEST (variance) and T-TEST (mean) functions in Microsoft Excel (Microsoft Corp., Redmond, WA).
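The per-spectrum recalibration described above under "Dynamic Offline Lock Mass as Implemented by RawDistiller" can be sketched in a few lines of Python. This is a minimal illustration and not the RawDistiller C++ code: the function names, the input layout, and the example intensities are assumptions. The sketch also assumes that x is the observed and y the expected m/z of each calibrant, so that the through-zero slope k converts an observed m/z into a recalibrated one.

```python
def through_zero_slope(observed, expected):
    """Least-squares slope k of expected = k * observed (no intercept)."""
    sxy = sum(x * y for x, y in zip(observed, expected))
    sxx = sum(x * x for x in observed)
    return sxy / sxx


def dolm_recalibrate(spectrum_mz, calibrants, n=4):
    """Recalibrate one MS1 spectrum with its n most intense calibrants.

    calibrants: list of (expected_mz, observed_mz, intensity) tuples, one
    per PDC ion detected in this spectrum (hypothetical input layout).
    """
    # Dynamic selection: rank this spectrum's calibrants by intensity.
    top = sorted(calibrants, key=lambda c: c[2], reverse=True)[:n]
    if not top:
        return list(spectrum_mz)  # n = 0 or no calibrant found: unchanged
    k = through_zero_slope([c[1] for c in top], [c[0] for c in top])
    return [k * mz for mz in spectrum_mz]


# Illustrative spectrum in which every m/z reads 2 ppm too high.
shift = 1 + 2e-6
calibrants = [
    (445.1200, 445.1200 * shift, 9.0e4),
    (519.1388, 519.1388 * shift, 4.0e4),
    (462.1466, 462.1466 * shift, 1.0e3),  # weakest ion, skipped when n = 2
]
recalibrated = dolm_recalibrate([800.0 * shift], calibrants, n=2)
print(f"{recalibrated[0]:.6f}")
```

Because the calibrants are re-ranked for every spectrum, a PDC ion that is weak in one spectrum is simply left out of that spectrum's regression, which is the dynamic behavior the text describes. In this constructed example the analyte ion is brought back to 800.0 to within floating-point rounding.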
■ RESULTS AND DISCUSSION
DOLM Parameters and Performance Evaluation. We developed the RawDistiller software to improve precision and accuracy in large-scale proteomics datasets acquired on high-resolution instruments. In the dynamic offline lock mass (DOLM) protocol, three important parameters had to be considered: a, n, and r, where a is the method used to abstract the MS1 profile to obtain precise m/z values for parent ions, and n is the number of background ions used in the recalibration function (r) to improve mass accuracy (Figure S1 in the Supporting Information). To represent the possible combinatorial settings of the DOLM process, we used the functional handle

$$D\big(a \in (c,\, g,\, gm,\, l,\, ip,\, x),\; n,\; r \in (\text{“blank”},\, b,\, s)\big) \qquad (2)$$
Here, D represents the dynamic aspect of our protocol, as opposed to using a fixed list of calibrant ions (F) or an online lock mass (O). When the data were acquired in profile mode, the abstraction method (a) could be centroid (c), Gaussian (g), modified Gaussian (gm), Lorentz (l), or inverse polynomial (ip). When the data were acquired in centroid mode, RawDistiller did not perform mass abstraction, and the a value was “x”, for Xcalibur centroid. A varying number of background ions was used to recalibrate each MS1 spectrum. Because of the limited number of calibrants available, the default calibration function r was a linear regression through zero, y = kx, where x and y were the observed and expected m/z values for the n selected PDC ions, respectively (see Figure S1 in the Supporting Information) and k was the slope of the regression between x and y. The r value was left blank when the default linear regression (y = kx) was chosen. We also tested using y = kx + b as the calibration function, where k and b were the slope and intercept of the linear regression between x and y m/z values for the n selected calibrants (r = b), as well as using y = x + s as the calibration function, where s is the calibrants' averaged m/z shift, calculated as the difference between the calculated and observed m/z values (r = s).

Figure 1. Optimizing mass profile abstraction, acquisition mode, number of calibrants, and calibration function. Replicate analyses of yeast whole cell lysates (ScWCL) were acquired in profile mode (A, C, and D) or centroid mode (B, D(x, n) datasets) on different days post-mass calibration of an LTQ-Orbitrap. After searching the .ms2 files in exactly the same manner with SEQUEST, the mass errors for all peptide/spectrum matches were calculated (Tables S2–S5 in the Supporting Information). The cumulative frequency percentages P(ε) at absolute mass error (ε) were binned (size = 0.02 ppm) and the absolute mass error was plotted on a log2 scale. (A) Abstraction method: In the D(c, 2), D(g, 2), D(gm, 2), D(l, 2), and D(ip, 2) settings, the centroid or the centers of Gaussian, modified Gaussian, Lorentz, and inverse polynomial fittings were used, respectively, to obtain precise m/z values. In each MS1 spectrum, the two calibrants with the highest intensity were dynamically selected from a list of eight PDC ions (Figure S1 in the Supporting Information) to calibrate m/z using the slope derived from the linear regression through the calibrants. (B) Acquisition mode: .ms2 files were created either using the m/z values derived from the Xcalibur centroid, D(x, n), or by abstracting the mass profile using the RawDistiller centroid, D(c, n), or Gaussian, D(g, n), without calibration (n = 0) or with four calibrants. The default linear regression y = kx was used as the calibration function. (C) Number of calibrants: .ms2 files were generated using RawDistiller with the center of Gaussian fitting to abstract the mass profile, and 0–8 calibrant ions were dynamically selected in the order of intensity to calibrate m/z using the slope derived from the linear regression y = kx through the n calibrants. (D) Calibration function: The D(g, 4), D(g, 4, b), and D(g, 4, s) settings all used the center of Gaussian fitting to abstract the mass profile and the four most-intense calibrants to derive calibration factors. D(g, 4) used y = kx as the calibration function, where x and y were the observed and expected m/z for the four selected PDC ions (Figure S1 in the Supporting Information) and k was the slope of the linear regression between x and y. D(g, 4, b) used y = kx + b as the calibration function, where b was the intercept of the linear regression between the x and y m/z values for the four selected calibrants. D(g, 4, s) used y = x + s as the calibration function, where s was the averaged m/z shift of the four calibrants. Within each MS1 spectrum, the solutions to these linear regressions (slope k, or slope k and intercept b) or the averaged mass shift (s) were then used to recalculate the m/z values of the analyte ions internally.

For all of the calibration settings that we tested throughout this work, we used the cumulative frequency percentage P(ε) (see
Figure 1) to better represent the spread around the mean quantitatively. This measure was more robust than the standard deviation around a mean when multiple distributions existed in the datasets (denoted by small shoulders in the distribution histograms shown in Figures S2–S4 in the Supporting Information). At a given absolute mass error (ε), the higher the cumulative frequency percentage was, the better the mass accuracy was. Because P(ε) was normalized, it was not dependent on the total spectral counts; therefore, it was suitable for all of the cases that we examined. When the absolute mass error was above 15.0 ppm, the P(ε) values were almost identical for all of the calibration settings that we tested (data not shown). If low-ppm mass accuracy was not required, all calibration settings were satisfactory. We therefore mainly evaluated mass accuracy with absolute mass errors in the low-ppm range (