
Automated Protein Identification by the Combination of MALDI MS and MS/MS Spectra from Different Instruments

Fredrik Levander and Peter James*
Department of Protein Technology, Electrical Measurements, Lund University, Lund, Sweden
Received August 10, 2004

The identification of proteins separated on two-dimensional gels is most commonly performed by trypsin digestion followed by matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry. Recently, atmospheric pressure (AP) MALDI coupled to an ion trap (IT) has emerged as a convenient method for obtaining tandem mass spectra (MS/MS) from samples on MALDI target plates. In the present work, we investigated the feasibility of using the two methodologies in series as a standard method for protein identification. In this setup, the high-mass-accuracy MALDI-TOF spectra are used to calibrate the peptide precursor masses in the lower-mass-accuracy AP-MALDI-IT MS/MS spectra. Several software tools were developed to automate the analysis. Two sets of MALDI samples, consisting of 142 and 421 gel spots, respectively, were analyzed in a highly automated manner. In the first set, the protein identification rate increased from 61% for MALDI-TOF alone to 85% for MALDI-TOF combined with AP-MALDI-IT. In the second set, it increased from 44% to 58%. AP-MALDI-IT MS/MS spectra were in general less effective than MALDI-TOF spectra for protein identification, but the combination of the two methods clearly increased the confidence of the identifications.

Keywords: MALDI • AP-MALDI • time-of-flight • ion trap • protein identification

Introduction
The separation of proteins on 2D gels, followed by excision of spots and protein identification by peptide mass fingerprinting (PMF), has become a standard proteomic methodology.1 Despite recent advances in multidimensional LC separation of peptides, 2D gels still have advantages in the detection of different protein isoforms and in throughput. In a high-throughput setup, as many gel spots as possible should be identified by MALDI MS rather than by time-consuming ESI-MS/MS.2 MALDI-TOF analysis can give high identification rates in favorable cases, but MS/MS analysis is sometimes needed for high-confidence protein identification. Consequently, MALDI-TOF/TOF and MALDI-Q-TOF mass spectrometers are rapidly gaining popularity. However, the price of these instruments limits their use in many labs. The atmospheric pressure MALDI source that can be mounted on ion traps3 is a cheaper alternative, especially as ion traps are standard equipment in the proteomics lab. The ion trap allows peptides to be isolated for MS/MS fragmentation with narrower precursor mass windows than current MALDI-TOF/TOF mass spectrometers. The drawback of ion traps compared with TOF analyzers is their lower mass accuracy, which can make identification more ambiguous. Mehl et al. analyzed a small set of proteins with AP-MALDI-IT and compared the results with LC-ESI-IT and MALDI-TOF.4 From their data it was evident that AP-MALDI and MALDI-TOF analysis could yield complementary information.

In this work, we investigate the feasibility of AP-MALDI-IT analysis following MALDI-TOF for high-throughput protein identification, using several hundred samples. We first analyze the MALDI target plates by MALDI-TOF and subsequently by AP-MALDI-IT. The mass accuracy of the TOF analysis is then used to adjust the precursor masses in the AP-MALDI-IT spectra before database searching. The analysis is performed in a highly automated manner, and we show that the number of proteins identified with confidence increases significantly after analysis by AP-MALDI-IT.

* To whom correspondence should be addressed. Department of Protein Technology, Lund University, Box 7031, 22007 Sweden. E-mail: Peter.James@elmat.lth.se.
DOI: 10.1021/pr0498584. © 2005 American Chemical Society.

Journal of Proteome Research 2005, 4, 71-74. Published on Web 11/13/2004.

Experimental Section
Sample Preparation. 2D gel electrophoresis was performed on the Ettan IPGphor and Ettan DALT II systems (Amersham Biosciences), and gels were stained with either Coomassie Blue R350 (Amersham Biosciences) or ruthenium(II) tris(bathophenanthroline disulfonate).5 Gel spots 2.0 mm in diameter were cut, digested, and spotted using an Ettan Spot Handling Workstation (Amersham Biosciences). In-gel digestion was performed with modified porcine trypsin (Promega). Digests were dried and dissolved in 2-5 mg/mL α-cyano-4-hydroxycinnamic acid in 50% acetonitrile, 0.5% trifluoroacetic acid before spotting on MALDI target plates.

Mass Spectrometry. MALDI-TOF spectra were acquired on a MALDI-HT (Waters). Typically, 200 shots were collected at each spot in data-dependent mode. AP-MALDI-IT spectra were acquired on an LCQ DecaXP plus (ThermoElectron) with an AP-MALDI source (Masstech). A home-built plate holder was used to hold the Micromass target plates. The target plate voltage was 1.8 kV. Automatic gain control was turned off; instead, the injection time was fixed at 300 ms, with 25 microscans per scan. Each sample was acquired for 4 min in data-dependent mode, with one MS scan followed by up to 10 MS/MS scans. Dynamic exclusion was set to ±2 Da for 2 min. To achieve optimal fragmentation, the acquisition program was divided into segments such that the first half of the program used a collision energy of 38 and the second half 45.

Data Handling. Spectra were processed one MALDI target plate at a time. MALDI-TOF spectra were searched automatically using PIUMS,6 with several different settings as described previously.7 Peaks with signal-to-noise ratios higher than 1.9 were used, since this setting yielded many peptide peaks while keeping the number of noise peaks reasonable in the present data sets. The typical settings were (1) no filtering, 200 ppm mass tolerance; (2) filtering, no recalibration, 150 ppm; (3) recalibration, no filtering, 70 ppm; (4) filtering and recalibration, 120 ppm; and (5) filtering, no recalibration, 60 ppm. AP-MALDI-IT spectra were converted to merged DTA peak lists using a C++ program. The program converts the .RAW files to DTA files by calling the Xcalibur program Lcq_dta.exe (ThermoElectron) with the settings -C1 -G1 -AHTFEMAO, and merges all DTA files from one spectrum file into a single file. It then looks up the corresponding MALDI-TOF peak list based on the target plate position and generates a separate merged DTA file in which each precursor ion mass is adjusted to the closest value in the MALDI-TOF peak list. If the closest peak is more than 2 mass units away, the precursor mass is not adjusted.
The recalibrated but unfiltered MALDI-TOF peak lists were used for this adjustment. The merged DTA files were then searched using Mascot Daemon (Matrix Science, www.matrixscience.com). Unadjusted DTA files were searched with a 1.5 Da precursor mass tolerance and a 0.8 Da fragment ion tolerance, and the so-called peptide summary was used (www.matrixscience.com). Adjusted DTA files were searched with a 100 ppm precursor mass tolerance, and the protein summary was used. The PIUMS search results were exported in Proteios XML format (www.proteios.org). A C++ .NET program, MascotPiumsCombine, was used to combine the results from PIUMS and Mascot. The program reads the Proteios XML and the Mascot database and displays the combined results as a table that can be printed or exported in XML format.

Databases. For yeast and Streptococcus samples, the nonredundant database from ExPASy consisting of TrEMBL and Swiss-Prot was used. The database was expanded for splice variants using Swissknife, and separate entries were created for CHAIN and PEPTIDE features. Only entries containing “Saccharomyces” or “Streptococcus pyogenes” were used. To check for false positives, the database sequences were modified such that K and R were exchanged with amino acids that occur at equal frequency in the studied species, according to data at www.ebi.ac.uk/proteome/. K and R were exchanged with V and Q, respectively, for S. pyogenes, and with I and G for S. cerevisiae.
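The precursor-mass adjustment and the subsequent tightening of the search tolerance described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C++ program: the function names and the example masses are hypothetical, and both the DTA precursor masses and the MALDI-TOF peak list are assumed to be singly charged [M+H]+ values on the same mass scale. Each ion-trap precursor mass is replaced by the nearest MALDI-TOF peak unless that peak is more than 2 mass units away.

```python
from bisect import bisect_left


def adjust_precursors(it_masses, tof_peaks, max_shift=2.0):
    """Snap each ion-trap precursor mass to the closest MALDI-TOF peak.

    Hypothetical helper: masses with no TOF peak within max_shift
    (2 mass units in the text) are returned unchanged.
    """
    peaks = sorted(tof_peaks)  # sort once so each lookup is a binary search
    adjusted = []
    for mass in it_masses:
        if not peaks:
            adjusted.append(mass)
            continue
        i = bisect_left(peaks, mass)
        # The closest peak is one of the neighbours around the insertion point.
        neighbours = peaks[max(i - 1, 0):i + 1]
        closest = min(neighbours, key=lambda p: abs(p - mass))
        adjusted.append(closest if abs(closest - mass) <= max_shift else mass)
    return adjusted


def ppm_to_da(mass, ppm):
    """Convert a relative tolerance in ppm into an absolute window in Da."""
    return mass * ppm / 1e6


tof_peaks = [1045.5423, 1479.7954, 1638.8635]  # hypothetical MALDI-TOF peak list
it_masses = [1045.6, 1480.1, 2210.9]           # hypothetical AP-MALDI-IT precursors
print(adjust_precursors(it_masses, tof_peaks))  # [1045.5423, 1479.7954, 2210.9]

# After adjustment the precursor window shrinks from +/-1.5 Da to +/-100 ppm,
# i.e. about +/-0.15 Da for a 1500 Da peptide.
print(round(ppm_to_da(1500.0, 100), 3))  # 0.15
```

Sorting the TOF peak list once and using binary search keeps each lookup logarithmic in the number of peaks; with only a few dozen peaks per spot, a linear scan would work just as well.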

Results and Discussion
In a recent study by Mehl et al., AP-MALDI-IT analysis of protein digests was more efficient than MALDI-TOF analysis.4 However, their database searching with the MALDI-TOF data was performed with a fixed narrow mass tolerance and without prefiltering of the spectra, and better results might have been obtained with more refined search strategies. For analysis of MALDI-TOF data, we used a multiple search strategy as described previously.7 With this strategy we frequently obtain protein identification rates above 90% for automatically excised gel spots of 20-100 kDa. In the present study, we selected data sets for which the success rate was clearly lower, because of low-abundance spots or small proteins.

The initial test was performed on a batch of 142 gel spots from Streptococcus pyogenes stained with Coomassie Blue R350. After the initial automated multiple PMF search, 61% of the samples were identified with confidence. Expectation values were used as the significance criterion so that results from different search engines could be compared.8 We used an expectation value of 0.002 as the confidence limit, since almost 500 searches are performed for a plate of 96 samples: at this limit, the database could be searched 500 times with only one random hit expected to reach such a score. The MALDI target plates were then analyzed by MS/MS in automated mode using AP-MALDI-IT. The peak lists were searched in Mascot with a precursor mass tolerance of 1.5 Da. A second search with a precursor mass tolerance of 100 ppm was performed after the DTA precursor masses had been adjusted to the values from the internally recalibrated MALDI-TOF spectra. Of the two searches, the one that returned the lower expectation value was retained. We used an expectation value limit of 0.005 for Mascot searches, since nearly 200 searches are performed per plate. With the commonly used limit of 0.05 we observe frequent false positives, as has also been reported by several other groups. We compared the results for the batch after searching with the wide (1.5 Da) and the narrow (100 ppm) precursor mass tolerance.
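The reasoning behind the two expectation-value limits is simple arithmetic: if each search has an expectation value E of producing a chance hit at that score, then N searches give roughly E × N chance hits. The sketch below reproduces the numbers in the text; the helper names are hypothetical, and treating the searches as independent is a simplifying assumption. The search counts are consistent with, for example, five PIUMS settings and two Mascot searches per spot on a 96-spot plate, which is our inference rather than a statement from the text.

```python
def expected_random_hits(e_limit, n_searches):
    """Approximate number of chance hits reaching e_limit over n_searches."""
    return e_limit * n_searches


def e_limit_for(n_searches, allowed_random_hits=1.0):
    """E-value limit that allows about `allowed_random_hits` chance hits per plate."""
    return allowed_random_hits / n_searches


# PIUMS PMF: roughly 500 searches per 96-spot plate (e.g. 5 settings x 96 = 480).
print(e_limit_for(500))                 # 0.002
# Mascot MS/MS: roughly 200 searches per plate (e.g. 2 searches x 96 = 192).
print(e_limit_for(200))                 # 0.005
# The common limit of 0.05 would admit about 10 chance hits per plate.
print(expected_random_hits(0.05, 200))  # 10.0
```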
Mascot can produce a so-called peptide summary, in which only MS/MS spectra that produce sequence ion matches are scored, or a protein summary, in which the precursor masses on their own are also used. The peptide summary is independent of the MALDI-TOF PMF search results and should therefore be the method of choice, since it is orthogonal to the PMF searches. However, the protein summary has the advantage that a protein with a good MS spectrum can be identified even if much of the MS/MS data is of low quality. Inspection of the results indicated that the peptide summary was more efficient with the wide (1.5 Da) precursor mass tolerance, but that with the narrow (100 ppm) tolerance the protein summary gave several significant hits where the peptide summary did not. Some examples are presented in Table 1. We therefore chose to use the peptide summary by default for the low-mass-accuracy peak lists and the protein summary for the peak lists with adjusted precursor masses. In the Streptococcus batch, 5 more samples were significantly identified after adjustment of the precursor masses; for three of these, a new top protein candidate appeared. Finally, the MALDI-TOF results and the AP-MALDI-IT search results were combined into a table. The outcome for this batch is displayed in Table 2. About half of the samples gave the same significant top protein candidate with both MALDI-TOF and AP-MALDI-IT. Another set of proteins was identified significantly by only one of the two methods and was confirmed by being the top candidate with the other method. Three samples gave the same top protein candidate with both methods, but below the significance level for both. On inspection, these proteins all had scoring ions in their MS/MS spectra and fit the expected molecular mass from the 2D gel, so we chose to consider these hits correct.
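The per-sample combination described above (keep the better of the two Mascot searches, then compare its top protein and significance with the PIUMS result) can be expressed as a small classification function. This is a hypothetical sketch, not MascotPiumsCombine: the category labels follow Table 2 (omitting the "mixture" row, whose criteria are not spelled out), the limits are the ones stated in the text, and treating an E value strictly below the limit as significant is our assumption. The example values are the St1 and Sa1 rows of Table 1.

```python
from typing import NamedTuple

PMF_E_LIMIT = 0.002   # PIUMS PMF significance limit used in the text
MSMS_E_LIMIT = 0.005  # Mascot MS/MS significance limit used in the text


class Hit(NamedTuple):
    accession: str  # ExPASy accession of the top-ranked protein
    e_value: float  # expectation value of that protein


def best_msms(unadjusted: Hit, adjusted: Hit) -> Hit:
    """Keep whichever Mascot search (1.5 Da or 100 ppm) gave the lower E."""
    return min(unadjusted, adjusted, key=lambda hit: hit.e_value)


def classify(pmf: Hit, msms: Hit) -> str:
    """Assign a sample to a Table 2-style category (hypothetical labels)."""
    pmf_sig = pmf.e_value < PMF_E_LIMIT
    msms_sig = msms.e_value < MSMS_E_LIMIT
    if pmf.accession == msms.accession:
        if pmf_sig and msms_sig:
            return "same top protein, both significant"
        if pmf_sig:
            return "same top protein, PIUMS PMF only significant"
        if msms_sig:
            return "same top protein, Mascot MS/MS only significant"
        # Accepted in the text only after manual inspection of the spectra.
        return "same top protein, not significant"
    if pmf_sig and msms_sig:
        return "different top protein, both significant"
    if pmf_sig:
        return "different top protein, PIUMS PMF significant"
    if msms_sig:
        return "different top protein, Mascot MS/MS significant"
    return "not identified"


# Table 1: unadjusted searches use the peptide summary, adjusted ones the protein summary.
st1 = classify(Hit("Q9A1B0", 4e-10), best_msms(Hit("Q8K7V9", 1.2), Hit("Q9A1B0", 1e-5)))
sa1 = classify(Hit("P22943", 0.08), best_msms(Hit("P39935", 1.6), Hit("P22943", 7e-4)))
print(st1)  # same top protein, both significant
print(sa1)  # same top protein, Mascot MS/MS only significant
```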


Table 1. MS/MS Search Examples

Each entry gives the top-ranked protein (ExPASy accession) followed by its expectation value E in parentheses.

            MALDI-TOF           AP-MALDI-IT Mascot                      AP-MALDI-IT Mascot
            PIUMS               precursor masses not adjusted           precursor masses adjusted
sample(a)                       protein summary     peptide summary     protein summary     peptide summary
St1         Q9A1B0 (4 × 10⁻¹⁰) Q8P218 (0.032)      Q8K7V9 (1.2)        Q9A1B0 (1 × 10⁻⁵)   Q8K8I4 (0.25)
Sa1         P22943 (0.08)       P22943 (0.99)       P39935 (1.6)        P22943 (7 × 10⁻⁴)   P22943 (0.01)
Sa2         P22768 (8 × 10⁻⁵)   Q07451 (20)         P25336 (1.0)        P22768 (0.12)       P22768 (1.6)
Sa3         P00950 (2 × 10⁻³)   P00950 (0.068)      P00950 (0.03)       P00950 (1 × 10⁻⁴)   P00950 (6 × 10⁻³)

(a) Stn, Streptococcus samples; San, yeast samples. Top protein = ExPASy accession number of the protein with the highest score; E = expectation value of the highest-scoring protein.

Table 2. Protein Identification Summaries(a)

batch                                               Streptococcus   yeast
no. of samples                                      142             421
same top protein, both methods significant score    73              81
same top protein, PIUMS PMF only significant        5               41
same top protein, Mascot MS/MS only significant     17              26
same top protein, not significant score             3               16
different top protein, PIUMS PMF significant        6               58
different top protein, Mascot MS/MS significant     13              18
mixture (both methods)                              3               4
different top protein, both significant, same size  1               0
total identified                                    121             244
success rate                                        0.852           0.580

(a) The success rate of protein identification using either PIUMS searching of MALDI-TOF spectra or Mascot searching of AP-MALDI-IT peak lists (for the latter, the better result with or without adjusted precursor masses).
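As a consistency check on Table 2, summing the category rows (from "same top protein, both methods significant score" through "different top protein, both significant, same size") reproduces the "total identified" row, and dividing by the number of samples reproduces the success rates. A minimal sketch with the counts transcribed from the table (the helper name is hypothetical):

```python
# Category counts transcribed from Table 2, in table order
# (same/both sig., PMF only, MS/MS only, not sig., diff./PMF sig.,
#  diff./MS/MS sig., mixture, diff./both sig.).
TABLE_2 = {
    "Streptococcus": (142, [73, 5, 17, 3, 6, 13, 3, 1]),
    "yeast": (421, [81, 41, 26, 16, 58, 18, 4, 0]),
}


def summarize(n_samples, category_counts):
    """Return (total identified, success rate rounded to three decimals)."""
    total = sum(category_counts)
    return total, round(total / n_samples, 3)


for batch, (n_samples, counts) in TABLE_2.items():
    print(batch, summarize(n_samples, counts))
# Streptococcus (121, 0.852)
# yeast (244, 0.58)
```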

The benefit of using the two methods in parallel is further supported by the fact that another set of proteins could be identified by only one of the two methods, MALDI-TOF or AP-MALDI-IT. The total success rate increased to 85%, and the samples that were not identified all had very few peptide peaks in their spectra. The most commonly used PMF search engines each apply different scoring schemes, so one can expect clearer results and more protein identifications by searching with two or more search engines.9 We therefore also submitted the MALDI-TOF data to Mascot PMF. For each sample, the peak list that gave the best PIUMS score (untreated, filtered and recalibrated, or filtered only) was used. The search results were very similar, with the same reported top candidate in all but 20 cases. The expectation values differed between the two search engines, and in a few cases only one of them judged a hit significant.

To test whether agreement between the two PMF search engines on the top candidate could indicate a true hit independently of the score, we ran all the searches against a modified database in which the K and R residues were exchanged with amino acids of similar frequency. This modified database yields theoretical tryptic peptides with a size distribution similar to that of the real database, but none of its peptides are identical to those in the real database. When all the peak lists were searched against this database, PIUMS and Mascot each reported one significant hit, which is of the order expected by chance. In both cases, the other search engine also reported the same protein as the top candidate, although below the significance threshold. Another 15 peak lists returned the same protein as top candidate with both search engines. These results show that the parallel use of two search engines can highlight interesting hits with nonsignificant scores, but does not protect against false positives.

To check how the combination of MALDI-TOF and AP-MALDI-IT affected the false-positive rate and the reliability of the results, we also searched the AP-MALDI DTA peak lists, with and without adjusted precursor masses, against the same modified database with the same search parameters as described above. In this case, Mascot did not return any significant results. Moreover, in no case did the PMF and MS/MS searches return the same top candidate protein. The MS/MS data from AP-MALDI-IT and the MS data from MALDI-TOF are thus orthogonal enough that agreement on the top candidate between the PMF and MS/MS searches can be used as a strong criterion that the reported protein is correct, even if the individual scores are not significant.

To investigate whether the present methodology is generally useful, we tested the setup on a set of 421 gel spots from a yeast extract (Tables 1 and 2). These spots were generally of lower abundance than the Streptococcus samples and were stained with ruthenium(II) tris(bathophenanthroline disulfonate).5 After the initial optimized PMF search, 44% of the samples gave significant identification results. In this case, AP-MALDI-IT confirmed all but 58 of these results. Conversely, MALDI-TOF PMF confirmed all but 18 of the significant AP-MALDI-IT identifications obtained after adjustment of the precursor masses.
In this batch, 36 samples obtained the correct top candidate protein only after adjustment of the precursor masses and searching with the narrower precursor mass tolerance, and another 17 samples had their scores raised above the significance level by the precursor mass adjustment. The main difference between the two instruments seems to lie with low-abundance samples, where AP-MALDI-IT is not sensitive enough to provide good MS/MS spectra even when the MS spectra are of decent quality. The MS-level mass accuracy of AP-MALDI-IT is in many cases not good enough for unambiguous PMF identification, and MALDI-TOF mass accuracy is needed. The protein identification rate for the yeast samples increased less after AP-MALDI-IT than that for the Streptococcus batch, mainly because of the limited quality of the MS/MS spectra from these low-abundance samples. Ongoing AP-MALDI source developments such as pulsed dynamic focusing10 and optimized gas flow11 could overcome some of the sensitivity limitations of this technique. Furthermore, since ion-trap MS/MS fragmentation of singly charged precursors differs from the conventional fragmentation of multiply charged ions in ion traps, optimized scoring algorithms are likely to further improve peptide identification from these MS/MS data.

When this set of peak lists was searched against the modified database to check for random hits, PMF yielded 2 significant hits with PIUMS but none with Mascot, and 43 peak lists yielded the same top candidate protein with both search engines. When the AP-MALDI-IT data were searched with Mascot, no significant identifications were reported. As for the previous batch, when the MALDI-TOF PMF search results were combined with the AP-MALDI-IT Mascot results, the same top candidate was not reported for any sample. The results from the smaller data set were thus confirmed with this data set. A high degree of confidence and a very low false-positive rate can therefore be expected when MALDI-TOF and AP-MALDI-IT data are combined and both methods give the same protein.
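The modified database used for these random-hit checks, and the effect of the K/R exchange on trypsin cleavage, can be sketched as follows. We read "exchanged" in the Databases paragraph as a two-way swap (K↔V and R↔Q for S. pyogenes, K↔I and R↔G for S. cerevisiae); this is our interpretation, chosen because swapping with equally frequent residues keeps the number of cleavage sites, and hence the peptide size distribution, roughly unchanged while moving the sites to new positions. The helper names and the toy sequence are hypothetical, and the digestion rule (cleave after K or R unless the next residue is P) is the common simplified trypsin rule, not necessarily the exact rule used by PIUMS or Mascot.

```python
# Hypothetical swap tables, reading "K/R exchanged with V/Q" (or I/G) as two-way swaps.
SWAPS = {
    "S. pyogenes": str.maketrans("KRVQ", "VQKR"),
    "S. cerevisiae": str.maketrans("KRIG", "IGKR"),
}


def make_decoy(sequence, species):
    """Return the sequence with K/R swapped against equally frequent residues."""
    return sequence.translate(SWAPS[species])


def tryptic_peptides(sequence):
    """Cleave after K or R, except before P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


protein = "MKVLRQAGKPVR"  # toy sequence, not from the study
decoy = make_decoy(protein, "S. pyogenes")
print(tryptic_peptides(protein))  # ['MK', 'VLR', 'QAGKPVR']
print(decoy, tryptic_peptides(decoy))  # MVKLQRAGVPKQ ['MVK', 'LQR', 'AGVPK', 'Q']
```

Because each swap is its own inverse, applying `make_decoy` twice returns the original sequence, which is a convenient sanity check when generating the modified database.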

Conclusions
The combination of MALDI-TOF and AP-MALDI-IT described here significantly increases the rate of reliable protein identification. Very little extra work is required to generate the AP-MALDI-IT data compared with MALDI-TOF alone, since no extra spotting is involved: the MALDI target plate is simply moved to the AP-MALDI source, and data acquisition and data combination are automated. As a consequence, we now routinely perform AP-MALDI-IT analysis on MALDI target plates after MALDI-TOF for protein identification, and this has clearly reduced the number of protein identifications by ESI-MS/MS in our lab.

Acknowledgment. The authors thank VINNOVA for financial support.

References
(1) Henzel, W. J.; Watanabe, C.; Stults, J. T. J. Am. Soc. Mass Spectrom. 2003, 14, 931-942.
(2) Quadroni, M.; James, P. Electrophoresis 1999, 20, 664-677.
(3) Laiko, V. V.; Moyer, S. C.; Cotter, R. J. Anal. Chem. 2000, 72, 5239-5243.
(4) Mehl, J. T.; Cummings, J. J.; Rohde, E.; Yates, N. N. Rapid Commun. Mass Spectrom. 2003, 17, 1600-1610.
(5) Rabilloud, T.; Strub, J. M.; Luche, S.; van Dorsselaer, A.; Lunardi, J. Proteomics 2001, 1, 699-704.
(6) Samuelsson, J.; Dalevi, D.; Levander, F.; Rögnvaldsson, T. Bioinformatics 2004, DOI: 10.1093/bioinformatics/bth460.
(7) Levander, F.; Rögnvaldsson, T.; Samuelsson, J.; James, P. Proteomics 2004, 4, 2594-2601.
(8) Fenyö, D.; Beavis, R. C. Anal. Chem. 2003, 75, 768-774.
(9) Chamrad, D. C.; Korting, G.; Stuhler, K.; Meyer, H. E.; Klose, J.; Bluggel, M. Proteomics 2004, 4, 619-628.
(10) Tan, P. V.; Laiko, V. V.; Doroshenko, V. M. Anal. Chem. 2004, 76, 2462-2469.
(11) Miller, C. A.; Yi, D. H.; Perkins, P. D. Rapid Commun. Mass Spectrom. 2003, 17, 860-868.
