An Iterative Strategy for Precursor Ion Selection for LC-MS/MS Based Shotgun Proteomics

Alexandra Zerck,*,† Eckhard Nordhoff,† Anja Resemann,‡ Ekaterina Mirgorodskaya,†,# Detlef Suckau,‡ Knut Reinert,§ Hans Lehrach,† and Johan Gobom†,¶

Max Planck Institute for Molecular Genetics, Department Vertebrate Genomics, Ihnestr. 63-73, D-14195 Berlin, Germany, Bruker Daltonik GmbH, Fahrenheitstrasse 4, D-28359 Bremen, Germany, and Department of Mathematics and Computer Science, Free University Berlin, Takustrasse 9, D-14195 Berlin, Germany

Received October 6, 2008

Currently, the precursor ion selection strategies in LC-MS mainly choose the most prominent peptide signals for MS/MS analysis. Consequently, high-abundance proteins are identified by MS/MS of many peptides, whereas proteins of lower abundance might elude identification. We present a novel, iterative and result-driven approach for precursor ion selection that significantly increases the efficiency of an MS/MS analysis by decreasing data redundancy and analysis time. By simulating different strategies for precursor ion selection on an existing data set, we compare our method to existing result-driven strategies and evaluate its performance with regard to mass accuracy, database size, and sample complexity.

Keywords: Iterative precursor ion selection • Result-driven LC-MS/MS • IPS

* To whom correspondence should be addressed. E-mail: zerck@molgen.mpg.de. † Max Planck Institute for Molecular Genetics. ‡ Bruker Daltonik GmbH. # Current address: University of Gothenburg/Sahlgrenska University Hospital, Department of Occupational and Environment Medicine, Gothenburg, Sweden. § Free University Berlin. ¶ Current address: University of Gothenburg/Sahlgrenska University Hospital, Department of Neuroscience and Physiology, Clinical Neurochemistry Laboratory, S-431 80 Mölndal, Sweden.

10.1021/pr800835x CCC: $40.75 © 2009 American Chemical Society

Introduction

In the past decade, liquid chromatography tandem mass spectrometry (LC-MS/MS) based shotgun proteomics has become a standard analytical approach for identifying proteins in complex mixtures. Several thousand peptides are typically detected in an LC-MS analysis of a tryptic digest of a complex protein sample. It is often not feasible to record MS/MS spectra of all of them, because the necessary measurement time or sample amount is not available. However, for successful protein identification, it is usually not required to fragment all peptides in the sample. If, during ongoing LC-MS analyses, the recorded MS and MS/MS data can be used to limit the subsequent MS/MS analyses to a subset of peptides that still represents all sample proteins, the analysis time can be greatly shortened.

Several strategies have been reported that improve precursor ion selection for LC-MS/MS. A simple approach is the use of static exclusion lists, containing the molecular masses of calibrants or contaminants. Another approach to decrease redundant data acquisition is the use of dynamic exclusion lists, containing the molecular masses of already fragmented peptides in conjunction with a fixed or flexible retention time window.1 Dynamic exclusion lists can also be used to exclude peptides from MS/MS analysis in replicate analyses of a sample,2-6 leading to a larger number of unique peptide identifications in the replicate runs3 and an overall higher number of identified proteins than simple repetitions.6 The concept of dynamic exclusion lists was further extended by excluding peptides with m/z values that match predicted tryptic peptides of already identified proteins.7,8 This approach was implemented in a database search engine for LC-MS/MS by Wallace et al.;7 Scherl et al.8 demonstrated that it also improves protein identification by MALDI MS/MS in samples containing tryptic peptides of multiple proteins. While this approach further decreases redundant data acquisition compared to the previous approaches, it also carries the risk of false matches between detected and predicted peptides whose molecular mass differences fall within the specified error tolerance of the database search. Such false matches exclude peptides of possibly unidentified proteins and thus lead to fewer protein identifications.

An alternative to exclusion lists is a strategy that has been termed directed MS/MS. Here, the MS data from one or several LC-MS analyses are first used to determine the molecular mass profile of all compounds in the sample. This profile then serves as the basis for precursor ion selection. This strategy is typically used with LC-MALDI MS because MS and MS/MS spectra acquisition can be performed separately in time; however, a few recent studies also show its performance using LC-ESI MS/MS.9-11 Compared to the standard data-dependent acquisition, which typically selects the most abundant compounds, directed MS/MS can lead to a higher number of identified peptides, especially for precursor ions of low abundance.11 Furthermore, Picotti et al.10 showed that for tryptic digests of single protein samples the number of peptide identifications per protein can be drastically increased.
Journal of Proteome Research 2009, 8, 3239–3251. Published on Web 04/30/2009

In this work we describe an iterative, result-driven approach for precursor ion selection in LC-MS/MS and evaluate its performance. We show that it outperforms the use of dynamic exclusion lists and significantly reduces the number of MS/MS spectra that need to be recorded. We used offline nano LC-MALDI TOF MS/MS, where the acquisition of MS and MS/MS can be performed separately, so that MS data can be used to collect molecular weight and retention time information of all compounds and to guide precursor ion selection for MS/MS. These data were also used to simulate online analyses of a complex peptide mixture and to evaluate the potential advantages of our strategy when no information from previous MS measurements is available, as is usually the case in LC-ESI MS/MS based protein identification. To compare precursor ion selection strategies without potential interference from experimental variations, we analyzed a set of samples by LC-MALDI MS, acquiring from each the maximum possible number of MS/MS spectra without placing any restraint on the analysis time. Application of the different precursor ion selection strategies was then simulated using the acquired data, and the results were evaluated in terms of the number of identified proteins per acquired MS/MS spectrum.

Material and Methods

Chemicals. Trifluoroacetic acid (TFA), tetrahydrofuran (THF), n-octylglucopyranoside (nOGP) and water used for HPLC solvents and MALDI matrix solutions were purchased from Fluka Chemie (Buchs, Switzerland). Citric acid, α-cyano-4-hydroxycinnamic acid (CHCA), iodoacetamide (IAA) and dithiothreitol (DTT) were purchased from Sigma (Sigma-Aldrich, St. Louis, MO). Acetonitrile (ACN, HPLC Gradient grade) was purchased from Carl Roth GmbH (Karlsruhe, Germany).

Samples. Five different samples were used to evaluate the different approaches. Sample 1 was the 50S ribosomal subunit, consisting of 33 different proteins, which was isolated from Escherichia coli as described previously.12 It was a gift from Dr. Fucini (Max Planck Institute for Molecular Genetics, Berlin, Germany). The sample was subjected to tryptic digestion as previously described.13 A 6 µL sample, corresponding to 1 pmol of 50S subunits, was used for each LC-MS analysis. Sample 2 was the Universal Proteomics Standard (UPS1, Sigma-Aldrich). It consists of 5 pmol each of 48 human proteins. The protein standard was dissolved in 25 µL of 50 mM NH4HCO3/10 mM nOGP. After adding 5 µL of 25 mM DTT, the sample was incubated for 30 min at 37 °C. Then, 5 µL of 50 mM IAA was added and the mixture was again incubated for 30 min at 37 °C. The sample was diluted by adding 85 µL of H2O. Two microliters of trypsin (100 ng/µL) was added and the sample was incubated at 37 °C overnight. The digest was acidified and diluted by addition of 380 µL of 0.1% TFA and stored at -20 °C in 10 µL aliquots, each containing 100 fmol of each of the 48 proteins. With this sample amount, 28 proteins were identified in the Mascot search (Swiss-Prot human, 10 ppm precursor error tolerance) with at least two significant peptide hits. Samples 3 and 4 consisted of the tryptic digest of Sample 2, spiked with a tryptic digest of human serum albumin (HSA, Sigma-Aldrich) in molar ratios of 1:10 and 1:100.
The HSA digest was prepared using the following protocol. HSA was dissolved in water to a concentration of 200 pmol/µL. Five microliters was further dissolved in 44 µL of 100 mM NH4HCO3 and 0.5 µL of 1 M nOGP. After addition of 5 µL of 45 mM DTT, the sample was incubated at 37 °C for 30 min. Five microliters of 100 mM IAA was added and the sample was incubated for 30 min in the dark. Then, 140 µL of 100 mM NH4HCO3 was added. Seven microliters of trypsin (100 ng/µL) in 0.015% HCl was added and the sample was incubated overnight at 37 °C. Finally, 786 µL of 0.1% TFA was added, resulting in a concentration of 1 pmol/µL. Sample 5 was a complex sample. We used a tryptic digest of the total proteome of 10 000 HEK293 cells. This sample was analyzed in the context of the 13th Workshop for micro methods in protein chemistry in Martinsried (www.arbeitstagung.de). It was prepared and provided by the group of Prof. H. Meyer (Medical Proteome Center, Ruhr University Bochum, Germany). The peptide lyophilizate was dissolved in 20 µL of 0.1% TFA. Figure 1, a survey view of Sample 5, shows the high number of signals present in the sample.

Figure 1. 2D map of Sample 5. The signal intensities are color-coded: high intensities are dark, low intensities light.

LC-MS. All samples except Sample 5 were analyzed on an 1100 Series Nanoflow LC system (Agilent Technologies, Waldbronn, Germany). The mobile phases were Buffer A, 1% acetonitrile and 0.05% TFA, and Buffer B, 90% acetonitrile and 0.04% TFA. The samples were separated using a 100 min gradient. The Agilent 1100 fraction collector spotted from minute 14 to 77 every 30 s. The gradient started with 100% Buffer A, after which the concentration of Buffer B was set to 3% after 5 min and increased to 15% after 8 min. Then Buffer B was linearly increased to 45% over 60 min. At min 73, Buffer B was set to 95% and held at 95% for 5 min. Prior to the HPLC analysis, AnchorChip 800/384 targets (Bruker Daltonics, Bremen, Germany) were prepared with a thin layer of CHCA matrix as previously described.13 All mass spectra were acquired on a Bruker Ultraflex III MALDI TOF-TOF with a 200 Hz solid state smartbeam laser. Positively charged ions of m/z 800-4000 were detected; for Sample 5, this window was extended to m/z 700-5000, and 1000 single-shot spectra were accumulated at 10 different positions.
The monoisotopic peaks were determined using SNAP, implemented in the FlexAnalysis 3.0 software (Bruker Daltonics). Except for Sample 5, all spectra were internally calibrated using two peptides present in the matrix solution (Angiotensin I, 1296.6853 Da, and ACTH (18-39), 2465.1989 Da). Monoisotopic peaks in successive spectra were combined to compounds and selected for MS/MS analysis using the software Warp-LC 1.1 (Bruker Daltonics). Sample 5 was analyzed on an Easy-nanoLC (Bruker). The mobile phases were Buffer A, consisting of 0.5% TFA, and Buffer B with 90% acetonitrile and 0.05% TFA. We used a 205 min gradient, starting with 98% Buffer A and 2% Buffer B for the first 10 min. Afterward, Buffer B was linearly increased to 35% over 120 min. Then, it was further increased to 70% over 60 min, and finally, it was increased to 100% over 10 min. The fractions were spotted from the 37th to the 165th min every 10 s, resulting in 768 spots on two targets. Half of the sample (10 µL) was injected.

Figure 2. Schematic workflow of IPS.

Protein Identification. The database searches were performed using Mascot Server (version 2.2).14 The Swiss-Prot protein sequence database (release 54.5) was searched, with taxonomy selection E. coli for Sample 1, and human for all other samples. The search settings were the following: mass error tolerance for the precursor ions, 10 ppm (if not noted otherwise); mass error tolerance for the fragment ions, 0.4 Da; fixed modifications, carbamidomethylation (except for Sample 5); variable modifications, methionine and tryptophan oxidation; number of missed cleavages, 1; type of instrument, MALDI-TOF-TOF. The procedure for iterative precursor ion selection necessitates categorizing the proteins retrieved from the database search as ‘identified’ or ‘uncertain candidates’. In this study, we classified a protein as identified when at least two peptides were identified with a confidence >0.95 as calculated by the Mascot software. Proteins with only one identified peptide were classified as ‘uncertain candidates’. Additionally, Occam’s razor15 was applied for the calculation of the number of identified proteins: proteins were grouped together if they could not be distinguished by their peptide identifications and were counted as one protein identification.

Iterative Precursor Ion Selection. After LC analysis of a sample, all fractions are first analyzed in MS mode. Following automatic peak detection, a compound list is calculated by grouping detected peptides according to their m/z values and retention times. In this study, this task was performed using the software Warp-LC 1.1 (Bruker Daltonics).
The software assigns scores to the compounds in the list according to their suitability for MS/MS analysis, based primarily on signal intensity, and excludes overlapping signals of partially coeluting peptides. This is a standard method for setting the priority of MS/MS analyses, and will in the following be referred to as static precursor ion selection (SPS). The workflow used for iterative precursor ion selection (IPS) is shown in Figure 2. It starts by acquiring MS/MS spectra of the first few top scoring compounds. A database search is performed with the MS/MS data and the retrieved proteins are categorized as ‘identified’ or ‘uncertain candidates’ (see previous section). All protein sequences retrieved from the search are subjected to in silico proteolysis, and the calculated molecular masses of the produced proteolytic peptides are compared to the m/z values of all entries in the compound list. Because MS/MS analysis of compounds with m/z values that match the in silico calculated peptides of already identified proteins within the tolerated error limits is less likely to result in new identified proteins than MS/MS of other compounds, their score (priority for MS/MS) is decreased. Conversely, MS/MS of compounds that match in silico calculated peptides of uncertain candidates is more likely to result in identifications than MS/MS of other compounds, and thus, their score is increased. After recalculating the scores of the compounds, MS/MS analysis is performed on the next top entries in the list. A new database search is performed with this MS/MS data set, and the identification results are combined with the previously retrieved results. This process is then repeated until the set termination criteria have been fulfilled (see below). The number of acquired MS/MS spectra per iteration, referred to in the following as ‘step size’, was set to one unless otherwise stated. The basic implementation uses a simple strategy for changing the score of the compounds: if a compound has a mass matching a peptide of an already identified protein, its score is halved, and if it matches an uncertain candidate, its score is set to the maximal score present in the list. Often more than one peptide matches a given experimental m/z value within the tolerated error range.
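The in silico proteolysis and mass matching step described above can be illustrated with a minimal Python sketch. It is not the OpenMS implementation: the function names (`tryptic_peptides`, `mh_plus`, `within_ppm`, `matching_compounds`) are hypothetical, cysteine is left unmodified (the study used fixed carbamidomethylation for most samples), and compounds are assumed to be singly protonated ions, as is typical for MALDI.

```python
# Monoisotopic residue masses in Da (standard values; cysteine unmodified here).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence, missed_cleavages=1):
    """Cleave after K or R, but not before P; allow some missed cleavages."""
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def within_ppm(observed, theoretical, ppm):
    """True if the observed value lies within the ppm tolerance."""
    return abs(observed - theoretical) <= theoretical * ppm * 1e-6


def matching_compounds(compound_mz, protein_sequence, ppm=10.0, missed_cleavages=1):
    """Indices of compounds whose m/z matches any in silico peptide of the protein."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(protein_sequence, missed_cleavages)]
    return [i for i, mz in enumerate(compound_mz)
            if any(within_ppm(mz, t, ppm) for t in theoretical)]
```

Widening `ppm` from 10 to 50 in such a sketch increases the number of theoretical peptides that can match one compound, which is the source of the false matches discussed in the Results.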
The number of matching peptides varies depending on the m/z, the searched database, and the error tolerance. Thus, each m/z match between a detected compound and an identified protein or uncertain candidate has a different probability of being true. To reflect this difference, a weighting factor was used when rescoring the entries in the compound list. The weighting factors were calculated by preprocessing the database once before the data-dependent selection process starts, as described below.

Database Preprocessing and Rescoring. The magnitude by which the score of a given compound is increased or decreased is controlled by a weighting factor that reflects the probability that the compound belongs to the protein, Pacc, to which it was matched. The weighting factors are based on the frequency of peptide masses in the sequence database used for protein identification. To decrease the influence of the database size, the weights are scaled to the maximum relative frequency. The in silico digestion of all proteins in the database and the calculation of their peptide masses is the computationally most demanding part of our algorithm. To speed up the process, these calculations can be performed before the iterative precursor ion selection starts. Using string indices, like a suffix tree, could further shorten the data processing time by avoiding redundant computations; however, in practice, the time for the preprocessing did not warrant the effort. Instead, the masses of the proteolytic peptides of all proteins are stored and can be quickly accessed in each iteration. The weighting factor for a peptide with mass m is calculated as

\[ w(m) = 1 - \frac{f(m)}{f_{\max}} \tag{1} \]

where f(m) is the frequency of that mass in the database (within a specified error range) and f_max is the maximal frequency. If mass m is very common in the database, that is, the mass matches many different peptides, the weighting factor will be close to zero. For low-frequency masses it will be close to one. If a compound c with mass m is to be shifted down in the list, its new score s′ is calculated as follows:

\[ s'(c) = s(c) - \frac{s(c)\, w(m)}{2} = s(c) \left( 1 - \frac{w(m)}{2} \right) \tag{2} \]

With a very common mass, w is small, and hence, the score of the compound is decreased by only a small amount. Conversely, with a high weighting factor, the score is basically halved. Analogously, the new score of a compound c that matches an uncertain protein candidate is increased:

\[ s'(c) = s(c) + \bigl( s_{\max} - s(c) \bigr)\, w(m) = s(c) \left( 1 + \left( \frac{s_{\max}}{s(c)} - 1 \right) w(m) \right) \tag{3} \]

Here, a high weighting factor, that is, a low frequency of mass m, leads to a new, higher score. The score can maximally be s_max, which is the maximum score found in the initial compound list. With the new score, the compound is among the top entries, but the order of the top entries is based on their initial score and the frequency of their masses in the database. Hence, the compounds that are most likely to give good identification results are at the top.

Figure 3. Workflow of the evaluation. In an exhaustive LC-MS/MS analysis all MS and MS/MS spectra were acquired; additionally, a list of all compounds found in the LC-MS spectra was created. Each precursor ion selection strategy processes this compound list in a different order, in each iteration considering the database search results from the corresponding MS/MS spectra.

Implementation. The algorithms necessary to use IPS have been implemented in C++ as part of OpenMS,16 an open source library for mass spectrometry (www.OpenMS.de). A command line tool is available as part of The OpenMS Proteomics Pipeline (TOPP).17 Given a list containing all MS compounds and the identification results up to this point, it determines the next precursors with one of the compared strategies. There is no online implementation available that communicates directly with the mass spectrometer, as the required interface is instrument specific. However, any research group or instrument manufacturer interested in implementing IPS can download and use our tools as well as the underlying source code under the GNU Lesser General Public License (LGPL).
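The weighting and rescoring rules of eqs 1-3 can be illustrated with a short Python sketch. It is not the OpenMS code: the function names are hypothetical, and the mass frequency f(m) is assumed here to be the number of stored peptide masses within a symmetric ppm window around m, looked up by binary search in a sorted list.

```python
from bisect import bisect_left, bisect_right


def mass_frequency(sorted_masses, m, ppm):
    """f(m): number of database peptide masses within +/- ppm of m."""
    delta = m * ppm * 1e-6
    return bisect_right(sorted_masses, m + delta) - bisect_left(sorted_masses, m - delta)


def max_frequency(sorted_masses, ppm):
    """f_max: the largest f(m) over all stored masses (computed once)."""
    return max(mass_frequency(sorted_masses, m, ppm) for m in sorted_masses)


def weight(f_m, f_max):
    """Eq 1: close to 0 for very common masses, close to 1 for rare ones."""
    return 1.0 - f_m / f_max


def shift_down(score, w):
    """Eq 2: compound matches an identified protein; at most halves the score."""
    return score * (1.0 - w / 2.0)


def shift_up(score, s_max, w):
    """Eq 3: compound matches an uncertain candidate; at most raises the score to s_max."""
    return score + (s_max - score) * w


# Usage with a toy "database" of peptide masses.
masses = sorted([999.0, 1000.0, 1000.005, 1001.0])
f_max = max_frequency(masses, ppm=10)
w_rare = weight(mass_frequency(masses, 999.0, ppm=10), f_max)
print(shift_down(100.0, w_rare), shift_up(40.0, 100.0, w_rare))
```

With w = 1 the two functions reduce to the basic strategy described earlier (halving the score, or setting it to the maximal score), and with w = 0 the score is left unchanged.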

Results and Discussion

The aim of IPS (iterative precursor ion selection) is to speed up protein identification by iteratively changing the priority of precursor ions selected for MS/MS using protein identification results obtained from the previously acquired MS and MS/MS data. The priority of a given precursor ion is changed based on the probability that it stems from an already identified protein (in which case the priority is decreased) or from a protein that is still an uncertain candidate because the set identification criteria have not been met (in which case the priority is increased). To evaluate IPS and compare it to other strategies, the different selection strategies were simulated on already acquired LC-MALDI MS data sets containing MS/MS data on as many detected sample components as possible. Each simulation is based initially on the same list of (MS) compounds detected in a given LC-MS run. The precursors are then chosen in a different order depending on the selection strategy (see Figure 3). For the iterative strategies, the database search results from the corresponding MS/MS spectra are then successively used to influence the subsequent precursor ion selection. We chose to perform the comparison in this way rather than applying the different selection strategies in separate LC-MS analyses because it avoids the risk of confusing the performance of the precursor ion selection strategy with experimental inter-run variations of the LC-MS analyses. Because the order in which fragment ions are analyzed does not affect the quality of MALDI MS/MS spectra in our experimental setup, the simulated identification results of each selection strategy can be expected to be identical to those that would have been obtained had the strategy been applied in real time.
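The replay-style evaluation described above can be sketched in a few lines of Python. This is an illustrative toy, not the TOPP tool: the function `simulate` and its three inputs are hypothetical, the mass matches between compounds and proteins are assumed to be precomputed, the step size is one, and the unweighted basic rule (halve the score, or set it to the maximal score) is used instead of the weighted rescoring.

```python
def simulate(compounds, ms2_hits, matches, iterative=True):
    """Replay a selection strategy on exhaustively acquired data.

    compounds: {compound_id: initial score}
    ms2_hits:  {compound_id: (peptide, protein)} for spectra with a significant hit
    matches:   {protein: set of compound_ids matching its in silico peptides}
    Returns the number of identified proteins after each acquired spectrum.
    """
    scores = dict(compounds)
    s_max = max(scores.values())
    remaining = set(scores)
    peptides = {}                       # protein -> identified peptides so far
    shifted_down, shifted_up = set(), set()
    curve = []
    while remaining:
        # Highest score first; ties are broken by compound id for determinism.
        cid = min(remaining, key=lambda c: (-scores[c], c))
        remaining.remove(cid)
        hit = ms2_hits.get(cid)
        if hit is not None:
            peptide, protein = hit
            peptides.setdefault(protein, set()).add(peptide)
            if iterative:
                identified = len(peptides[protein]) >= 2
                for other in matches.get(protein, ()):
                    if other not in remaining:
                        continue
                    # Each compound is shifted at most once per direction.
                    if identified and other not in shifted_down:
                        scores[other] /= 2
                        shifted_down.add(other)
                    elif not identified and other not in shifted_up:
                        scores[other] = s_max
                        shifted_up.add(other)
        # A protein counts as identified with at least two peptides.
        curve.append(sum(1 for p in peptides.values() if len(p) >= 2))
    return curve


compounds = {"a1": 100, "a2": 90, "a3": 80, "b1": 70, "b2": 10}
ms2_hits = {"a1": ("pepA1", "A"), "a2": ("pepA2", "A"), "a3": ("pepA3", "A"),
            "b1": ("pepB1", "B"), "b2": ("pepB2", "B")}
matches = {"A": {"a1", "a2", "a3"}, "B": {"b1", "b2"}}
print(simulate(compounds, ms2_hits, matches, iterative=False))  # [0, 1, 1, 1, 2]
print(simulate(compounds, ms2_hits, matches, iterative=True))   # [0, 1, 1, 2, 2]
```

In this toy data set the static order spends a third spectrum on the already identified protein A, whereas the iterative order reaches the second identification one spectrum earlier.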

The performance of the method in terms of the number of identified proteins per number of performed MS/MS analyses was compared to SPS (static precursor ion selection), which is based only on features in the MS data, primarily signal intensity. A comparison was also made to Dynamic Exclusion7,8 (DEX), in which precursor ions with m/z values matching tryptic peptides of previously identified proteins are successively excluded from subsequent MS/MS analyses. The protein samples used for evaluation of IPS differ in complexity and in the relative abundance of the constituent proteins. MS/MS analyses were performed on all detected compounds, after which each of the precursor ion selection methods was simulated using the acquired data. Hence, all strategies result in the same number of identified proteins, except DEX, where specific compounds are excluded. Replicate analyses were performed for each sample: five runs for the protein standard, six for the ribosomal sample, and two for each concentration of the protein standard spiked with HSA. The number of proteins identified at each MS/MS iteration with the different approaches was averaged over the replicates; therefore, noninteger values also appear as protein identification counts in the resulting figures. Figure 4A shows the number of identified proteins in Sample 2 as a function of the number of acquired MS/MS spectra. The tolerated mass difference for matching detected compounds to in silico produced proteolytic peptides (and for database searching) was 10 ppm. The theoretically optimal precursor ion selection (pink graph) is shown as a reference. This graph is a straight line with a slope of 0.5 until all proteins have been identified, according to the criteria set for protein identification (2 identified peptides per protein) and assuming that all peptides are unique to the proteins in the sample.
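As a compact restatement of the reference curve, and using symbols that do not appear in the original (n for the number of acquired MS/MS spectra, P for the number of proteins that can be identified in the sample), the optimal selection can be written as

\[ N_{\mathrm{opt}}(n) = \min\!\left( \frac{n}{2},\; P \right) \]

so that, under the stated assumptions, the curve saturates after 2P spectra.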
When using SPS (blue graph), the priority of the scheduled MS/MS analyses is determined solely from features in the MS data set, primarily the precursor ion signal intensities. Here, the slope decreases continuously with the number of acquired MS/MS spectra for two reasons: First, the precursor ions are analyzed in order of decreasing intensity and thus MS/MS spectra acquired early are more likely to result in identifications than spectra acquired later. Second, as the MS/MS analysis proceeds, more and more precursor ions are analyzed that belong to already identified proteins and thus do not contribute to new identifications. IPS (red graph) resulted in a steeper slope. For instance, identification of 25 proteins required recording of less than half the MS/MS spectra when using IPS compared to SPS. DEX initially performed marginally better than SPS, but in the end failed to identify the same number of proteins because some compounds had been erroneously matched to identified proteins and were excluded from subsequent MS/MS analyses. The figure also shows the separate effect of the two components of IPS, that is, decreasing the priority for compounds matching peptides of identified proteins (Down-shift), and increasing the priority for compounds matching peptides of uncertain candidates (Up-shift). Both perform slightly better than SPS, but worse than IPS. DEX and Down-shift have the same performance for the first 400 spectra, after which Down-shift has a higher identification rate; that is, it is better to be less stringent and only decrease the priority of compounds that match peptides of identified proteins instead of excluding them.

Figure 4. Sample 2. Protein identification rates with different mass error tolerances for database searching and matching of compounds: (A) 10 ppm, (B) 30 ppm and (C) 50 ppm. Shown are optimal precursor ion selection (pink), SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red). Swiss-Prot human was used as database.

Mass Accuracy. Figure 4B,C shows the identification results obtained with the different strategies when allowing a mass error tolerance of 30 and 50 instead of 10 ppm for database searching and matching of compounds. The performance of SPS is largely independent of differences in mass error tolerance within the investigated range, whereas the iterative strategies show dependence. The effect is strongest for DEX, where the total number of identified proteins decreases from 27 to 22 with rising error tolerance, due to a larger number of false matches between compounds and identified proteins. The effect is less pronounced with Down-shift, but overall, the performance degrades. High mass error tolerance also influences IPS, which loses its advantage over SPS. Up-shift shows the smallest dependence on mass accuracy, but the advantage over SPS is only minor. IPS thus benefits strongly from high mass accuracy, as there are fewer peptide matches for each mass and hence fewer false matches possible. With 10 ppm error tolerance, 192 compounds in total were shifted up, of which 5 were false assignments leading to a different peptide identification than expected. This number increases to 32 wrong assignments with 50 ppm error tolerance. The same holds true for DEX, where high error tolerance leads to false positive hits and thereby to more false exclusions. Elias and Gygi18 have calculated the number of tryptic peptides with a certain expected m/z considering different mass windows using the IPI human database. Their data show that even mass windows as narrow as 1 ppm can contain up to 10 different peptides. With increasing window width, the number of peptides rises steeply, explaining the poor performance of DEX with 50 ppm error tolerance.

Database Size. Figure 5 shows the influence of the database size on the performance of the method. Three databases of different size were used: a self-composed minimal database containing only the 48 proteins from the protein standard, IPI human (v. 3.46) with 72 079 sequences, and the complete Swiss-Prot database containing 289 473 sequence entries. The allowed mass deviation for matching detected compounds to in silico peptides and for database searching was set to 10 ppm. Four more proteins were identified when searching the minimal database compared to the larger databases, because of the lower ion score cutoff value for identification. Apart from this difference, the resulting graphs (Figure 5A) look similar to those obtained with IPI human (Figure 5B): IPS has the steepest slope in both cases. The slope of SPS is continuously decreasing.
DEX, like Down-shift, is only slightly better than SPS. Up-shift performs worst during the first 200 spectra but improves with an increasing number of acquired spectra. Thus, the size of the database does not significantly affect the performance of IPS. The reason for this is that the number of up- or downshifted compounds remains approximately the same regardless of the database size, mainly because compounds can be shifted only once in each direction even if more than one protein contains a matching peptide. When searching the complete Swiss-Prot database (Figure 5C), the results are markedly different: SPS is unaffected, but almost all iterative approaches perform worse. IPS has advantages only during the first 400 spectra. DEX and Down-shift at first perform comparably to SPS, but soon become the worst choice. Only Up-shift performed slightly better when using this database compared to the other databases. The poor performance of the iterative approaches is not a result of the large database size, but is caused by the presence of many homologous protein sequences from different species in the database. After a given protein has been identified, the in silico produced peptides of all its homologues will be matched to the detected compounds and lead to numerous erroneous down-shifts. While this is often not a problem because the sample taxonomy is known, this result points to the problem of handling degenerate tryptic peptides15 whose sequences are shared by many different proteins. When using IPS, one or a few members of large protein families, such as the actins and tubulins, are quickly identified, but peptides unique to a specific member of the family may be identified very late.


Figure 5. Sample 2. Dependence of protein identification rates on database size: self-composed database containing only the 48 proteins present in the sample (A), IPI human (B) and complete Swiss-Prot (C). The precursor mass error tolerance was set to 10 ppm. Shown are the optimal selection (pink), SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red).

Proteins sharing a conserved domain but otherwise having distinctly different sequences pose the same problem. Sample Stoichiometry. To evaluate IPS on a biological sample of moderate complexity, we analyzed the 50S ribosomal subunit of E. coli, consisting of 33 proteins. Twenty-nine of the 33 proteins were identified with at least 2 significant peptide hits in almost all replicate runs. Among the unidentified proteins were the two smallest proteins of the 50S subunit, both

Iterative Precursor Ion Selection

Figure 6. Sample stoichiometry. The plots show the number of identified proteins over the number of acquired MS/MS spectra for the 50S ribosomal subunit of E. coli (A) and the protein standard spiked with HSA in concentration 1:10 (B) and 1:100 (C). Shown are the optimal selection (pink), SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red). The error tolerance was set to 20 ppm and the databases were SwissProt E.coli and Swiss-Prot human. Noninteger values for the protein number are due to averaging over replicates.

with MW below 10 kDa. For these, the probability of detecting two high-scoring peptides is lower. Figure 6A shows the identification rates obtained with 20 ppm mass error tolerance. DEX and Down-shift perform similarly to or slightly better than SPS. Up-shift is the worst strategy for the first 250 spectra, but later,

it is slightly better than SPS. IPS has a steep slope approaching the optimal curve, identifying 25 proteins with 130 spectra, while SPS required 300 more spectra. A plausible explanation for the higher performance of IPS for this sample compared to Sample 1 is the difference in sample stoichiometry. In Sample 1, all proteins were present at equimolar concentrations, which is not a realistic setting. Proteins in biological samples often differ in concentration by several orders of magnitude. For protein identification by LC-MS, this leads to overrepresentation of high-abundance proteins, and thus accumulation of redundant MS/MS data, while low-abundance proteins, represented by only a few detected peptides, often fail to be identified. To evaluate the stoichiometry effect, we performed experiments in which the tryptic digest of the UPS1 protein standard was spiked with a tryptic digest of HSA at a 10-fold and 100-fold molar excess, respectively. The figures also show noninteger values for the number of identified proteins; this is due to averaging this number over replicate runs. In both cases, the total number of identified proteins decreased drastically. Figure 6B shows the results for the protein standard with a 10-fold molar excess of HSA. In total, 16 proteins were identified, which is 10 fewer than for the original UPS1 sample. SPS, DEX and Down-shift again show similar performance. IPS results in a steeper curve in the beginning, while Up-shift lies between IPS and the other curves. With a 100-fold molar excess of HSA, the number of identified proteins is reduced to seven (Figure 6C). For the first 100 MS/MS spectra, all strategies yield similar results, but afterward, the curves diverge dramatically: SPS, Down-shift and DEX fail to identify a new protein for more than 200 MS/MS spectra.
One would expect DEX to perform well in the spiking experiments; however, between iterations 50 and 350, where no new proteins were identified, mainly compounds that did not lead to any peptide identification were selected. For the few peptide identifications in this phase, the second peptide needed to support the protein identification is found several hundred spectra later. Up-shift leads to a quicker identification of other proteins. It also accounts for most of the performance benefit of IPS, which again shows the best results.

Sample Complexity. We also evaluated our method on a complex protein mixture. We used a tryptic digest of the proteome of 10 000 HEK293 cells (Sample 5). A total of 400 proteins were identified with at least two significant peptide hits. The data were calibrated externally rather than internally; hence, the experimental mass accuracy was 25 ppm instead of 10 ppm. An important observation was that all iterative approaches (IPS, DEX, Up-shift, Down-shift) performed worse than SPS (see Figure 7A). To evaluate the dependence of iterative precursor ion selection on the mass accuracy with Sample 5, we simulated a high experimental mass accuracy. We performed a first database search using 25 ppm. Then the m/z values of all identified peptides were replaced by the calculated (error-free) values. With the modified compound list, we compared the different strategies again, now using 10 ppm, 5 ppm, 2 ppm, 1 ppm and 0.5 ppm in addition to 25 ppm as the allowed precursor mass error tolerance in the database search. Figure 7 summarizes the results. With 25 ppm error tolerance, both DEX and IPS perform worse than SPS, and it is clearly visible that our iterative approach in this case requires very specific search input data to outperform SPS. Already with an error tolerance of 5 ppm, its advantage is

research articles

Zerck et al.

Figure 7. Sample 5 with different precursor ion mass tolerances: (A) 25 ppm, (B) 10 ppm, (C) 5 ppm, (D) 2 ppm, (E) 1 ppm, and (F) 0.5 ppm. The star indicates a simulated experimental mass accuracy (see text). Shown are the optimal selection (pink), SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red). Swiss-Prot human was used for the database search.

almost lost. With 0.5 ppm error tolerance, in contrast, a significant improvement was observed when using IPS. For instance, it took 1900 fewer spectra than SPS to identify 350 proteins. When the acquisition of one MS/MS spectrum takes 10, 20, or 30 s, this translates into a difference of more than 5, 10, or 15 h of analysis time, respectively. Using a precursor mass error tolerance of 0.5 ppm, IPS identifies the maximal number of proteins with 1250 fewer spectra than SPS. With a mass error tolerance of 1 ppm, this difference drops to 500 spectra.
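The time savings quoted above follow from multiplying the number of spectra saved by the acquisition time per spectrum. A minimal sketch of that arithmetic is given here; the function name `hours_saved` is illustrative and not part of the IPS implementation.

```python
def hours_saved(spectra_saved: int, seconds_per_spectrum: float) -> float:
    """Convert a number of avoided MS/MS acquisitions into hours of analysis time."""
    return spectra_saved * seconds_per_spectrum / 3600.0


# 1900 fewer spectra with IPS than with SPS (0.5 ppm, 350 proteins),
# evaluated for 10, 20 and 30 s per MS/MS spectrum.
savings = {t: hours_saved(1900, t) for t in (10, 20, 30)}
for t, h in savings.items():
    print(f"{t} s per spectrum: {h:.1f} h saved")
```

This gives roughly 5.3, 10.6, and 15.8 h, consistent with the "more than 5, 10, or 15 h" stated in the text.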


With 0.5 ppm error tolerance, 185 (9.8%) of the 1896 upshifted compounds led to a different peptide identification than expected; that is, they were false assignments. This number increased to 855 (18%) of 4724 upshifted compounds with 5 ppm error tolerance. This comparison clearly shows the problem our strategy encounters when applied to a complex sample. To benefit from it, the number of false positive assignments needs to be low, for example, 10% or less.
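The percentages above can be checked with a short calculation. The sketch below recomputes the false assignment rates from the reported counts and compares them with the 10% guideline mentioned in the text; the helper name and the dictionary layout are illustrative assumptions.

```python
def false_assignment_rate(false_assignments: int, upshifted: int) -> float:
    """Fraction of upshifted compounds that led to an unexpected peptide identification."""
    return false_assignments / upshifted


# Counts as reported for Sample 5 at two precursor mass error tolerances.
rates = {
    "0.5 ppm": false_assignment_rate(185, 1896),
    "5 ppm": false_assignment_rate(855, 4724),
}

MAX_RATE = 0.10  # the "10% or less" guideline from the text
for tolerance, rate in rates.items():
    verdict = "within" if rate <= MAX_RATE else "above"
    print(f"{tolerance}: {rate:.1%} false assignments ({verdict} the 10% guideline)")
```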


Figure 8. The number of MS/MS spectra acquired at each retention time point needed to identify 100, 200, 300, and 400 proteins with IPS (left) and SPS (right).

Analysis Sensitivity and Efficiency. In the results shown above, the performance of the different precursor selection methods was evaluated with regard to their identification rates (the number of identified proteins per number of acquired MS/MS spectra). These naturally converge as the MS/MS analyses of all sample compounds approach completion, resulting in the same number of identified proteins for all methods. In reality, however, sample consumption often limits the number of MS/MS analyses that can be performed per fraction, and the selection of precursor ions thus affects the total number of identified proteins. For MALDI MS, it has been reported that it is possible to enhance the overall detection sensitivity by reducing the amount of MALDI matrix as well as the sample spot size, for example, by using 4 times less matrix and 400-µm instead of 800-µm sample anchor spots.19 It has been shown, however, that this strategy does not result in a linear increase of the detection sensitivity, as might be expected, because the necessary higher degree of sample concentration on the AnchorChip is not a lossless process. In fact, as in ESI, the detection sensitivity is limited primarily by the sample concentration and not by the sample amount. The tradeoff of the smaller sample spot size is a loss of MS/MS measurement capacity. Figure 8 shows the distribution of the acquired MS/MS spectra over the different fractions that were needed to identify 100, 200, 300, and 400 proteins with IPS and SPS, respectively. In general, the MS/MS spectra are distributed more evenly over the fractions with IPS. For instance, after 400 protein identifications, there are very few fractions with more than 15 MS/MS spectra when using IPS, whereas with SPS, this is frequently the case. To evaluate the performance of the different precursor ion selection methods in a situation where the sample amount

for MS/MS spectra acquisition is limited, we simulated the analysis of Sample 5 using different numbers of allowed MS/MS spectra per fraction. In such a setting, when using SPS, not all compounds can be acquired at their optimal position on the target. However, most compounds elute over several fractions, making it possible to acquire MS/MS spectra of them in an adjacent fraction. A precursor ion selection software would typically optimize the selection of the sample position for each precursor ion so that the maximum number of MS/MS spectra can be acquired. In our simulation, we assumed that acquiring the MS/MS spectra on another sample spot, where the signal-to-noise ratio is higher than 10 (an initial criterion for our precursor ion selection), would lead to the same identification result as the optimal sample position. While this assumption may not be correct, it poses no particular disadvantage to any of the precursor ion selection strategies. Figure 9 shows the results of the simulation with a maximum of 5, 10, and 15 MS/MS analyses per fraction. In a setting where the sample is limited, so that not all scheduled MS/MS spectra can be acquired, IPS leads to the identification of more proteins. For example, with a maximum of 5 spectra per fraction, IPS identifies 350 proteins and SPS only 298. With increasing spot capacity, the difference shrinks, and with 15 spectra per spot SPS identifies only 2 fewer proteins than IPS.

Online IPS. In the results shown above, precursor ion selection was made from a compound list based on a molecular mass profile of all compounds in the sample. This is typically the approach of choice for offline analytical schemes used in LC-MALDI MS. However, IPS can also be used online during ongoing LC-MS/MS analyses. In this case, the identification results from previous MS/MS spectra are used to rank the signals of the current MS scan.


Figure 10. Sample 5 with a simulated online precursor selection and limited fraction capacity of 5 MS/MS spectra per fraction. As before, a simulated experimental mass accuracy (see text) of 1 ppm was used. Shown are SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red). Swiss-Prot human was used for the database search.

Figure 9. Sample 5 with a simulated limited spot capacity: (A) 5 MS/MS spectra, (B) 10 MS/MS spectra, and (C) 15 MS/MS spectra per fraction. As before, a simulated experimental mass accuracy (see text) of 1 ppm was used. Shown are SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red). Swiss-Prot human was used for the database search.

The online approach was simulated for Sample 5 by reading in the compound list in steps according to retention time. At each time point, the scores for the available precursor ions were calculated based on previously recorded MS and MS/MS data, analogously to the offline strategy. As in the previous section, this means that the MS/MS spectrum used for protein identification may not have been recorded from the fraction where it was selected for fragmentation.
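The stepwise procedure described above can be illustrated with a simplified loop. The sketch below is not the authors' implementation: it assumes that each compound belongs to exactly one fraction (in the real data most compounds elute over several fractions), and the names `simulate_online`, `score`, and `capacity` are hypothetical. The scoring function is passed in, so that an intensity-only score mimics an SPS-like ranking, while a score that inspects the previously acquired compounds would stand in for an iterative strategy.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Compound = Tuple[str, int]  # (compound id, fraction index); hypothetical layout


def simulate_online(
    compounds: List[Compound],
    score: Callable[[str, List[str]], float],
    capacity: int = 5,
) -> List[str]:
    """Select precursors fraction by fraction, rescoring with what was acquired so far."""
    by_fraction: Dict[int, List[str]] = defaultdict(list)
    for compound_id, fraction in compounds:
        by_fraction[fraction].append(compound_id)

    acquired: List[str] = []
    for fraction in sorted(by_fraction):  # step through retention time
        candidates = list(by_fraction[fraction])
        for _ in range(min(capacity, len(candidates))):
            # Re-rank after every acquisition, using only earlier results.
            best = max(candidates, key=lambda c: score(c, acquired))
            candidates.remove(best)
            acquired.append(best)
    return acquired


# Toy example: the score ignores the history and ranks by intensity only.
intensity = {"a": 3.0, "b": 9.0, "c": 1.0, "d": 5.0}
order = simulate_online(
    [("a", 0), ("b", 0), ("c", 0), ("d", 1)],
    score=lambda c, done: intensity[c],
    capacity=2,
)
print(order)  # ['b', 'a', 'd']
```

With a capacity of 2, compound "c" is never fragmented, which mirrors how a limited number of MS/MS spectra per fraction forces a choice among the available precursors.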


Figure 10 shows the number of identified proteins over the retention time for the different precursor ion selection methods. The number of MS/MS spectra per fraction was limited to 5. With the use of IPS, 339 proteins were identified, whereas SPS yielded only 295 identifications. As expected, the performance gain of IPS is smaller than in the offline setting, but still significant. Again the performance difference decreases when more MS/MS spectra per fraction are allowed. With 10 instead of 5 MS/MS spectra, the number of identified proteins rose to 399 and the difference between the two strategies dropped to seven proteins (data not shown). This simulation shows the potential of IPS for online LC-MALDI MS/MS. One MS spectrum and 5 MS/MS spectra were acquired from each fraction collected over 10 s intervals. This data acquisition rate is not far from those typically employed in LC-ESI MS/MS-based shotgun proteomics. A fundamental difference between ESI and MALDI MS/MS of tryptic peptides concerns the charge states of the precursor ions. With ESI, depending on the size and sequence of the peptides, preferably doubly and triply charged molecular ions are analyzed, whereas MALDI generates almost exclusively singly charged species. For this and other reasons, for example, the fragmentation technique employed, the informative value of our simulation is limited regarding potential advantages when combining IPS with LC-ESI MS/MS. An obvious parameter to consider in the scoring scheme when using ESI is the charge state, along with the options to prefer one charge state over another and to include several or only one.

Practical Aspects. 1. Protein Identification Criteria. In the current study, two matching peptides, identified with over 95% confidence, were chosen as the criteria for protein identification, while proteins with a single matching peptide were considered uncertain candidates. We are aware of the simplicity of this criterion.
However, so far there is no accepted standard for protein identification. In the implementation, our criteria can be easily substituted by any other. A popular approach is the calculation of false discovery rates (FDR) based on target-decoy database searches. In the end, the protein identification criteria are used to decide whether the peptides identified so far are sufficient or further peptide identifications

are needed. For instance, if the FDR for a specific protein is too high, at least one additional peptide identification might be required. The impact of the required confidence in the protein identifications on the performance of the different precursor ion selection strategies was evaluated using Sample 5, considering different numbers of required peptides for protein identification. Figure 11 shows the results with 1 ppm error tolerance when requiring three, four or five peptides. As the number of required peptide hits increases, the performance gap between SPS and IPS grows, as expected. With four and five peptide hits required, however, there is a region in the graph for IPS between 2000 and 3500 MS/MS analyses where no new proteins are identified. The reason for this plateau phase is twofold. First, a large portion of the peptide identifications leads to proteins for which there are not enough peptides available to cross the identification threshold. Second, degenerate peptides shared by a protein family pose a problem. After their identification, remaining compounds matching m/z values of a tryptic peptide of any member of this family are assigned a low rank, although these compounds are required for an identification of the members of the family. These downranked compounds are scheduled for MS/MS after around 3500 MS/MS spectra and lead to the steep increase of protein identifications after the plateau phase. This observation shows that peptide sequences shared by many different proteins should be handled differently depending on whether the aim of the experiment is to identify specific members of protein families or whether it is sufficient to know that at least one member of the family is present.

2. Termination Criteria. To benefit from our approach in an online scenario, reliable termination criteria are necessary.
There are at least two options. The acquisition could stop when no new protein was identified during the last x iterations, with x being a user-defined number. Another possibility is to consider the slope of the identification rate curve, which flattens out when approaching the maximal number of proteins that can be identified. If the slope falls beneath a predefined threshold across a defined number of iterations, the acquisition terminates. In the current implementation of IPS, these criteria have to be applied with care to prevent a premature termination. For instance, the analysis of the complex sample with four or five peptides required for protein identification would have been terminated too early (Figure 11B,C).

3. Step Size. The results above were produced considering one MS/MS spectrum per iteration. In practice, it can be too time-consuming to use such small step sizes. Figure 12 shows the results for the complex sample with 1 ppm error tolerance using different step sizes. As expected, the smaller the step size, the better the performance. Step sizes up to 10 seem to be good values to use.
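The two termination criteria described under Termination Criteria above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: `identified` is assumed to hold the cumulative number of identified proteins after each iteration, the slope is taken as the mean gain per iteration over a window, and the function names and parameters are hypothetical.

```python
from typing import Sequence


def stalled(identified: Sequence[int], x: int) -> bool:
    """True if no new protein was identified during the last x iterations."""
    if len(identified) <= x:
        return False  # not enough history yet
    return identified[-1] == identified[-1 - x]


def slope_below(identified: Sequence[int], window: int, threshold: float) -> bool:
    """True if the mean gain per iteration over the last `window` iterations is below `threshold`."""
    if len(identified) <= window:
        return False  # not enough history yet
    slope = (identified[-1] - identified[-1 - window]) / window
    return slope < threshold


# Toy identification curve: cumulative protein count after each iteration.
curve = [1, 2, 3, 3, 3, 3]
print(stalled(curve, x=3))                          # True
print(slope_below(curve, window=4, threshold=0.5))  # True
```

As noted in the text, either rule would stop too early on a curve with a long plateau, such as those in Figure 11B,C, so x, the window, and the threshold would need to be chosen conservatively.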

Conclusions and Perspectives

Depending on the aim of the analysis, the benefits of IPS can vary. In a scenario where a large number of samples needs to be analyzed in a short time, IPS can, as shown, save a significant amount of time. When the sample amount is limited, IPS allows the sample to be consumed efficiently and can lead to more protein identifications. By increasing the number of required peptides for protein identification, IPS can be used to thoroughly identify one or more proteins by specifically targeting their peptides. However, in a situation where sufficient time and sample are available and an extensive analysis


Figure 11. Sample 5 with different protein identification criteria: we required 3 (A), 4 (B), and 5 (C) significant peptide hits per protein identification. Shown are the optimal selection (pink), SPS (blue), DEX (black), Up-shift (green), Down-shift (orange) and IPS (red). The experimental mass accuracy was artificially increased (see text); the precursor mass error tolerance was set to 1 ppm, and Swiss-Prot human was used as database.

of the sample is wanted, all possible MS/MS spectra will be acquired anyway; thus, IPS would be of little use. A major advantage of IPS is the saving of analysis time in comparison with SPS. The greatest gain is possible in a setting where the samples are analyzed in high throughput, for example, in a service laboratory. Here it is often not necessary


Figure 12. Sample 5 with different step sizes. The experimental mass accuracy was artificially increased (see text); the error tolerance was set to 1 ppm, and Swiss-Prot human was used as database.

to identify all proteins present in the sample. As shown above, the slope of the identification rate curve flattens out when approaching the maximal number of possible protein identifications. Hence, accepting the loss of the last 5% of protein identifications results in a significantly reduced analysis time. But even when requiring identification of all proteins, IPS can lead to a clear reduction of analysis time given a high experimental mass accuracy. The main focus of this study was the performance evaluation with offline LC-MALDI MS/MS. However, a simulation showed that IPS can also increase the number of protein identifications in an online LC-MS/MS setting. In conjunction with directed MS/MS, as possible for instance with RePlay,20 where a fraction of the separated peptides is split off and stored in a capillary for a second analysis that takes place while the LC is being conditioned for the next sample, online IPS could yield a significant performance improvement compared to standard data-dependent acquisition. However, it is clear that the method, at least in its current implementation, requires a high specificity (mass accuracy) of the input data to outperform the existing strategies for precursor ion selection. If complex protein mixtures are to be analyzed, the relative errors of the precursor ion masses should fall below 1 ppm, which is not surprising because even at that level false positive assignments of experimentally determined masses to peptide sequences are not rare events. Unfortunately, most if not all of the currently commercially available mass spectrometers do not meet that demand, and to come close requires the use of internal calibrants.
Identification of peptides in large databases by LC-MS (not MS/MS), however, is possible when their determined mass is combined with the observed retention time to yield a unique tag (AMRT).21,22 Including observed and predicted retention times in IPS is a promising way to lower the risk of false positive assignments and relax the constraints on the mass error tolerance to a level achievable by many laboratories, for example, 10 or 20 ppm instead of 1 ppm. This extension is a central part of our current studies. Other limitations of IPS concern an increased risk of false positive assignments of predicted tryptic peptides to experimental masses caused by protein sequence homology, which turned out to be a considerable problem if the search was extended from one species to all present in the database or if


more than two peptides were required for protein identification. High homology results in discrimination against related proteins once one of them has been identified. Degenerate peptides, in contrast, can qualify many different protein sequences as uncertain candidates, with the consequence that a large pool of predicted tryptic peptides is matched against the remaining list of precursor ions, raising the risk of false positive score manipulation. Both issues can possibly be addressed by calculating for each peptide the number of proteins that share its sequence (protein count) and converting these numbers into normalized frequencies. Our goal is to combine these with the peptide frequencies (peptide counts) that are already implemented in IPS and used to tune the degree to which scores are raised or lowered. We expect this to improve the overall management of the risk of false positive assignments. Finally, the requirement that identification of any protein needs at least two identified peptides discriminates against the identification of small proteins and of proteins that for other reasons produce few peptides useful for identification upon cleavage with the protease used. Introducing a molecular weight threshold below which only one peptide is required for identification would be a simple means to enhance the identification of small proteins. The loss in confidence could be balanced by requiring a higher peptide identification score below that threshold. A disadvantage of this approach is the introduction of an inflexible, arbitrary threshold. A general concern is the acceptance of protein identifications based on only one peptide. Whether this is acceptable depends on the given experimental conditions as well as the follow-up work based on the identification results. We are considering the implementation of protein- or protein-class-specific probability-based identification criteria.
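The protein count proposed above could be computed along the following lines. The text does not specify how the counts are normalized, so the sketch assumes one simple choice: dividing by the number of proteins in the digest. The function name, the input layout, and the toy peptide labels are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def protein_count_frequencies(digest: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Map each peptide to the fraction of proteins in `digest` that contain it.

    `digest` maps a protein identifier to its in silico tryptic peptides.
    Normalizing by the total number of proteins is an assumption of this sketch.
    """
    proteins_per_peptide: Dict[str, Set[str]] = defaultdict(set)
    for protein, peptides in digest.items():
        for peptide in peptides:
            proteins_per_peptide[peptide].add(protein)
    n_proteins = len(digest)
    return {pep: len(prots) / n_proteins for pep, prots in proteins_per_peptide.items()}


# Toy digest: one peptide shared by two family members, one unique peptide each.
toy_digest = {
    "family_member_1": ["SHARED", "UNIQUE1"],
    "family_member_2": ["SHARED", "UNIQUE2"],
}
frequencies = protein_count_frequencies(toy_digest)
print(frequencies)  # {'SHARED': 1.0, 'UNIQUE1': 0.5, 'UNIQUE2': 0.5}
```

A degenerate peptide receives a high frequency and a family-specific peptide a low one, which is the kind of information that could be used to temper score changes for shared sequences.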

Acknowledgment. We thank Klaus-Dieter Kloeppel and Gabriela Thiele for scientific discussions and Beata Lukaszewska-McGreal for technical assistance. Johan Gobom acknowledges financial support from “Stiftelsen för gamla tjänarinnor”. This study was funded by the National Genome Research Network (NGFN) of the German Ministry for Education and Research (BMBF) within the project SMP Protein, and the Max Planck Society.

References

(1) Kohli, B.; Eng, J.; Nitsch, R.; Konietzko, U. An alternative sampling algorithm for use in liquid chromatography/tandem mass spectrometry experiments. Rapid Commun. Mass Spectrom. 2005, 19, 589–596.
(2) Hui, J. P. M.; Tessier, S.; Butler, H.; Jonathan, B.; Kearney, P.; Carrier, A.; Thibault, P. Proceedings of the 51st ASMS Conference on Mass Spectrometry and Allied Topics, Montreal, Canada, 2003.
(3) Chen, H.-S.; Rejtar, T.; Andreev, V.; Moskovets, E.; Karger, B. L. Enhanced characterization of complex proteomic samples using LC-MALDI MS/MS: Exclusion of redundant peptides from MS/MS analysis in replicate runs. Anal. Chem. 2005, 77, 7816–7825.
(4) Wang, N.; Zheng, J.; Whittal, R.; Li, L. Proceedings of the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006.
(5) Wang, N.; Li, L. Exploring the precursor ion exclusion feature of liquid chromatography-electrospray ionization quadrupole time-of-flight mass spectrometry for improving protein identification in shotgun proteome analysis. Anal. Chem. 2008, 80, 4696–4710.
(6) Bendall, S.; Hughes, C.; Campbell, J.; Stewart, M.; Pittock, P.; Liu, S.; Bonneil, E.; Thibault, P.; Bhatia, M.; Lajoie, G. An enhanced mass spectrometry approach reveals human embryonic stem cell growth factors in culture. Mol. Cell. Proteomics 2009, 8 (3), 421–432.
(7) Wallace, A.; Ritchie, M.; Jones, C.; Leicester, S.; Langridge, J. ABRF Poster, 2003.


(8) Scherl, A.; Francois, P.; Converset, V.; Bento, M.; Burgess, J. A.; Sanchez, J.-C.; Hochstrasser, D. F.; Schrenzel, J.; Corthals, G. L. Nonredundant mass spectrometry: A strategy to integrate mass spectrometry acquisition and analysis. Proteomics 2004, 4, 917–927.
(9) Rinner, O.; Mueller, L.; Hubálek, M.; Müller, M.; Gstaiger, M.; Aebersold, R. An integrated mass spectrometric and computational framework for the analysis of protein interaction networks. Nat. Biotechnol. 2007, 25, 345–352.
(10) Picotti, P.; Aebersold, R.; Domon, B. The implications of proteolytic background for shotgun proteomics. Mol. Cell. Proteomics 2007, 6, 1589–1598.
(11) Schmidt, A.; Gehlenborg, N.; Bodenmiller, B.; Mueller, L.; Campbell, D.; Mueller, M.; Aebersold, R.; Domon, B. An integrated, directed mass spectrometric approach for in-depth characterization of complex peptide mixtures. Mol. Cell. Proteomics 2008, 7, 2138–2150.
(12) Bommer, U.; Burkhardt, N.; Jünemann, R.; Spahn, C.; Triana-Alonso, F.; Nierhaus, K. In Subcellular Fractionation: A Practical Approach; Graham, J., Rickwoods, D., Eds.; IRL Press: Oxford, 1997; pp 271–301.
(13) Mirgorodskaya, E.; Braeuer, C.; Fucini, P.; Lehrach, H.; Gobom, J. Nanoflow liquid chromatography coupled to matrix-assisted laser desorption/ionization mass spectrometry: Sample preparation, data analysis, and application to the analysis of complex peptide mixtures. Proteomics 2005, 5, 399–408.
(14) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551–3567.

(15) Nesvizhskii, A.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75, 4646–4658.
(16) Sturm, M.; Bertsch, A.; Groepl, C.; Hildebrandt, A.; Hussong, R.; Lange, E.; Pfeifer, N.; Trieglaff, O. S.; Zerck, A.; Reinert, K.; Kohlbacher, O. OpenMS - An open-source software framework for mass spectrometry. BMC Bioinf. 2008, 9, 163.
(17) Kohlbacher, O.; Reinert, K.; Gröpl, C.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Sturm, M. TOPP-the OpenMS proteomics pipeline. Bioinformatics 2007, 23, e191–197.
(18) Elias, J.; Gygi, S. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207–214.
(19) Nordhoff, E.; Lehrach, H.; Gobom, J. Exploring the limits and losses in MALDI sample preparation of attomole amounts of peptide mixtures. Int. J. Mass Spectrom. 2007, 268, 139–146.
(20) Waanders, L.; Almeida, R.; Prosser, S.; Cox, J.; Eikel, D.; Allen, M.; Schultz, G.; Mann, M. A novel chromatographic method allows online reanalysis of the proteome. Mol. Cell. Proteomics 2008, 7, 1452–1459.
(21) Smith, R.; Anderson, G.; Lipton, M.; Pasa-Tolic, L.; Shen, Y.; Conrads, T.; Veenstra, T.; Udseth, H. An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics 2002, 2, 513–523.
(22) Norbeck, A.; Monroe, M.; Adkins, J.; Anderson, K.; Daly, D.; Smith, R. The utility of accurate mass and LC elution time information in the analysis of complex proteomes. J. Am. Soc. Mass Spectrom. 2005, 16, 1239–1249.

