Prediction of Novel Modifications by Unrestrictive Search of Tandem Mass Spectra
Seungjin Na and Eunok Paek*
Department of Mechanical and Information Engineering, University of Seoul, Seoul, Korea
Received February 10, 2009
Post-translational modifications (PTMs) greatly increase the complexity and diversity of the proteome, so that a protein can carry out a wide variety of functions. PTM prediction on proteins is one of the major challenges in proteomics research. Various approaches have been developed for an unrestrictive search of PTMs in proteins using tandem mass spectrometry (MS/MS). However, most tools address frequent modifications, even though critical biological modifications may be rare. Here, we present MODmap for exploring potentially important rare and unknown modifications from MS/MS spectra. We propose an extended sequence tag-based spectral alignment that is highly sensitive to modified regions in an MS/MS spectrum and tolerant of multiple modifications per peptide. We have developed an unrestrictive algorithm (MODi) that rapidly searches for all known types of PTMs at once, without limiting the number of modified sites in a peptide. After MODi produces spectral alignment results using all known types of PTMs, high-quality spectral alignments are passed to MODmap and re-estimated. New mass offsets are reported via local alignment, and MODmap determines novel modification candidates. In analyses of PTM-rich lens proteins, our methodology proved sensitive to rare modifications and suggested several confident novel modification candidates.
Keywords: MS/MS • mass spectrometry • post-translational modification • spectral alignment • human lens • proteomics
* To whom correspondence should be addressed. Eunok Paek, Dept. of Mechanical and Information Engineering, University of Seoul, 90 Jeonnongdong, Dongdaemun-gu, Seoul, Korea, 130-743. Tel: 82-2-2210-2680. Fax: 82-2-2210-5575. E-mail: [email protected].
Journal of Proteome Research 2009, 8, 4418–4427. Published on Web 08/07/2009. 10.1021/pr9001146 CCC: $40.75. © 2009 American Chemical Society. research articles
Introduction
Most proteins undergo post-translational modifications (PTMs), which increase the complexity of the proteome and are often key regulators of the biological functions, localization, and interactions of proteins inside a cell.1-3 A major challenge in proteomic experiments is identifying post-translationally modified proteins using tandem mass spectrometry (MS/MS).4,5 Many early approaches were developed to identify peptides and proteins using amino acid sequence information from MS/MS6,7 and have been extended to identify modified peptides and proteins.8,9 However, many of them took into account only a few types of PTMs during the analysis and ignored all others, so investigators had to guess in advance which PTMs exist in a sample. Most search tools compared an MS/MS spectrum with all possible combinations of PTMs for each peptide from a database, which is computationally very expensive.
Peptide sequence tag approaches have been suggested for error-tolerant database search and narrow down the search space for PTM identification.10-12 A short sequence tag (a stretch of 2-4 amino acids) can be derived from an MS/MS spectrum and used to screen for peptides in a protein database. With relaxed criteria for a tag match, possible modifications can be inferred from the difference between the precursor ion mass of the experimental spectrum and the theoretically calculated peptide mass. Additionally, extended sequence tag approaches13,14 demonstrated that using multiple tags from an MS/MS spectrum is effective for identifying multiple modifications in a peptide and for identifying proteins whose sequences are not yet known.
Recently, blind approaches have been introduced to identify an extensive range of PTMs of proteins. A de novo sequence, inferred from an MS/MS spectrum using de novo tools,15,16 was compared with peptides from a protein database.17-19 The differences between the fragment ion masses of the de novo and database sequences can be used to localize a modification or substitution. The major limitation of these approaches is the quality of the de novo interpretation. MS/MS spectra are inherently information-deficient and noisy, because fragmentation may not occur at every amide bond and may occur at random positions; thus, de novo sequences often include errors. As other approaches to mining modifications in a sample, MS-Alignment20 proposed a spectral alignment between a spectrum and database sequences using dynamic programming, and ModifiComb21 introduced a ∆M histogram between unassigned spectra and unmodified peptides with similar retention times. They predicted only abundant modifications, based on the number of repeated mass differences, and thus could not distinguish rare real modifications from computational artifacts. Although many approaches have been developed to detect modified peptides, a single variable modification per peptide was often assumed, because their performance depended strongly on the number of modified sites in a peptide.
We have proposed an improved spectral alignment algorithm13 (MODi) for unrestrictive identification of multiple modifications. Our algorithm is based on the sequence tag approach but extends it to an error-tolerant alignment of multiple sequence tags from an MS/MS spectrum. Sequence tags are reliable subsequence segments of a target peptide and are much less sensitive to sequencing errors than complete de novo sequences. In a spectral alignment that considers modifications, the key is to narrow the search down to the local regions where modifications exist. Multiple sequence tags can effectively localize modified regions within a spectrum and additionally overcome de novo sequencing errors. Localizing modifications to small regions matters greatly for computational complexity, because the cost of interpreting a spectral alignment grows exponentially with the number of modifications considered. This notion is similar to that of traditional sequence comparison between query and database sequences. BLAST22 used a list of “words” (similar to sequence tags) from the query sequence and used them to retrieve sequences from the database. FASTA23 first identified the best diagonal (local identity) regions between query and database sequences by finding short diagonal matches called “ktups” (similar to sequence tags), and provided a fast alignment based on several diagonals.
They showed much faster performance than full alignment programs based on dynamic programming, known as the Smith-Waterman algorithm.24
In this article, we introduce MODmap to explore potentially important rare and unknown modifications from MS/MS spectra as the second part of a two-step approach: the first step is a MODi analysis with hundreds of known modifications as input, and the second step uses the partial interpretation results of MODi to predict novel modification sites in a peptide. MODi rapidly searches for all known types of PTMs at once, without limiting the number of modified sites in a peptide. After MODi interprets initial spectral alignment results using all known types of PTMs, high-quality spectral alignments are passed to MODmap and re-estimated. New mass offsets are reported via local alignment, and MODmap determines novel modification candidates. In analyses of PTM-rich lens proteins, MODi performed well, identifying modified peptides with various types of PTMs, while MODmap suggested various novel modification candidates.
Our alignment algorithm is tolerant of multiple modifications per peptide. Unlike other blind PTM search methods, MODmap can discover novel modifications even when other PTM sites exist in a peptide. MODmap is also sensitive to rare modifications. In previous approaches, profiles of some form (matrix or histogram) were generated against modification masses estimated from the mass difference between a spectrum and a peptide. However, such histograms contained much noise, and abundant modifications were predicted based on the number of repeated mass differences, so these approaches cannot tell rare but significant modifications from noise. In contrast, in our approach, after the initial peptide identifications are made for an MS/MS spectrum with known types of modifications using MODi, an unknown mass shift is selected as a novel modification candidate if re-estimating the spectrum with that mass shift is much more significant than the initial identifications (including a null identification). As a result, our approach is less susceptible to the noise that makes rare real modifications difficult to detect. We expect that our methodology can serve as an effective and robust platform for identifying multiple unrestrictive modifications in a peptide.
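The tag-based localization idea outlined in this introduction can be illustrated with a small sketch. It is not the MODi implementation: the function names (`b_ion_peaks`, `find_tags`, `diagonals`), the restriction to a noise-free b-ion ladder, and the six-residue mass table are assumptions made for illustration. Tags of three residues are read from mass differences between peaks, placed on the peptide, and each placement is reported with its mass offset; the default offset window of -150 to 250 Da follows the range stated in the Experimental Section.

```python
# Monoisotopic residue masses in Da (standard values; only the residues used here).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "T": 101.04768, "M": 131.04049}
PROTON = 1.00728


def b_ion_peaks(peptide, shifts=None):
    """Toy spectrum: b-ion masses b1..b(n-1), with optional per-position shifts."""
    shifts = shifts or {}
    peaks, total = [], PROTON
    for i, aa in enumerate(peptide[:-1]):
        total += RESIDUE_MASS[aa] + shifts.get(i, 0.0)
        peaks.append(total)
    return peaks


def find_tags(peaks, length=3, tol=0.5):
    """Return (start_mass, tag) for each chain of `length` residue-mass steps."""
    peaks = sorted(peaks)
    tags = []

    def extend(mass, tag, start):
        if len(tag) == length:
            tags.append((start, tag))
            return
        for nxt in peaks:
            for aa, m in RESIDUE_MASS.items():
                if abs(nxt - mass - m) <= tol:
                    extend(nxt, tag + aa, start)

    for p in peaks:
        extend(p, "", p)
    return tags


def diagonals(peaks, peptide, length=3, tol=0.5, offset_range=(-150.0, 250.0)):
    """Place each tag on the peptide and report (tag, position, mass offset)."""
    prefix = [PROTON]  # prefix[i] = unmodified b-type mass before residue i
    for aa in peptide:
        prefix.append(prefix[-1] + RESIDUE_MASS[aa])
    found = []
    for start_mass, tag in find_tags(peaks, length, tol):
        pos = peptide.find(tag)
        while pos != -1:
            offset = start_mass - prefix[pos]
            if offset_range[0] <= offset <= offset_range[1]:
                found.append((tag, pos, round(offset, 2)))
            pos = peptide.find(tag, pos + 1)
    return found


# Hypothetical peptide carrying +85 Da on Thr (0-based position 4).
peptide = "GASPTMAGS"
spectrum = b_ion_peaks(peptide, {4: 85.0})
print(diagonals(spectrum, peptide))
```

In this toy case the two tags sit on either side of the modified residue with offsets of 0 and 85 Da, so the shift is confined to the single residue between them, which is the localization effect the text attributes to multiple tags.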
Experimental Section
MS/MS Data Set. We analyzed MS/MS data sets obtained from a quadrupole time-of-flight (Q-TOF) mass spectrometer. A total of 12 800 MS/MS spectra were acquired from the lens proteome of a 93-year-old human male with nuclear cataracts. The description and availability of the data are discussed in previous work.25 This data set was previously used in many modification studies of lens proteins.25-27 In addition, ISB data from a linear ion trap Fourier transform (LTQ-FT) instrument were used to test the tool performance.28
PTM Search. A MODmap search coupled with MODi (http://prix.uos.ac.kr/modi) was conducted against the 12 800 MS/MS spectra from the lens proteome, with the following search parameters: more than four hundred variable modifications from Unimod (http://www.unimod.org), ±0.5 Da tolerance for peptide and fragment ions, tryptic termini, and up to five missed cleavages. The mass range for peptide modification was specified from -150 to 250 Da. The searched database consisted of 13 crystallin proteins, which are listed in Table 1 in Supporting Information. We used a protein database identified by unmodified peptides, based on the assumption that at least one unmodified peptide is present in sample proteins.29 To determine a false discovery rate, an additional search was separately conducted against a decoy database.30 For lens proteins, their reverse and shuffled sequences were used together as a decoy. All predicted modifications from the decoy database search were considered false identifications. We estimated a false discovery rate of about 1% for MODmap: only one hit came from the decoy database, while the normal database search reported 154 hits. To verify predicted unknown modifications, a Mascot search (ver. 2.1) was done with the predicted unknown modifications (30 types) against the Swiss-Prot human database (ver. 56.2, 20 327 entries).
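The false discovery rate quoted above can be checked with a short calculation. The text does not state the exact formula used, so the ratio of decoy hits to target hits below is an assumption, as is the function name `decoy_fdr`; it also ignores the fact that the decoy here (reverse plus shuffled sequences) is twice the size of the target database.

```python
def decoy_fdr(target_hits: int, decoy_hits: int) -> float:
    """Simple separate-search estimate: decoy hits divided by target hits."""
    if target_hits <= 0:
        raise ValueError("target_hits must be positive")
    return decoy_hits / target_hits


# Counts reported in the text: 1 decoy hit against 154 target hits.
fdr = decoy_fdr(target_hits=154, decoy_hits=1)
print(f"{fdr:.2%}")
```

Under this assumed formula the estimate is a little under 1%, consistent with the rounded value given in the text.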
Because Mascot allows only nine modifications at once, we conducted iterative searches against the spectra while changing the set of modifications, and for each spectrum we selected the candidate with the highest score. Modifications with the same mass were included in the same set so that they compete with each other (during a search they can affect identifications that include modifications of the given mass). The search parameters were as follows: ±0.5 Da tolerance for peptide and fragment ions; tryptic peptides; up to two missed cleavages.
Simulation Test. A simulation test was performed to evaluate how sensitive MODmap and MS-Alignment are to a modified region in an MS/MS spectrum. First, for MS/MS data obtained from the ISB standard protein mixture 3 (LTQ-FT), a SEQUEST search was done against a database of the 18 standard proteins and 15 contaminants, appended with the reverse sequences of IPI human (ver. 3.49, 74 017 entries). The search parameters were as follows: ±0.5 Da mass tolerance for peptides; carbamidomethyl Cys as a fixed modification; no enzyme. A total of 3951 doubly charged, fully tryptic peptides matched to one of the standard proteins and contaminants above XCorr 2.2 and ∆Cn 0.1 were adopted for further analysis.31 Second, for about 50% of the selected peptide-spectrum matches (PSMs), the corresponding database sequences were mutated by changing one residue to a random residue. As a result, the peptide sequences of 1969 spectra were mutated from the original database, while the remaining 1982 were kept intact. Then, we checked how many spectra could be identified as the original peptides with one amino acid mutation (as a modification) when MODmap and MS-Alignment searches were each conducted against the mutated database. The final mutated database (SIMdb) consisted of the mutated sequences of standard proteins and contaminants, together with their reverse and shuffled sequences (99 entries).
Model Definition for Modified Peptide Identification. Let $A = \{x_i \mid 1 \le i \le 20\}$ be the set of amino acids. Each amino acid $x_i$ has a molecular mass $|x_i|$. A peptide $P = s_1 s_2 \ldots s_n$ ($s_i \in A$) is a string over amino acids, with mass $|P| = \sum_{1 \le i \le n} |s_i|$. An experimental MS/MS spectrum ES is an ordered set of peaks $\{p_1, p_2, \ldots, p_n\}$, where each $p_i$ has a mass $a_i$ and an intensity $h_i$. ES has a precursor mass |ES|, which is equal to the mass of the peptide that generated the spectrum. The theoretical spectrum TS(P) of a peptide P is represented as an ordered list of masses $\{b_1, b_2, \ldots, b_n\}$ from all prefixes (N-terminal or b-ions) and suffixes (C-terminal or y-ions) of P. The peptide identification problem is then defined as follows: given ES, compute Match(ES, TS(P)) against every P from a protein database satisfying the condition |ES| = |P|, where Match(ES, TS(P)) represents a similarity score between ES and TS(P). Match(ES, TS(P)) can be defined, for example, as a shared peaks count (the number of masses common to the two spectra). The P with the best Match(ES, TS(P)) is then selected as the precursor peptide of ES. In practice, the definition of Match(ES, TS(P)) is critical to the peptide identification problem, and many sophisticated approaches have been developed.32,33 Now, we consider a modification ∆ (mass shift) to a peptide P.
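The identification model, together with the mass shift ∆ just introduced, can be made concrete with a minimal sketch. The function names, the b- and y-ion offsets (a proton for b-ions, water plus a proton for y-ions), and the noise-free toy spectrum are assumptions for illustration; Match is taken to be the shared peaks count that the text offers as an example. The peptide mass follows the text's definition as a plain sum of residue masses.

```python
# Monoisotopic residue masses in Da (standard values; only the residues used here).
RESIDUE_MASS = {"G": 57.02146, "S": 87.03203, "P": 97.05276,
                "T": 101.04768, "M": 131.04049}
WATER, PROTON = 18.01056, 1.00728


def residue_masses(peptide, shifts=None):
    """Residue masses |s_i|, with an optional mass shift per 0-based position."""
    shifts = shifts or {}
    return [RESIDUE_MASS[aa] + shifts.get(i, 0.0) for i, aa in enumerate(peptide)]


def peptide_mass(peptide, shifts=None):
    """|P| as the sum of residue masses, following the definition in the text."""
    return sum(residue_masses(peptide, shifts))


def theoretical_spectrum(peptide, shifts=None):
    """TS(P): singly charged b-ion (prefix) and y-ion (suffix) masses, sorted."""
    masses = residue_masses(peptide, shifts)
    ions = []
    for cut in range(1, len(masses)):
        ions.append(sum(masses[:cut]) + PROTON)          # b-ion
        ions.append(sum(masses[cut:]) + WATER + PROTON)  # y-ion
    return sorted(ions)


def shared_peak_count(peaks, theoretical, tol=0.5):
    """Match(ES, TS(P)) as the number of theoretical masses found in the spectrum."""
    return sum(1 for b in theoretical if any(abs(b - a) <= tol for a in peaks))


def localize_single_shift(peaks, peptide, delta, tol=0.5):
    """Try the shift at every position: one Match evaluation per residue."""
    scores = [shared_peak_count(peaks, theoretical_spectrum(peptide, {i: delta}), tol)
              for i in range(len(peptide))]
    best = max(range(len(peptide)), key=scores.__getitem__)
    return best, scores


# Toy spectrum: noise-free ions of SPTMG carrying +85 Da on Thr (position 2).
peaks = theoretical_spectrum("SPTMG", {2: 85.0})
delta = peptide_mass("SPTMG", {2: 85.0}) - peptide_mass("SPTMG")
best_site, site_scores = localize_single_shift(peaks, "SPTMG", delta)
print(best_site, site_scores)
```

With a single shift the loop runs once per residue, so the cost is linear in peptide length. Extending the same brute-force idea to k shifts would require looping over combinations of positions and over ways of splitting ∆, which is where the combinatorial growth comes from.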
If a modification ∆ occurs at position $s_i$ of a peptide $P = s_1 s_2 \ldots s_n$, the peptide becomes a modified peptide $P^{(\Delta,i)} = s_1 s_2 \ldots s_i^{\Delta} \ldots s_n$, in which the mass of residue $s_i^{\Delta}$ is increased by ∆ from $|s_i|$. Multiple modifications $\Delta_1$ and $\Delta_2$ to a single peptide P may likewise increase the masses of residues $s_i$ and $s_j$ at once. For a modified peptide, Match(ES, TS($P^{\Delta}$)) is calculated between the spectrum ES and a modified peptide $P^{\Delta}$ that may be k modifications away from a peptide P in the protein database. With only one modification ∆ to a peptide P, the modification is localized to a single position, for example, $s_i$ of $P = s_1 s_2 \ldots s_n$, where ∆ = |ES| - |P|. Thus, determining the modification site takes time proportional to the length of the peptide. For multiple modifications, however, the time complexity grows exponentially. With two modifications, for example, ∆ can be divided in many ways into $\Delta_1$ and $\Delta_2$, where $\Delta = \Delta_1 + \Delta_2$, and for each possible $\Delta_1$ and $\Delta_2$ we have to search over pairs of positions $s_i$ and $s_j$ of P. Therefore, the parameter k (the number of modified sites) is critical in the problem of modified peptide identification.
Spectral Alignment. Spectral alignment algorithms were developed to find an optimal alignment between an MS/MS spectrum and a peptide, allowing k modifications.34,35 They have been applied to various biological problems.36-38 Figure 1 illustrates spectral alignment in a two-dimensional matrix. The row $\{a_m\}$ represents masses in an MS/MS spectrum, and the column $\{b_n\}$ represents masses in a database peptide. In a spectral alignment matrix, jumps are allowed from one point $(a_m, b_n)$ to another $(a_{m'}, b_{n'})$. A diagonal jump (where $a_{m'} - a_m = b_{n'} - b_n$) between the two points represents a mass correspondence between a spectrum and a peptide (Figure 1a), and an oblique jump ($a_{m'} - a_m \ne b_{n'} - b_n$) introduces a modification (mass shift) into the subsequence of the peptide that corresponds to the segment defined by the two end points (Figure 1b,c). A vertical jump means an insertion into the corresponding subsequence, while a horizontal jump means a deletion from the corresponding subsequence.
Figure 1. A spectral alignment example shown as a two-dimensional matrix, in which all matching peaks between two spectra (the MS/MS spectrum and the theoretical spectrum of a peptide) are represented as intersections of horizontal and vertical gray lines. Jumps connect two points (matching peaks). (a) Diagonal jump, with the same mass difference between peaks of the spectrum and the peptide. (b and c) Oblique jumps with a modification of +∆ or -∆ to a peptide, respectively. The final spectral alignment comprises a sequence of jumps from the top left corner to the bottom right corner. Its path (dots) can be obtained by backtracking through the matrix.
The spectral alignment can be done using dynamic programming, and the recurrence relation for the alignment matrix M can be defined as follows:
$$
M[m+1, n+1] = \max
\begin{cases}
M[m, n+1] + f(m+1, -) \\
M[m+1, n] + f(-, n+1) \\
M[m, n] + f(m+1, n+1)
\end{cases}
$$
where f(i, j) represents a score function defined on a jump. An optimal alignment path is then found by backtracking through the matrix M.
Improved Spectral Alignment. We have developed an improved spectral alignment algorithm (MODi) for unrestrictive identification of multiple modifications per peptide.13 It can identify multiple PTMs in a peptide while taking into account all known types of PTMs at once, giving a more comprehensive view of modifications in proteins. Note that our spectral alignment algorithm does not restrict the number of oblique jumps, while other algorithms allow only a limited number of them. The workflow is summarized in Figure 2. Local identity regions (diagonals) are identified via the alignment between a peptide and multiple sequence tags from an MS/MS spectrum (Figure 2b). Then, we check whether two diagonals can be joined together and find an optimal subset of these diagonals to form an initial alignment (Figure 2c). The region between two joined diagonals is called a gap (an oblique jump in an alignment matrix). Finally, gap alignments are made using a modification table
from Unimod (Figure 2d). Our main contribution to spectral alignment is the use of initial alignments built from multiple diagonals, which makes the algorithm tolerant of multiple modifications without a heavy computational burden. This approach is scalable and performs well even when more than four hundred modification types are considered and the number of potential PTMs in a peptide is unlimited.
Figure 2. An improved spectral alignment algorithm to detect multiple modifications. (a) Sequence tags of a fixed length, e.g., three, are inferred from an MS/MS spectrum using a spectrum graph,16 where a node represents a peak and an edge connects a pair of nodes whose masses differ by the mass of an amino acid, and candidate peptides are retrieved from a protein database. (b) Between a peptide and all the identified tags, alignment is done according to their mass positions. Once tags are aligned, they are referred to as diagonals (marked as “dia” in the figure). The two dashed lines bound the only range in which alignments are allowed (that is, the maximum and minimum mass offsets possible from allowed modifications). In this work, the values of -150 and 250 Da were used. This range can be specified by the user as a parameter. (c) A subset of diagonals forms an initial alignment. (d) Gap alignments are made using the full list of known modifications from www.unimod.org.
Results
Overview. Our PTM prediction uses a two-step strategy: (1) peptide identification (modified or unmodified): interpretation of initial alignments using all known types of PTMs; (2) novel modification prediction: reinterpretation of high-quality initial alignments. Once an initial alignment is made, its modifications are thoroughly interpreted by MODi by hypothesizing combinations of the variable modifications given as input. Here, we focus on potential modifications unexplored by MODi, that is, those not listed in the Unimod database, and introduce MODmap for the discovery of novel modifications. After MODi interprets spectral alignment results, high-quality spectral alignments are passed to MODmap for re-estimation; only the best initial alignment above the defined threshold per spectrum is re-estimated. Then, new mass offsets are reported via local alignment for the gaps of the selected alignments, and MODmap determines novel modification candidates. The MODmap software was implemented in the Java programming language and is available online (http://prix.uos.ac.kr/modi) in conjunction with MODi search.
Initial Alignment. Given an experimental spectrum ES and a peptide P, an initial alignment IA is constructed as follows. First, a diagonal is identified as a pair $\langle AM, P_{nn'} \rangle$, where $P_{nn'} = s_n \ldots s_{n'}$ ($n' - n > 0$) is a subsequence of P and $AM = \{a_0, \ldots, a_m\}$ is an ordered list of experimental masses selected from ES, such that $|P_{nn'}| = a_m - a_0$ and $|s_{n+i}| = a_{i+1} - a_i$ ($0 \le i \le n' - n = m - 1$). Then, a subset of the identified diagonals constructs an initial alignment $IA = \{dia_1, \ldots, dia_k \mid dia_i.a_m < dia_{i+1}.a_0\ (1 \le i < k)\}$. Gaps are defined as triples $\langle ES_{(dia_i.a_m)(dia_{i+1}.a_0)},\ P_{(dia_i.n'+1)(dia_{i+1}.n-1)},\ \Delta = |ES_{(dia_i.a_m)(dia_{i+1}.a_0)}| - |P_{(dia_i.n'+1)(dia_{i+1}.n-1)}| \rangle$ between consecutive $dia_i$ and $dia_{i+1}$ ($1 \le i < k$), where $ES_{a_m a_{m'}} = \{(a_i, h_i) \mid a_m \le a_i \le a_{m'}\}$. In addition, an N-terminal gap $\langle ES_{S(dia_1.a_0)},\ P_{1(dia_1.n-1)},\ \Delta = |ES_{S(dia_1.a_0)}| - |P_{1(dia_1.n-1)}| \rangle$ can be added if $dia_1.n \ne 1$, and a C-terminal gap $\langle ES_{(dia_k.a_m)F},\ P_{(dia_k.n'+1)PL},\ \Delta = |ES_{(dia_k.a_m)F}| - |P_{(dia_k.n'+1)PL}| \rangle$ can be added if $dia_k.n' \ne PL$ (the length of P), where $ES_{SF} = ES$ (S and F represent the virtual starting and ending peak masses of a spectrum, respectively). Then, the score for the initial alignment is calculated as follows:
$$
\mathrm{AlignmentScore}(IA) = \sum_{dia_i \in IA} \mathrm{Match}\Big(ES, \bigcup_{a_k \in dia_i} \delta(a_k)\Big) - \alpha \sum_{gap_j \in IA} \big(1 + |gap_j.\Delta| \times 0.01\big)
$$
where $\delta(b) = \{b - 18,\ b - 17,\ b,\ b + 1,\ |ES| - b + 2\}$ is the set of masses of the ions related to the ion with mass b. The first term gives a positive score for diagonals. Match(ES, TS) was defined as the intensity sum of matched ions, $\sum_{(a,h) \in MAT} h$, where $MAT = \{(a_i, h_i) \in ES \mid b \in TS \text{ and } |b - a_i| \le \varepsilon\}$ and ε is the mass measurement error of the mass spectrometer. In this work, 0.5 Da was used for ε, and peak intensities were normalized cumulatively.39 The second term gives a negative score for gaps. Gaps are classified into two groups: (1) matched mass (∆ = 0) and (2) mismatched mass (∆ ≠ 0). In case 1 (α = 0), there is no penalty, while in case 2 (α = 1; these gaps introduce modifications), the gap is penalized by $1 + |gap_j.\Delta| \times 0.01$. Gap penalties prevent many diagonals from being continuously added to an alignment, so that we find a simple but optimal alignment. The constants in each penalty were determined experimentally and can be further optimized. The best initial alignment, that with the maximum score, is then passed to MODmap.
Local Gap Alignment. For the local alignment, the subsequence of a candidate peptide that corresponds to a gap is aligned to the MS/MS spectrum by introducing a single modification of mass ∆. Here ∆ is determined when the two diagonals neighboring the gap are joined, but its value is assumed to result from a previously unknown modification. Figure 3 shows the process of gap alignment, which compares all possible alignments of a peptide subsequence and a spectrum shifted by ∆. That is, given a gap $\langle ES_{a_m a_{m'}}, P_{nn'}, \Delta \rangle$, our problem is to find $\arg\max_{n \le k \le n'} \mathrm{Match}\big(ES_{a_m a_{m'}}, TS(P_{nn'}^{(\Delta,k)})\big)$, where $P_{nn'}^{(\Delta,k)} = s_n \ldots s_k^{\Delta} \ldots s_{n'}$ ($n \le k \le n'$). To confidently predict a
modified site, MODmap accepts the best alignment only when the score difference between the top two gap alignments is above a stringent threshold. For each candidate modified site in a gap, the local alignment score was determined from the matched b- and y-ions. A score for a random match in the gap region was also added to the score list. The highest score was normalized to 1, and the delta score was calculated from the second highest score. For modification site assignment, it is important to consider the delta score only within the gap region.40 For example, when we try to determine the modified site of a peptide TAGGAPTAG, only one b-ion and one y-ion distinguish the two modified peptides TAGG∆APTAG and TAGGA∆PTAG. The scores of the two peptides may be almost the same, and their delta score may be negligible. However, if we restrict the scoring to the gap subsequence ‘GAP’, the scores of G∆AP and GA∆P can differ more prominently, and their delta score can be used to distinguish modified sites confidently. We regarded a local alignment as significant if its delta score is greater than 1/(gap length). The gap length factor compensates the delta score for long gaps. This scoring cannot be applied to N-terminal gaps. Scores in the N-terminal region are often ambiguous because ion peaks near the N-terminal end of a spectrum normally have very low intensity or are not observed at all.41 If a modified N-terminal gap does not have a significant delta score but the gap is shorter than 3 residues, the modified site was assumed to be the N-terminus.
Figure 3. Local gap alignment for predicting a novel modification. A subsequence corresponding to a gap is aligned to an MS/MS spectrum using an unknown modification of ∆, whose mass is determined when two diagonals are joined. For a gap $\langle ES_{(314)(728)}, \mathrm{PTM}, \Delta = 85 \rangle$, all possible alignments are shown: (a) ‘P∆TM’, (b) ‘PT∆M’, and (c) ‘PTM∆’. Dots represent the peaks matched by each alignment. The ‘PT∆M’ alignment results in two matching peaks, while ‘P∆TM’ and ‘PTM∆’ have only one matching peak each. As a result, a modification of ∆ is assigned to Thr (T).
Novel PTM Prediction. The best gap alignment from MODmap is compared with MODi results. For example, let us assume that there was a gap whose subsequence and ∆ mass are ‘GAP’ and +20 Da, respectively, and that the gap was aligned as ‘G+20AP’ by MODi.
The MODi result might also include another alignment ‘G+10AP+10’ from a combination of known modifications (MODi performs all possible gap alignments with all known modifications). If MODmap’s local alignment returns ‘G+20AP’, which is already included in the MODi results, it is excluded from the novel modification candidates. However, if the local alignment results in a new ‘GAP+20’, where +20 Da on amino acid P has not been previously reported, and its score is more significant than the MODi results, it is added to the novel modification list. For gaps that MODi could not explain, MODmap’s local alignment results can be reported to the novel modification list immediately. We allow only one unknown modification per peptide. If an initial alignment has two gaps, one of the two must be explained by known modifications. If the ∆’s of all gaps are estimated to be unknown, the alignment is rejected in subsequent steps. Finally, the pair of modification ∆ and amino acid a from the best gap alignment is recorded in a modification frequency matrix. The matrix entry (∆, a) represents the number of peptide identifications for which mass ∆ on amino acid a is interpreted in gap alignments, and is a candidate for a novel modification. A matrix example is shown in Figure 4. As expected, the matrix is fairly sparse. Note that our frequency matrix differs from those of other approaches. In previous approaches, profiles of some form (matrix or histogram) were generated against modification masses estimated from the mass difference between a spectrum and a peptide. However, such histograms contained much noise, and abundant modifications were predicted based on the number of repeated mass differences, so these approaches cannot tell rare but significant modifications from noise. In contrast, in our matrix, only highly confident unknown mass shifts on residues were recorded, after peptide identification had been completed with all known types of modifications. All the entries come from high-quality PSMs and deserve further examination.
MODmap Application to MS/MS Data Sets.
To validate our algorithm, a PTM search (MODmap in conjunction with MODi) was conducted against MS/MS spectra acquired from the lens proteome of a 93-year-old human male with nuclear cataracts.25 The MODi analysis results were described previously.13 Here, we present the results of MODmap. The MODmap matrix from the lens data is shown in Figure 4. To verify the predicted unknown modifications, a Mascot search was done with the new modifications in the matrix against the Swiss-Prot human database. Unique peptide identifications on which MODmap and Mascot agree in top scoring are presented in Table 2 in Supporting Information. Of 30 types of modifications re-evaluated by Mascot, 22 were identified by Mascot, while most of the 8 unconfirmed modification types were identified with the same modification mass at another site of the same peptide.
Figure 4. A MODmap result from the lens sample analysis. A pair of modification ∆ and amino acid a from the best gap alignment was reported. Each entry of the matrix represents the number of the corresponding modifications predicted in the sample, as candidates for novel modifications (some of the predicted modifications can be interpreted as multiple known modifications at a single amino acid, but in this work one modification was assumed per amino acid). Mascot search results with the predicted unknown modifications are shown in Table 2, Supporting Information. Grayed cells were excluded from the Mascot search because most of them were confirmed to be caused by nontryptic events (∆’s explainable by a combination of amino acid masses at the N-terminus or at R and K, the C-terminal residues of tryptic peptides) or by errors in peptide mass measurement (modifications of ≤3 Da). In this lens sample, predicted modifications with the same mass were considered together in the Mascot search because they can affect each other. Of the 30 types of modifications re-evaluated by Mascot, 22 types (orange cells with an asterisk) were confirmed by the Mascot results.
MODmap observed the uncharacterized mass of +144 Da on the sixth residue, Lys (K), of the peptide QYLLDKKEYR of βS-crystallin, as previously reported.27 Figure 5 shows MS/MS spectra for this peptide when it is unmodified and when it has one or two modified sites. Comparing the unmodified and modified peptide spectra shows that fragment ions including ‘K+144’ have more neutral losses (marked by *) than those from the unmodified peptide. Note that MODmap can handle peptides with two modified sites, as exemplified in Figure 5c. Even when the peptide is modified at another site (N-terminal Gln), our algorithm localized the mass shift of +144 Da to the sixth Lys, demonstrating its ability to identify multiple modifications. This distinguishes our approach from other blind PTM search methods, by which this modification candidate was reported only as a singly modified peptide. In the right panel of the figure, initial alignments between a peptide and an MS/MS spectrum are shown. The initial alignments show the effectiveness of using multiple sequence tags from an MS/MS spectrum: the modified regions within the spectrum are localized exactly to gaps. This is very important in terms of computational complexity, because a smaller gap can dramatically reduce the search space generated by modification combinations (modifications are allowed only inside the gap regions).
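To make the point about gap-confined search concrete, the local gap alignment and delta-score test described earlier can be sketched on the Figure 3 example (gap from 314 to 728, subsequence PTM, ∆ = 85). This is a simplified sketch, not the MODmap code: it scores a single ion ladder by peak count rather than by the intensity of matched b- and y-ions, omits the random-match score, and uses hypothetical function names and an invented two-peak list chosen to reproduce the outcome stated in the figure caption.

```python
# Monoisotopic residue masses in Da (standard values) for the Figure 3 gap.
RESIDUE_MASS = {"P": 97.05276, "T": 101.04768, "M": 131.04049}


def gap_ladder(start_mass, subsequence, delta, site):
    """Internal cleavage masses of the gap when `delta` sits on residue `site`."""
    masses, total = [], start_mass
    for i, aa in enumerate(subsequence[:-1]):  # the gap end point is fixed
        total += RESIDUE_MASS[aa] + (delta if i == site else 0.0)
        masses.append(total)
    return masses


def local_gap_alignment(peaks, start_mass, subsequence, delta, tol=0.5):
    """Score every site in the gap; accept the best only if it clearly wins."""
    scores = []
    for site in range(len(subsequence)):
        ladder = gap_ladder(start_mass, subsequence, delta, site)
        scores.append(sum(1 for m in ladder
                          if any(abs(m - p) <= tol for p in peaks)))
    top = max(scores)
    if top == 0:
        return None, scores, 0.0
    ranked = sorted((s / top for s in scores), reverse=True)  # best becomes 1
    delta_score = ranked[0] - ranked[1] if len(ranked) > 1 else 1.0
    significant = delta_score > 1.0 / len(subsequence)  # threshold 1/(gap length)
    return (scores.index(top) if significant else None), scores, delta_score


# Invented peaks inside the gap, consistent with the Figure 3 description.
gap_peaks = [411.05, 597.10]
site, scores, delta_score = local_gap_alignment(gap_peaks, 314.0, "PTM", 85.0)
print(site, scores, delta_score)
```

Only three candidate sites are scored here, one per residue in the gap, instead of one per residue of the whole peptide for every combination of shifts. In this toy case the middle residue (Thr) wins with two matched peaks against one, and the delta score of 0.5 exceeds the 1/3 threshold.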
As further novel modification candidates observed by MODmap, various modifications to a single amino acid are shown in Figure 6. Three novel modifications are suggested by MODmap: +30, +73, and +102 Da to Ser (S). Two of them are observed on the third residue, Ser, of the peptide APSWFDTGLSEMR of αB-crystallin. The presence of these modifications is supported by their MS/MS spectra in Figure 6. Most of their y-ions were assigned to intense peaks in the spectrum. The b2- and b3-ions corresponding to the fragment ‘S + ∆’ were confidently observed as well. Further evidence is the detection of immonium ions corresponding to ‘S + ∆’. They are at 90 and 133 m/z for ‘S+30’ and ‘S+73’, respectively, and are not present in unmodified spectra. The presence of immonium ions provides compositional information for sequence assignment and adds confidence to the assignment of modifications to specific residues.9,42 In Figure 6b, the immonium ion of ‘S+73’ is highly intense and its related N-terminal ions (a2, b2, and b3) are very strong in the spectrum, while it is not observed in a spectrum of the unmodified peptide (the strongest peak was one of the y-ions, Figure 1 in Supporting Information). An additional new modification is shown in Figure 6c, from the peptide TVLDS+102GISEVR of αA-crystallin. These modification candidates were also observed in the MS-Alignment test. Previously, for this lens sample, an unknown modification of +55 Da on Arg (R) was reported in the peptide AEFSGECSNLADR+55GFDR.20 We also identified the same modification on multiple sites in multiple peptides of βB1-crystallin: LVVFELENFQGR+55R, AEFSGECSNLADR+55GFDR, and WNTWSSSYR+55SDR.
Journal of Proteome Research • Vol. 8, No. 10, 2009 4423
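The immonium ion positions quoted for ‘S+30’ and ‘S+73’ follow from simple arithmetic: an immonium ion has the residue mass minus CO plus a proton. The short sketch below reproduces the nominal values under that standard relation; the function name is hypothetical and the mass shifts are the nominal ∆ values from the text, not accurate masses.

```python
SER = 87.03203     # monoisotopic residue mass of Ser (Da)
CO = 27.99491      # mass of carbon monoxide (Da)
PROTON = 1.00728   # mass of a proton (Da)


def immonium_mz(residue_mass, delta=0.0):
    """m/z of the singly charged immonium ion of a residue carrying shift delta."""
    return residue_mass + delta - CO + PROTON


for delta in (0, 30, 73):
    # Nominal m/z of the immonium ion of Ser with the given nominal shift.
    print(f"S+{delta}: {round(immonium_mz(SER, delta))} m/z")
```

Unmodified Ser gives an immonium ion near 60 m/z, so the shifted ions near 90 and 133 m/z are consistent with the modification mass residing on Ser itself.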
Figure 5. Uncharacterized mass of +144 Da on Lys (K) is shown. (a) MS/MS spectrum of an unmodified peptide, QYLLDKKEYR of βS-crystallin. (b) MS/MS spectrum of a modified peptide, QYLLDK+144KEYR. (c) MS/MS spectrum of a multiply modified peptide, Q-17YLLDK+144KEYR. For each peptide, the initial alignment between the peptide and the MS/MS spectrum is shown. AA* indicates immonium ions associated with the amino acid AA, and * indicates neutral-loss (-NH3 or -H2O) from b- and y-ions. In the right panel, initial alignments between a peptide and an MS/MS spectrum are shown. Diagonals aligned with y-ion peaks are shown in red, while diagonals aligned with b-ion peaks are shown in blue (Figure 6).
Besides novel modification candidates, MODmap reveals several events related to the MS/MS experiments. The first is the detection of in-source fragmentation.41 We observed 17 and 18 Da losses at the N-term (C-term) in the matrix of Figure 4. A possible explanation is that, after an original parent ion had been fragmented in the source, its C-terminal ion with a neutral loss was subjected to MS/MS. Second, although we consider tryptic peptides (peptides cleaved after R or K residues), nontryptic events were observed. In the matrix, -131 Da at the N-term corresponds to the known N-terminal Met (M) cleavage.43 A +211 Da shift at Arg (R) in the peptide ‘VEGGTWAVYER’ was also observed. Its protein (βS-crystallin) sequence ‘K.VEGGTWAVYER.PNF’ revealed that the +211 Da corresponds to the mass of the subsequence ‘PN’ following Arg, so the peptide was cleaved not after Arg but after Asn (N). Their annotated spectra are presented in Figure 2 in Supporting Information. We examined whether the mass shifts at the N-term or at K and R (C-terminal residues of tryptic peptides) match a mass combination of amino acids adjacent to the peptide in the protein sequence, and observed many nontryptic events (in the matrix, +99 Da (Val) at K, -113 Da (Ile) at the N-term, and so forth). Finally, we often observed mass shifts of +1 Da at any position. This appears to be due to errors in peptide mass measurement. We manually confirmed that most of them actually correspond to precursor mass errors. The measurement of a peptide mass often becomes difficult for larger peptides because the monoisotopic peak shrinks
relative to the peptide mass.44,45 We regarded small (-3 to +3 Da) mass shifts as errors in precursor mass.

Comparison with Other Tools. For modified peptide identification, the number of modified sites in a peptide is critical to software performance. Many search tools assume that each peptide contains a single modification site. ModifiComb conducted a fragment pair comparison for only one modification, and its algorithm did not address multiple modifications in a peptide at once. On the other hand, MS-Alignment conducted a global alignment allowing modifications at any position of a peptide. This makes the spectral alignment computationally complex as multiple modifications are introduced into a peptide. In our test, MS-Alignment showed reasonable performance when one modification was considered per peptide. For two modifications, however, it was slower and less accurate than MODi and MODmap. With the lens data, MS-Alignment took about 800 s on a regular Pentium IV PC with one modification as the input parameter, while with two modifications it took about 11 500 s. Note that our PTM search (MODmap in conjunction with MODi) took about 340 s under the same conditions without any limit on the number of modifications. In addition, MS-Alignment allowing up to two modifications lost many identifications from the one-modification search. As a result, unknown modification candidates such as S+30, S+73, Q+161,
Figure 6. Three unknown modifications, of +30, +73, and +102 Da, to Ser (S) are shown. The N-terminal ions (immonium, a2, b2, and b3) related to the modified residue are observed strongly in its spectrum. (a) MS/MS spectrum of a peptide APS+30WFDTGLSEMR of αB-crystallin. (b) MS/MS spectrum of a peptide APS+73WFDTGLSEMR of αB-crystallin. (c) MS/MS spectrum of a peptide TVLDS+102GISEVR of αA-crystallin. Diagonals aligned with y-ion peaks are shown in red, while diagonals aligned with b-ion peaks are shown in blue.
R+55, and K+144 were not observed in the two-modification search; they were reported only in the one-modification search. We performed a simulation test to evaluate how sensitive MODmap and MS-Alignment are to a modified region in an MS/MS spectrum. For 3951 highly confident PSMs identified from the ISB mixture, the database sequences corresponding to 1969 PSMs (about 50%) were mutated by changing one residue of each sequence to a random residue (SIMdb, see Experimental Section). We then measured how many spectra were identified as correct peptides with exact mutated sites when searched against SIMdb. MS-Alignment identified 1164 correct peptides with exact mutated sites. In this test, we did not count ∆-correct PSMs, that is, correct peptide identifications with a misplaced modification.20 MODmap identified 1318 correct peptides with exact mutated sites, 154 more than MS-Alignment. Note that these figures do not represent the number of identified peptides. MS-Alignment could identify 1802 correct peptides when ∆-correct PSMs are included, while MODmap basically rejects ∆-correct PSMs in order to predict reliable modification sites (see local gap alignment). MODmap reported a total of 1360 modifications to its PTM frequency matrix. Of them, 1318 (97%) came from exactly mutated sites of correct peptides, 39 (3%) were ∆-correct, and only 3 came from wrong peptides. Also, MODmap did not report any modifications for the 1982 nonmutated PSMs, while MS-Alignment reported 93 of them as peptides with modifications. All the MS-Alignment results analyzed above were obtained by taking PSMs with a p-value below 0.05.
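The mutation step of the simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build SIMdb: the function name is hypothetical, the replacement residue is drawn uniformly from the 20 standard amino acids, and it is forced to differ from the original residue (the text does not say whether that was enforced). The last lines simply recompute the breakdown of the 1360 reported modifications from the counts given above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_one_residue(peptide, rng):
    """Replace one randomly chosen residue with a different random residue.

    Returns the mutated sequence and the 0-based mutated position.
    Assumption: any position may be mutated, including the termini.
    """
    pos = rng.randrange(len(peptide))
    choices = [aa for aa in AMINO_ACIDS if aa != peptide[pos]]
    new_residue = rng.choice(choices)
    return peptide[:pos] + new_residue + peptide[pos + 1:], pos


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
mutated, pos = mutate_one_residue("VEGGTWAVYER", rng)
print(mutated, pos)

# Breakdown of the modifications MODmap reported in the simulation test.
reported = {"exact mutated site": 1318, "delta-correct": 39, "wrong peptide": 3}
total = sum(reported.values())
for label, count in reported.items():
    print(f"{label}: {count}/{total} = {count / total:.1%}")
```

A spectrum acquired from the original peptide then differs from the mutated database entry by the mass difference of the two residues at a single known position, which gives a ground truth for both the peptide and the site.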
Unlike previous algorithms that rely on overall inspection of a spectrum, MODi and MODmap use a divide-and-conquer strategy. In our approach, identifying initial diagonals from multiple sequence tags reduces the size of the region within which modifications must be localized. We need to consider modifications only within the regions that are not covered by diagonals, that is, gaps. Our main contribution to spectral alignment is the use of multiple diagonals to reduce the size of a gap: the more diagonals are joined to form a single alignment, the smaller the gaps become. Notably, the most prominent difference between our algorithm and others lies in the treatment of the parameter k (the number of modifications). Other tools work only when given a value for k in advance. Ours does not require the parameter k at all and does not suffer performance degradation as k increases, because k can be estimated from the initial alignment. Each gap in an initial alignment potentially means one modification, so k equals the number of gaps in the initial alignment. Therefore, when considering k modifications, we require only additional time corresponding to the total size of the k gaps, on top of the time required for an analysis without modifications. Among algorithms for comparing query and database sequences, FASTA showed high sensitivity and much faster performance than full alignment programs based on dynamic programming. Our algorithm is analogous to the FASTA algorithm in that FASTA also identifies the best diagonal (local identity) regions between query and database sequences and provides a sequence alignment based on several diagonals. On the other hand, MS-Alignment is analogous to the Smith-Waterman algorithm in that it is based on full-blown dynamic programming.
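The relation between diagonals, gaps, and k can be illustrated with a small sketch. The data structures and the function name are hypothetical and greatly simplified relative to an actual spectral alignment: each diagonal is assumed to cover a contiguous, non-overlapping range of peptide positions and to carry the cumulative mass offset observed along it, and each uncovered region between diagonals is treated as one gap whose ∆ is the change in offset across it.

```python
from typing import List, NamedTuple


class Diagonal(NamedTuple):
    start: int     # first peptide position covered (0-based, inclusive)
    end: int       # one past the last position covered (exclusive)
    offset: float  # cumulative mass shift observed along the diagonal (Da)


class Gap(NamedTuple):
    start: int     # first uncovered position (inclusive)
    end: int       # one past the last uncovered position (exclusive)
    delta: float   # mass shift that must be explained inside the gap (Da)


def find_gaps(diagonals: List[Diagonal], peptide_length: int,
              total_shift: float = 0.0) -> List[Gap]:
    """Return the regions not covered by any diagonal, with their mass shifts.

    total_shift is the precursor mass minus the unmodified peptide mass.
    Assumes the diagonals do not overlap.
    """
    gaps = []
    prev_end, prev_offset = 0, 0.0
    for d in sorted(diagonals):
        if d.start > prev_end:
            gaps.append(Gap(prev_end, d.start, d.offset - prev_offset))
        prev_end, prev_offset = d.end, d.offset
    if prev_end < peptide_length:
        gaps.append(Gap(prev_end, peptide_length, total_shift - prev_offset))
    return gaps


# Hypothetical initial alignment for Q-17YLLDK+144KEYR (10 residues):
# one tag covers YLLD after the N-terminal shift, another covers KEYR.
diagonals = [Diagonal(1, 5, -17.0), Diagonal(6, 10, 127.0)]
gaps = find_gaps(diagonals, peptide_length=10, total_shift=127.0)
k = len(gaps)                                   # estimated number of modifications
search_size = sum(g.end - g.start for g in gaps)  # residues left to examine
print(gaps, k, search_size)
```

In this example the two gaps each span a single residue, so modification sites need to be examined at 2 positions rather than 10, and k is read off the alignment instead of being supplied as an input parameter.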
Discussion

We presented a comprehensive approach to identifying various types of PTMs and exploring unknown modifications from MS/MS spectra. PTM identification is important for understanding the biological functions of proteins. The proteome-wide extent of modifications is expected to be very large. Recently, unrestrictive search algorithms for PTM identification have been developed. With the growing number of modification studies in proteomics, it has become crucial to assess the accuracy of modification types and sites.26 In recent computational proteomics, a major problem is distinguishing correct peptide identifications from false positives. Many approaches have been developed to estimate the significance of peptide identifications,46-48 but few reports address identifications with modifications. With modifications, the false positive problem can be significantly exacerbated. Allowing many types of modifications results in a large number of false positives due to the combinatorial increase in the number of possible matches.49 Modified peptides from abundant proteins might replace other peptide identifications, preventing the identification of low-abundance peptides. On the other hand, a confident method is also necessary to identify rare modifications. As an effort to distinguish rare real modifications, a method of searching for modifications in the genome sequence databases of different species was proposed; it reports the confidence of a modification found at the same site in orthologous genes of different but related species.50 The role of modifications in protein function is still an open problem. The extent of modifications is very wide, and the types are also diverse. For in-depth modification studies, a tool that is highly sensitive and tolerant to modifications is required to explore potentially important rare and unknown modifications. In this view, we expect that our approach can be successfully applied to modification studies.
It must be admitted that, although we suggest the candidates for novel modifications, their mechanism and significance should be revealed and verified via additional biological experiments.
Acknowledgment. This work was supported by 21C Frontier Functional Proteomics Project from Korean Ministry of Education, Science & Technology (FPR08-A1-020), by the Korea Science and Engineering Foundation (KOSEF) through the Center for Cell Signaling & Drug Discovery Research (CCS & DDR, R15-2006-020) at Ewha Womans University, and by the University of Seoul 2008 Research Fund. S. Na was supported by Brain Korea 21 (BK21) Project and Seoul Science Fellowship (SSF).

Supporting Information Available: Mascot search result and annotated spectra can be found. This material is available free of charge via the Internet at http://pubs.acs.org.

References
(1) Cantin, G. T.; Yates, J. R. Strategies for shotgun identification of post-translational modifications by mass spectrometry. J. Chromatogr., A. 2004, 1053, 7–14.
(2) Mann, M.; Jensen, O. N. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 2003, 21, 255–261.
(3) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Extent of modifications in human proteome samples and their effect on dynamic range of analysis in shotgun proteomics. Mol. Cell. Proteomics 2006, 5, 2384–2391.
(4) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422, 198–207.
(5) Steen, H.; Mann, M. The ABC’s (and XYZ’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 2004, 5, 699–711.
(6) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(7) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551–3567.
(8) Havilio, M.; Wool, A. Large-scale unrestricted identification of post-translation modifications using tandem mass spectrometry. Anal. Chem. 2007, 79, 1362–1368.
(9) Matthiesen, R.; Trelle, M. B.; Hojrup, P.; Bunkenborg, J.; Jensen, O. N. VEMS 3.0: algorithms and computational tools for tandem mass spectrometry based identification of post-translational modifications in proteins. J. Proteome Res. 2005, 4, 2338–2347.
(10) Mann, M.; Wilm, M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 1994, 66, 4390–4399.
(11) Tabb, D. L.; Saraf, A.; Yates, J. R. GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model. Anal. Chem. 2003, 75, 6415–6421.
(12) Tanner, S.; Shu, H.; Frank, A.; Wang, L. C.; Zandi, E.; Mumby, M.; Pevzner, P. A.; Bafna, V. InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 2005, 77, 4626–4639.
(13) Na, S.; Jeong, J.; Park, H.; Lee, K.-J.; Paek, E. Unrestrictive identification of multiple post-translational modifications from tandem mass spectrometry using an error-tolerant algorithm based on an extended sequence tag approach. Mol. Cell. Proteomics 2008, 7, 2452–2463.
(14) Sunyaev, S.; Liska, A. J.; Golod, A.; Shevchenko, A.; Shevchenko, A. MultiTag: multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry. Anal. Chem. 2003, 75, 1307–1315.
(15) Ma, B.; Zhang, K.; Hendrie, C.; Liang, C.; Li, M.; Doherty-Kirby, A.; Lajoie, G. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2003, 17, 2337–2342.
(16) Chen, T.; Kao, M.; Tepel, M.; Rush, J.; Church, G. M. Dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 2001, 8, 325–337.
(17) Taylor, J. A.; Johnson, R. S. Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 1997, 11, 1067–1075.
(18) Han, Y.; Ma, B.; Zhang, K. SPIDER: software for protein identification from sequence tags with de novo sequencing error. J. Bioinform. Comput. Biol. 2005, 3, 697–716.
(19) Searle, B. C.; Dasari, S.; Turner, M.; Reddy, A. P.; Choi, D.; Wilmarth, P. A.; McCormack, A. L.; David, L. L.; Nagalla, S. R. High-throughput identification of proteins and unanticipated sequence modifications using a mass-based alignment algorithm for MS/MS de novo sequencing results. Anal. Chem. 2004, 76, 2220–2230.
(20) Tsur, D.; Tanner, S.; Zandi, E.; Bafna, V.; Pevzner, P. A. Identification of post-translational modifications by blind search of mass spectra. Nat. Biotechnol. 2005, 23, 1562–1567.
(21) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. ModifiComb, a new proteomic tool for mapping substoichiometric post-translational modifications, finding novel types of modifications, and fingerprinting complex protein mixtures. Mol. Cell. Proteomics 2006, 5, 935–948.
(22) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
(23) Pearson, W. R.; Lipman, D. J. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 2444–2448.
(24) Smith, T. F.; Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 1981, 147, 195–197.
(25) Searle, B. C.; Dasari, S.; Wilmarth, P. A.; Turner, M.; Reddy, A. P.; David, L. L.; Nagalla, S. R. Identification of protein modifications using MS/MS de novo sequencing and the OpenSea alignment algorithm. J. Proteome Res. 2005, 4, 546–554.
(26) Tanner, S.; Payne, S. H.; Dasari, S.; Shen, Z.; Wilmarth, P. A.; David, L. L.; Loomis, W. F.; Briggs, S. P.; Bafna, V. Accurate annotation of peptide modifications through unrestrictive database search. J. Proteome Res. 2008, 7, 170–181.
(27) Wilmarth, P. A.; Tanner, S.; Dasari, S.; Nagalla, S. R.; Riviere, M. A.; Bafna, V.; Pevzner, P. A.; David, L. L. Age-related changes in human crystallins determined from comparative analysis of post-translational modifications in young and aged lens: does deamidation contribute to crystalline insolubility. J. Proteome Res. 2006, 5, 2554–2566.
(28) Klimek, J.; Eddes, J. S.; Hohmann, L.; Jackson, J.; Peterson, A.; Letarte, S.; Gafken, P. R.; Katz, J. E.; Mallick, P.; Lee, H.; Schmidt, A.; Ossola, R.; Eng, J. K.; Aebersold, R.; Martin, D. B. The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools. J. Proteome Res. 2008, 7, 96–103.
(29) Craig, R.; Beavis, R. C. A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 2003, 17, 2310–2316.
(30) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207–214.
(31) Washburn, M. P.; Wolters, D.; Yates, J. R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001, 19, 242–247.
(32) Elias, J. E.; Gibbons, F. D.; King, O. D.; Roth, F. P.; Gygi, S. P. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 2004, 22, 214–219.
(33) Wan, Y.; Yang, A.; Chen, T. PepHMM: a hidden markov model based scoring function for mass spectrometry database search. Anal. Chem. 2006, 78, 432–437.
(34) Pevzner, P. A.; Dancik, V.; Tang, C. L. Mutation-tolerant protein identification by mass-spectrometry. J. Comput. Biol. 2000, 7, 777–787.
(35) Pevzner, P. A.; Mulyukov, Z.; Dancik, V.; Tang, C. L. Efficiency of database search for identification of mutated and modified proteins via mass spectrometry. Genome Res. 2001, 11, 290–299.
(36) Bandeira, N.; Tsur, D.; Frank, A.; Pevzner, P. A. Protein identification by spectral networks analysis. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 6140–6145.
(37) Frank, A. M.; Pesavento, J. J.; Mizzen, C. A.; Kelleher, N. L.; Pevzner, P. A. Interpreting top-down mass spectra using spectral alignment. Anal. Chem. 2008, 80, 2499–2505.
(38) Ng, J.; Pevzner, P. A. Algorithm for identification of fusion proteins via mass spectrometry. J. Proteome Res. 2008, 7, 89–95.
(39) Na, S.; Paek, E. Quality assessment of tandem mass spectra based on cumulative intensity normalization. J. Proteome Res. 2006, 5, 3241–3248.
(40) Beausoleil, S. A.; Villen, J.; Gerber, S. A.; Rush, J.; Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24, 1285–1292.
(41) Mouls, L.; Aubagnac, J.-L.; Martinez, J.; Enjalbal, C. Low energy peptide fragmentations in an ESI-Q-Tof type mass spectrometer. J. Proteome Res. 2007, 6, 1378–1391.
(42) Hohmann, L. J.; Eng, J. K.; Gemmill, A.; Klimek, J.; Vitek, O.; Reid, G. E.; Martin, D. B. Quantification of the compositional information provided by immonium ions on a quadrupole-time-of-flight mass spectrometer. Anal. Chem. 2008, 80, 5596–5606.
(43) Gupta, N.; Tanner, S.; Jaitly, N.; Adkins, J. N.; Lipton, M.; Edwards, R.; Romine, M.; Osterman, A.; Bafna, V.; Smith, R. D.; Pevzner, P. A. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res. 2007, 17, 1362–1377.
(44) Park, K.; Yoon, J.; Lee, S.; Paek, E.; Park, H.; Jung, H.-J.; Lee, S.-W. Isotopic peak intensity ratio based algorithm for fast and accurate determination of isotopic clusters and monoisotopic masses of polypeptides from high resolution mass spectrometric data. Anal. Chem. 2008, 80, 7294–7303.
(45) Venable, J. D.; Xu, T.; Cociorva, D.; Yates, J. R. Cross-correlation algorithm for calculation of peptide molecular weight from tandem mass spectra. Anal. Chem. 2006, 78, 1921–1929.
(46) Choi, H.; Ghosh, D.; Nesvizhskii, A. I. Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J. Proteome Res. 2008, 7, 286–292.
(47) Kall, L.; Canterbury, J.; Weston, J.; Noble, W. S.; MacCoss, M. J. A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets. Nat. Methods 2007, 4, 923–925.
(48) Kim, S.; Gupta, N.; Pevzner, P. A. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J. Proteome Res. 2008, 7, 3354–3363.
(49) Ong, S.; Mittler, G.; Mann, M. Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 2004, 1, 1–8.
(50) Gupta, N.; Benhamida, J.; Bhargava, V.; Goodman, D.; Kain, E.; Kerman, I.; Nguyen, N.; Ollikainen, N.; Rodriguez, J.; Wang, J.; Lipton, M. S.; Romine, M.; Bafna, V.; Smith, R. D.; Pevzner, P. A. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res. 2008, 18, 1133–1142.
PR9001146