Preparative Peptide Isoelectric Focusing as a Tool for Improving the Identification of Lysine-Acetylated Peptides from Complex Mixtures
Hongwei Xie, Sricharan Bandhakavi, Mikel R. Roe, and Timothy J. Griffin*
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455
Received December 21, 2006
Abstract: Protein sequence database searching of tandem mass spectrometry data is commonly employed to identify post-translational modifications (PTMs) to peptides in global proteomic studies. In these studies, the accurate identification of these modified peptides relies on strategies to ensure high-confidence results from sequence database searching in which differential mass shift parameters are employed to identify PTMs to specific amino acids. Using lysine acetylation as an example PTM, we have observed that the inclusion of differential modification information in sequence database searching dramatically increases the potential for false-positive sequence matches to modified peptides, making the confident identification of true sequence matches difficult. In a proof-of-principle study of whole cell yeast lysates, we demonstrate the combination of preparative isoelectric focusing using free-flow electrophoresis, and an adjusted peptide isoelectric point prediction algorithm, as an effective means to increase the confidence of lysine-acetylated peptide identification. These results demonstrate the potential utility of this general strategy for improving the identification of PTMs which cause a shift to the intrinsic isoelectric point of peptides.
Keywords: Post-Translational Modification • Acetylation • Peptide Isoelectric Focusing • Free-Flow Electrophoresis • Tandem Mass Spectrometry • False-Positive Sequence Match • Proteomic
* To whom correspondence should be addressed at University of Minnesota, 6-155 Jackson Hall, 321 Church St. SE, Minneapolis, MN 55455. Tel, 612-624-5249; fax, 612-624-0432; e-mail, tgriffin@umn.edu.
10.1021/pr060691j CCC: $37.00 © 2007 American Chemical Society
Introduction
False-positive peptide sequence matches in shotgun proteomics, resulting from searching large-scale tandem mass spectrometry (MS/MS) data against protein sequence databases, are a challenge in high-throughput global protein profiling studies.1,2 The use of physicochemical properties (e.g., accurate mass, reversed-phase microcapillary liquid chromatography (µLC) elution time, peptide isoelectric point (pI)) has been shown to provide more accurate results and increase the confidence of peptide/protein identifications. Several groups3-5 have demonstrated that highly accurate mass information of peptides can provide more confident sequence database search
results. Smith et al.6-8 have reported using µLC reversed-phase retention time as a constraint in peptide sequence matches, demonstrating the ability to partially predict the elution times of peptides from reversed-phase columns, and the use of this information in the peptide sequence identification process. The more recently reported accurate mass and time (AMT) tag approach utilizes both µLC retention time and accurate mass to identify peptides and has been successfully applied to global profiling of the human plasma proteome, eliminating the need for MS/MS analysis.9,10 Recently, our group and others have applied peptide pI information, introduced through peptide separations using immobilized pH gradient (IPG) gels,11-14 free-flow electrophoresis (FFE),15-17 or other devices18-20 to assist in the determination of peptide sequence matches. These studies have demonstrated the utility of peptide pI information in reducing false-positive matches and significantly increasing the confidence of peptide identifications. Despite these studies, the challenge of false-positive matches in the identification of post-translationally modified (PTM) peptides has not been explored in depth using these “value-added” peptide separation approaches. Although accurate mass and reversed-phase µLC retention time information of peptides has been used to map PTMs in complex protein mixtures,21 peptide pI information introduced by preparative IEF separations has not been reported. The identification of PTMs usually relies on MS/MS spectra and sequence database searching employing differential mass shift parameters to identify modified peptide sequences.
Modified amino acid residues are detected via the mass increment or deficit as a result of covalent addition or removal of chemical moieties, and the obtained MS/MS spectra of peptides are searched against protein sequence databases with the mass shift of the modified amino acid residues as an additional parameter in the sequence matching process.22,23 The presence of unique ions (such as modification-specific immonium ions) from a peptide fragment can be used as a diagnostic marker for the presence of the specific modification,22,24 which can be confirmed in some cases by MS/MS/MS.25 These studies often rely on the identification of a single peptide sequence to identify the modified amino acids, putting heavy reliance on the ability to determine “true” MS/MS hits from false positives. Because many PTMs also add or eliminate charged functional groups on amino acid residues (for example, acetylation on lysine residues), the intrinsic pI of modified peptides should change, enabling high-resolution isoelectric focusing (IEF) based methods to separate them from their unmodified counterparts. Thus, utilizing the expected shift in peptide pI may provide a promising strategy for improving the confidence of sequence matches to peptides carrying these PTMs.
In this study, we have investigated the potential for false-positive matches when using differential mass shift parameters in sequence database searching of modified peptides and strategies using preparative IEF to improve the confidence in identifying true matches to modified peptides. We chose to study lysine acetylation as a representative PTM which, despite increasing evidence of the prominence and importance of this modification in cell signaling and control,26 has seen limited attention using large-scale mass spectrometry-based proteomic strategies. A peptide mixture from a yeast whole cell lysate was first fractionated using preparative IEF by FFE prior to µLC-MS/MS analysis; the obtained MS/MS spectra were searched against a yeast protein sequence database for acetylated peptides by employing differential mass shift searching parameters. The peptide pI prediction algorithm was adjusted to account for the elimination of charge on lysine residues by acetylation, and these shifted pI values were used to improve the confidence of identifying modified peptides. These results demonstrate the potential of preparative IEF coupled with peptide pI prediction as a general tool in proteomic studies of protein PTMs which cause a shift to the intrinsic peptide pI.
Journal of Proteome Research 2007, 6, 2019-2026. Published on Web 03/31/2007.
Experimental Section
Yeast Protein and Peptide Preparation. One milligram of total protein from the budding yeast Saccharomyces cerevisiae was isolated by boiling cells in 1× SDS sample buffer and vortexing in the presence of glass beads. Total protein was precipitated with 4 vol of acetone and resuspended in 50 mM Tris-HCl, 100 mM NaCl, and 1% SDS. Dissolved samples were diluted to 50 mM Tris-HCl, pH 7.5, 100 mM NaCl, and 0.1% SDS, supplemented with 5 mM TCEP and trypsinized overnight (Promega, Madison, WI) at 37 °C. The resulting peptides were first concentrated and desalted using a mixed-mode MCX cartridge (Waters, Milford, MA) and dried using vacuum centrifugation. Then, the peptides were separated into 96 fractions using an FFE system from BD Biosciences (San Jose, CA) based on preparative IEF of the peptides and collected into 96 microtiter plate wells as described elsewhere.15,16 Immediately after FFE fractionation, the pH of each FFE fraction was measured using a microelectrode (Accumet Combination Micro Electrode, Fisher Scientific). A 50-µL aliquot (∼10%) was taken from each of the FFE fractions and processed using an Amicon Ultrafree-MC centrifugal filter device (5 K MW cutoff, Millipore Corporation, Bedford, MA) to remove high MW HPMC polymer as described previously,16 and the purified peptides were reconstituted with 30 µL HPLC load buffer (0.1% formic acid in solution of 2% acetonitrile and 98% water) for µLC-MS/MS analysis.
µLC-MS/MS Analysis. All on-line µLC separations were done on an automated Paradigm MS4 system (Michrom Bioresources, Inc., Auburn, CA). Each processed FFE fraction was automatically loaded across a Paradigm Platinum Peptide Nanotrap (Michrom Bioresources, Inc.) precolumn (0.15 × 50 mm, 400-µL volume) for sample concentrating and desalting at a flow rate of 50 µL/min in HPLC buffer A (0.1% formic acid in solution of 5% acetonitrile and 95% water).
The in-line analytical capillary column (75 µm × 12 cm) was home-packed using C18 resin (5 µm, 200 Å Magic C18AG, Michrom Bioresources, Inc., Auburn, CA) as described previously,27,28 with the exception that the electrospray tip was made with a hand-held
torch. Peptides were eluted using a linear gradient of 10-35% HPLC buffer B (0.1% formic acid in solution of 95% acetonitrile and 5% water) over 60 min, followed by isocratic elution at 80% buffer B for 5 min with a flow rate of 0.25 µL/min across the capillary column. Ionized peptides eluting from the capillary column were selected for CID using a normalized collision energy setting of 35% and a data-dependent procedure that alternated between one MS scan followed by four MS/MS scans for the four most abundant precursor ions in the MS survey scan. The m/z values selected for MS/MS were dynamically excluded for 30 s. The electrospray voltage applied was 2.0 kV. Both MS and MS/MS spectra were acquired using a single microscan with a maximum fill-time of 50 and 100 ms for MS and MS/MS analysis, respectively. For MS scans, the m/z scan range was set from 400 to 1800 Da.
Sequence Database Searching and Data Analysis. The obtained MS/MS spectra were searched using SEQUEST29 (Bioworks version 3.2, Thermo Finnigan, San Jose, CA) against a yeast sequence database containing all 6139 open reading frames, with a reversed-sequence version of the same database appended to the end of the forward version for the purpose of false-positive rate estimation.30 Search parameters included differential amino acid mass shifts for oxidized methionine (+16 Da) and acetylated lysine (+42 Da) in order to identify potential acetyl groups present at lysine residues. Precursor peptide mass tolerance was ±1.5 Da with no tryptic specificity. To each matched peptide sequence, a predicted pI was automatically assigned using the script described below.
The search results were validated using the publicly available peptide validation program PeptideProphet (http://tools.proteomecenter.org/PeptideProphet.php),31 which assigns a comprehensive probability (P) score from 0 to 1 to each peptide sequence match based on its SEQUEST scores (Xcorr, dCn, Sp, RSp) and additional information, including mass difference between the precursor ion and the assigned peptide, and the number of tryptic termini. This version of PeptideProphet did not penalize matches to sequences with internal missed cleavage sites (such as acetylated lysine containing peptides). The peptide sequence match results were organized and interpreted using the software tool Interact,32 allowing up to two missed trypsin cleavage sites for peptides.
Peptide pI Calculation. The pI of peptide sequences was calculated according to the Shimura algorithm33 using an automated script developed in-house, and peptide pI values were automatically inputted into the Interact results file. Acetylated and unmodified lysine residues were treated differently in the pI calculation to correctly represent their charge state with and without acetylation. A pKa of 10.2 was used to calculate the charge contribution from unmodified lysine residues, while acetylated lysine residues were considered to be uncharged due to the covalent addition of an acetyl group to their amine.
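The adjusted calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' in-house script: it finds, by bisection, the pH at which the Henderson-Hasselbalch net charge of the peptide is zero, treating each ionizable group independently. Only the lysine pKa of 10.2 and the treatment of acetylated lysine (written K@) as uncharged come from the text. All other pKa values, and the function names `core_sequence`, `net_charge`, and `predicted_pi`, are assumptions made for illustration and are not the Shimura parameter set.

```python
# Illustrative pKa values. Only the lysine value (10.2) comes from the text;
# the rest are assumed textbook-style numbers, NOT the Shimura parameter set.
POSITIVE_PKA = {"K": 10.2, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 8.0  # assumed
C_TERM_PKA = 3.1  # assumed


def core_sequence(peptide, adjust_for_acetylation=True):
    """Return the residues considered in the pI calculation.

    Accepts SEQUEST-style strings such as 'K.VTK@LDNDLLLR.T', where the
    flanking residues are separated by dots, K@ is acetylated lysine and
    M* is oxidized methionine.
    """
    core = max(peptide.split("."), key=len).replace("*", "")
    if adjust_for_acetylation:
        return core.replace("K@", "")  # acetylated lysine is uncharged: drop it
    return core.replace("@", "")       # "normal" pI: treat K@ like plain K


def net_charge(residues, ph):
    """Henderson-Hasselbalch net charge; each group is treated independently."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in residues:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def predicted_pi(peptide, adjust_for_acetylation=True):
    """Bisect for the pH at which the net charge crosses zero."""
    residues = core_sequence(peptide, adjust_for_acetylation)
    low, high = 0.0, 14.0
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(residues, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return round((low + high) / 2.0, 2)


peptide = "R.VETGVIK@PGMVVTFAPAGVTTEVK.S"
normal_pi = predicted_pi(peptide, adjust_for_acetylation=False)
adjusted_pi = predicted_pi(peptide, adjust_for_acetylation=True)
print(normal_pi, adjusted_pi)
```

With these assumed pKa values the adjusted pI comes out lower than the normal pI, which is the direction of shift expected when the lysine charge is removed. The absolute values will differ from those produced with the Shimura parameters.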
Results and Discussion
We applied a recently developed proteomic strategy15,16,34 to investigate lysine acetylation in yeast using FFE to first fractionate the peptide mixture, followed by µLC-MS/MS analysis. In a previous study17 using the same strategy, we demonstrated that the combination of searching with no enzyme specification, and filtering of sequence matches based on tryptic termini, peptide pI information, and PeptideProphet31 P-score thresholds, provided maximum accuracy for peptide sequence matches in high-throughput proteomic analyses. Therefore, these same data processing procedures were employed in this study, leading to the identification of 4548 unique peptides and 1270 proteins at an estimated false-positive rate of 1%, if no acetylated peptides were considered. Of the 1270 identified proteins, 25% (323) have a Codon Bias Index (CBI) less than 0.1, and 52% (663) have a CBI less than 0.2, demonstrating the capability of the approach to detect relatively low-abundance proteins (CBI ≤ 0.2)30 from within complex whole cell lysates.
Acetylated Peptide Sequence Matches Show Higher False-Positive Rates. Potential sites of lysine acetylation were investigated by including differential mass shift search parameters (addition of 42 Da to lysine), enabling the identification of both unmodified and acetylated lysine. To filter these data more stringently, only fully tryptic acetylated peptide matches were considered to be correct, as these peptides show a lower false-positive rate compared to nontryptic and partially tryptic peptides when using no enzyme specificity in the database search.17 Inspection of the results from the database search showed that the estimated false-positive rate was significantly higher when considering acetylated sequence matches compared to the rate for unmodified sequence matches (Figure 1). Even for fully tryptic sequence matches at a P-score of 0.9, the false-positive rate for matches to acetylated peptides is still 20.6% (by comparison, the false-positive rate for unmodified peptides is 0.5% at this same P-score threshold). The false-positive rate does dramatically decrease as the P-score approaches 1.0. Higher false-positive rates have also been reported in the identification of PTMs to peptides in other large-scale MS/MS-based studies.35 The high false-positive rate for the lysine-acetylated peptide sequence matches may be partially due to the fact that we are only considering peptides with internal missed trypsin cleavage sites, and this class of peptides may inherently have a higher false-positive rate. An additional reason for this increased false-positive rate is that the inclusion of a differential mass searching parameter for a common amino acid residue (such as lysine) effectively increases the number of possible peptide sequences with a mass close to the precursor m/z which are considered as potential matches to each MS/MS spectrum. This inherent consequence of the use of differential mass shift parameters in sequence database searching has the same effect as increasing the number of possible protein sequences within the database, which has also been demonstrated to increase the false-positive potential.11
Figure 1. False-positive rates for peptide sequence matches for unmodified peptides and acetylated peptides both before and after filtering using predicted peptide pI values (pI@), adjusted for lysine acetylation, as described in the text.
The high false-positive rate observed for acetylated peptides raises a fundamental question: How can one identify which of the many possible matches are in fact real? To help answer this question, better strategies for filtering the data are needed. For this particular study, we applied FFE as a first-dimension separation, which fractionates the peptides based on peptide pI. The pI of each detected peptide sequence should be equal to the pH of the FFE fraction that contains the peptide. Our previous studies15,16 have shown that the pI resolution of the FFE system is about 0.5 pH units, and the filtering of sequence matches using a combination of peptide pI with FFE fraction pH effectively reduces the false-positive matches and significantly increases the confidence of identified peptides in complex mixtures. Therefore, we next investigated the potential of peptide pI information to validate the acetylated peptide sequence matches with increased confidence.
Adjusted Peptide pI Calculation Provides Accurate pI Prediction for Acetylated Peptides. As a first step toward using pI information to decrease the false-positive rate for acetylated peptides, accurate prediction of the pI of these modified peptides is necessary. Although the predicted peptide pI values for unmodified peptides using the Shimura algorithm33 were demonstrated to be accurate for peptides fractionated by FFE,16 adjustment is required for the pI calculation of acetylated peptides because the addition of an acetyl group eliminates the basic amine group normally present on lysine. To account for this modification, acetylated lysines were treated as uncharged, while unmodified lysines were assigned a pKa of 10.2.33 The Shimura algorithm treats each charged amino acid in the sequence independently and does not factor in the effects of adjacent amino acids or uncharged amino acids.
Therefore, the acetylated lysine groups were treated similarly to other uncharged amino acids and did not contribute to the pI prediction. This adjustment shifted the calculated pI of acetylated peptides to a lower value, compared to their unmodified counterparts. Figure 2A shows the difference (∆pI-pH) between the measured FFE fraction pH and the predicted peptide pI plotted against the FFE fraction number from which each acetylated peptide sequence match was derived, using a P-score of 0.75 or above. pI values were calculated using both the “normal” pI calculation (pI) and the “adjusted” pI calculation (pI@) accounting for differential acetylation at lysine. In the normal pI calculation, pKa 10.2 was used for all lysine residues whether or not they were acetylated. Examination of Figure 2A clearly shows that when calculating peptide pI values without accounting for acetylation, a large majority of the sequences (∼85%) have calculated pI values which are significantly greater than the measured pH of the FFE fraction from which they were derived. This is an expected result when not adjusting for the charge neutralizing effect of acetylation on basic lysine side chains. Meanwhile, when using the adjusted pI prediction algorithm, the difference between FFE fraction pH and the predicted pI@ for most (75%) of the acetylated peptide sequences is within ±0.5 units, demonstrating the effectiveness of these adjustments in calculating the pI of modified peptides. The clustering of the majority of the peptide sequences around the ∆pI-pH value of zero in Figure 2A demonstrates the accuracy of our adjusted pI prediction algorithm, as the “true” matches to acetylated peptides are expected to show a predicted pI which “approximates” the pH of the corresponding FFE fraction if the pI calculation accurately accounts for the charge negating effect of this modification on lysines. The remaining 25% of acetylated peptide sequence matches showed a difference between fraction pH and predicted pI@ larger than ±0.5 units, even when using the adjusted pI prediction; a majority of these peptide matches are most likely false-positive matches.
Figure 2. Difference between the measured FFE fraction pH and the predicted pI for acetylated peptides (∆pI-pH) plotted against the FFE fraction number. (A) Acetylated peptide sequence matches to the forward sequences in the database, with P-score ≥ 0.75; (B) acetylated peptide sequence matches to reversed sequences in the chimeric database (i.e., false-positive matches), with P-score ≥ 0.75.
Figure 2B shows the difference between measured pH and predicted pI, plotted against the FFE fraction number, for MS/MS spectra that matched to a sequence in the reversed version of the sequence database and were assigned a P-score of 0.75 or greater. Since these MS/MS spectra matched to “nonsense” sequences from the reversed database, these are most likely
false-positive matches. Despite the use of the adjusted pI prediction algorithm, 88% of these sequences differ by more than 0.5 units when comparing the FFE fraction pH and their predicted pI@. Therefore, unlike the matches shown in Figure 2A, the adjusted pI calculation does not shift these pI@ values closer to the measured FFE fraction pH. This result is in accordance with the expectation that the false-positive matches have more randomly distributed pI values compared to the true, acetylated peptide sequences, and thus, the adjusted pI calculation should not bring the predicted pI@ and pH values into agreement.
Adjusted Peptide pI Prediction Combined with Preparative IEF by FFE Identifies Lysine-Acetylated Peptides with Improved Confidence. Having determined the accuracy of the adjusted peptide pI calculation algorithm for predicting the pI of lysine-acetylated peptides, we then applied it to help filter the sequence matches from our data set to acetylated peptides. Our filtering process was similar to our previous description using FFE peptide separations with tandem mass spectrometry.16 Only sequence matches with adjusted pI values within ±0.5 units of their corresponding FFE fraction pH were accepted for acidic and basic fractions. For peptides in the neutral pH FFE fractions, the calculated average peptide pI was used instead of the measured pH, due to a lack of correspondence between the peptide pI values and the measured pH in this region, which we,15,16 and others,34 have consistently observed. After this pI@ filtering, the false-positive rate of acetylated peptide sequence matches was dramatically decreased at each assigned P-score compared to the results with no pI filtering, as shown in Figure 1. At a P-score of 0.75, the false-positive rate was reduced from 26.5% to 5% with the aid of pI@ filtering, resulting in 156 MS/MS spectra matching to acetylated peptides. Since a 5% false-positive rate is still relatively high, further manual interpretation of the tandem mass spectra was done to validate that the major detected MS/MS spectrum peaks matched expected b and y ions, and two sequence matches were removed which showed poorly matching fragmentation patterns. The final 154 MS/MS matches were derived from 35 unique, lysine-acetylated peptides, as listed in Table 1. Many of these peptides were detected in multiple FFE fractions; thus, they were selected multiple times for MS/MS analysis.
Table 1. Identified Acetylated Peptides
no. | accession no. | protein name (number) (a) | SMu|SM@ (b) | CBI | acetylated peptide sequence (c) | P | Xcorr | ∆Cn | pI (d) | pI@ (d) | pH (d) | subcellular localization | biological process
1 | YAL038W | pyruvate kinase (29) | 43|24 | 0.882 | K.K@GDTYVSIQGFK.A | 1.00 | 2.1 | 0.4 | 8.5 | 5.9 | 6.2 | cytoplasm | glycolysis
2 | YBR043C | QuiniDine Resistance | 0|2 | 0.07 | R.ENSLRK@LQTNLEEQVK.K | 0.88 | 2.7 | 0.2 | 6.3 | 4.8 | 4.6 | membrane | transport
3 | YBR118W | translational elongation factor EF-1 alpha (14) | 76|2 | 0.881 | R.VETGVIK@PGMVVTFAPAGVTTEVK.S | 1.00 | 3.6 | 0.3 | 6.2 | 4.5 | 4.7 | cytoplasm/ribosome | translation
4 | YBR181C | Ribosomal Protein of the Small subunit (8) | 11|1 | 0.854 | K.K@GEQELEGLTDTTVPKR.L | 0.99 | 2.5 | 0.3 | 4.9 | 4.4 | 4.6 | cytoplasm/nucleolar | translation
5 | YCR012W | 3-phosphoglycerate kinase (31) | 25|4 | 0.843 | K.TVTDK@EGIPAGWQGLDNGPESR.K | 1.00 | 3.7 | 0.3 | 4.3 | 3.9 | 4.3 | cytoplasm/mitochondrion | glycolysis
6 | YDR050C | triosephosphate isomerase (11) | 21|5 | 0.82 | K.K@PQVTVGAQNAYLK.A | 0.77 | 2.2 | 0.2 | 9.7 | 8.6 | 8.5 | cytoplasm | glycolysis
7 | YDR146C | transcriptional activator | 0|1 | 0.092 | K.KISM*ADNLLSTINK@SEINK.G | 0.95 | 2.6 | 0.3 | 8.6 | 6.1 | 6.1 | cytoplasm/nucleus | transcription
8 | YDR324C | U3 snoRNP protein (4) | 0|1 | 0.095 | K.VTK@LDNDLLLR.T | 0.88 | 2.9 | 0.2 | 6.0 | 4.2 | 4.6 | nucleolus | translation
9 | YEL034W | translation initiation factor eIF-5A (6) | 69|2 | 0.783 | K.K@LEDLSPSTHNM*EVPVVK.R | 0.81 | 2.7 | 0.2 | 5.7 | 4.7 | 4.6 | cytoplasm/ribosome | translation
10 | YER043C | S-adenosyl-L-homocysteine hydrolase (putative) (10) | 9|1 | 0.662 | K.K@LNLILDDGGDLTTLVHEK.H | 0.99 | 3.4 | 0.3 | 4.6 | 4.2 | 4.4 | cytoplasm | amino acid metabolism
11 | YER087W | Hypothetical protein | 0|1 | 0.004 | K.SLSNNTLK@SETTQELLQTVGFVR.R | 0.86 | 3.7 | 0.2 | 6.2 | 4.5 | 4.5 | mitochondrion | unknown
12 | YER091C | 5-methyltetrahydropteroyl triglutamate homocysteine methyltransferase (21) | 7|1 | -0.16 | K.LNGQK@PVDEFLEAK.E | 0.77 | 2.7 | 0.3 | 4.7 | 4.1 | 4.2 | cytoplasm | amino acid metabolism
13 | YFR053C | hexokinase I (PI) (also called hexokinase A) (15) | 12|1 | 0.534 | R.TK@YDVAVDEQSPRPGQQAFEK.M | 0.99 | 3.9 | 0.2 | 4.8 | 4.3 | 4.5 | cytoplasm | carbohydrate metabolism
14 | YGL036W | Mtf1 Two Hybrid Clone 2 | 0|5 | 0.063 | K.NLALYHLIKFATK@VSLDDLILQK.I | 0.82 | 3.4 | 0.2 | 8.5 | 7.0 | 7.2 | cytoplasm | unknown
15 | YGL150C | ATPase, shows similarity to the Snf2p family of ATPases (1, 2, 3) | 0|1 | 0.147 | R.AKQK@EQVQQVVMEGK.T | 0.82 | 2.8 | 0.3 | 8.6 | 6.2 | 6.1 | nuclear | DNA repair
16 | YGL253W | hexokinase II (PII) (also called hexokinase B) (14) | 8|2 | 0.674 | R.TK@YDITIDEESPRPGQQTFEK.M | 0.99 | 2.4 | 0.3 | 4.5 | 4.2 | 4.3 | cytoplasm/nucleus | carbohydrate metabolism
17 | YGR085C | Ribosomal Protein of the Large subunit (3) | 12|1 | 0.781 | K.TTK@EDTVSWFK.Q | 0.90 | 2.5 | 0.3 | 6.1 | 4.3 | 4.4 | cytoplasm/ribosome | translation
18 | YGR192C | glyceraldehyde-3-phosphate dehydrogenase 3 (14) | 21|12 | 0.926 | K.LNK@ETTYDEIKK.V | 0.99 | 2.8 | 0.3 | 6.2 | 4.7 | 4.7 | cytoplasm/mitochondrion | glycolysis
18 | YGR192C | glyceraldehyde-3-phosphate dehydrogenase 3 (14) | 50|10 | 0.926 | K.K@VVITAPSSTAPM*FVM*GVNEEK.Y | 1.00 | 3.3 | 0.4 | 6.2 | 4.5 | 4.9 | cytoplasm/mitochondrion | glycolysis
19 | YIL031W | Ubiquitin-Like Protein (1) | 0|2 | 0.009 | K.NLRHTLK@LLQLNYISYLK.K | 0.97 | 2.7 | 0.4 | 9.9 | 9.6 | 9.1 | nucleus | chromosomal condensation
20 | YIL101C | transcriptional repressor | 0|4 | 0.15 | K.NILM*GK@IILPSR.S | 0.97 | 2.2 | 0.3 | 11.3 | 10.1 | 9.7 | nucleus | stress response/transcription
21 | YJL136C | Ribosomal Protein of the Small subunit (1) | 0|32 | 0.689 | MENDK@GQLVELYVPR.K | 1.00 | 4.5 | 0.5 | 4.7 | 4.1 | 4.2 | cytoplasm/ribosome | translation
21 | YJL136C | Ribosomal Protein of the Small subunit (1) | 0|7 | 0.689 | MENDK@GQLVELYVPRK.C | 0.99 | 3.5 | 0.3 | 6.2 | 4.7 | 4.5 | cytoplasm/ribosome | translation
22 | YLL008W | ATP dependent RNA helicase (putative) (2) | 0|1 | 0.228 | R.K@GENMLK@HK.K | 0.77 | 2.2 | 0.2 | 9.7 | 7.0 | 7.4 | nucleolus | ribosome assembly
23 | YLR044C | pyruvate decarboxylase (17) | 2|6 | 0.914 | K.GYK@PVAVPAR.T | 0.99 | 2.2 | 0.4 | 9.9 | 8.7 | 8.6 | cytoplasm/nucleus | carbohydrate metabolism
24 | YLR134W | pyruvate decarboxylase (1) | 0|5 | 0.765 | K.LYEVK@GMRWAGNANELNAAYAADGYAR.I | 0.99 | 3.5 | 0.3 | 6.2 | 4.7 | 4.6 | cytoplasm/nucleus | carbohydrate metabolism
25 | YML028W | thioredoxin peroxidase 1 (9) | 27|5 | 0.78 | R.K@EGGLGPINIPLLADTNHSLSR.D | 1.00 | 2.8 | 0.3 | 7.0 | 5.5 | 5.7 | cytoplasm | redox
26 | YML088W | F-box protein (1) | 0|1 | 0.039 | K.GK@NTVSNKWNETLNTELQYYDEDEDLR.R | 1.00 | 1.9 | 0.5 | 4.2 | 3.9 | 4.3 | nucleus/cytoplasm | protein degradation
27 | YML121W | small GTPase (putative) | 0|1 | 0.016 | R.FEK@ISNIMK@NFK.Q | 0.99 | 2.9 | 0.3 | 9.7 | 5.1 | 4.9 | cytoplasm/nucleus | transport/transcription
28 | YMR242C | Ribosomal Protein of the Large subunit (5) | 3|2 | 0.769 | K.K@ASGEIVSINQINEAHPTK.V | 1.00 | 2.8 | 0.3 | 7.0 | 5.6 | 6.0 | cytoplasm/ribosome | translation
29 | YNL271C | formin, involved in spindle orientation | 0|3 | 0.066 | R.VM*QQLEAELEELK@K.K | 0.97 | 4.0 | 0.1 | 4.5 | 4.1 | 4.4 | cytoplasm | signaling
30 | YNL302C | Ribosomal Protein of the Small subunit (6) | 14|2 | 0.789 | R.QGK@LEVPGYVDIVK.T | 0.90 | 2.7 | 0.2 | 6.1 | 4.3 | 4.5 | cytoplasm/ribosome | translation
31 | YOL086C | alcohol dehydrogenase (10) | 0|3 | 0.846 | K.EK@DIVGAVLK.A | 0.96 | 2.2 | 0.2 | 6.1 | 4.3 | 4.4 | cytoplasm | glycolysis
32 | YOR168W | glutamine-tRNA ligase (7) | 0|1 | 0.277 | K.NK@K@VSDSLYK.L | 0.80 | 2.0 | 0.3 | 9.5 | 4.8 | 4.3 | cytoplasm | protein synthesis
33 | YPL061W | aldehyde dehydrogenase (9) | 30|1 | 0.62 | R.IVK@EEIFGPVVTVAK.F | 1.00 | 3.8 | 0.4 | 6.2 | 4.5 | 4.6 | cytoplasm/mitochondrion | glycolysis/acetate biosynthesis
(a) Number of other detected peptides without acetylation. (b) Numbers signify the number of MS/MS spectra collected for unmodified (SMu)/acetylated (SM@) forms of the peptide sequence. (c) K@, acetylated lysine residue; M*, oxidized methionine. (d) pI, pI@, “normal” and “adjusted” predicted peptide pI, respectively, as described in the text; pH, FFE fraction pH containing the peptide.
Figure 3. Representative MS/MS spectra of identified acetylated peptides. (a) A peptide acetylated at a single lysine residue, derived from a ribosomal protein of the small subunit (YJL136C); (b) a peptide acetylated at two lysine residues, derived from the small GTPase protein (YML121W).
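The pI@ filtering step and the reversed-database estimate of the false-positive rate can be sketched as follows. This is a minimal illustration rather than the authors' Interact-based workflow. The `Match` record, its field names, and the toy values are hypothetical. The rate is computed with one common estimator for a concatenated forward/reversed database, 2 × (reversed matches) / (all matches), which is an assumption because the text does not state the exact formula. Only the P-score threshold of 0.75 and the ±0.5 unit window come from the text, and the special handling of the neutral pH fractions is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One peptide sequence match (hypothetical record layout)."""
    sequence: str
    p_score: float       # PeptideProphet probability
    adjusted_pi: float   # predicted pI@ of the matched sequence
    fraction_ph: float   # measured pH of the FFE fraction
    is_reversed: bool    # True if the match is to the reversed database


def passes_pi_filter(match, min_p=0.75, window=0.5):
    """Keep matches whose pI@ lies within +/- window of the fraction pH."""
    close_enough = abs(match.adjusted_pi - match.fraction_ph) <= window + 1e-9
    return match.p_score >= min_p and close_enough


def false_positive_rate(matches):
    """Assumed estimator: 2 x reversed matches / all matches."""
    if not matches:
        return 0.0
    n_reversed = sum(1 for m in matches if m.is_reversed)
    return 2.0 * n_reversed / len(matches)


# Toy data, for illustration only.
matches = [
    Match("forward A", 0.99, 4.5, 4.7, False),
    Match("forward B", 0.90, 8.7, 8.6, False),
    Match("forward C", 0.80, 6.2, 4.6, False),   # pI@ too far from pH
    Match("forward D", 0.95, 4.1, 4.2, False),
    Match("reversed A", 0.85, 9.4, 4.7, True),   # pI@ too far from pH
    Match("reversed B", 0.78, 6.8, 4.4, True),   # pI@ too far from pH
]

before = [m for m in matches if m.p_score >= 0.75]
after = [m for m in matches if passes_pi_filter(m)]
rate_before = false_positive_rate(before)
rate_after = false_positive_rate(after)
print(len(before), rate_before, len(after), rate_after)
```

In this toy set the two reversed matches have pI@ values far from their fraction pH, so the filter removes them along with one forward match, and the estimated rate drops accordingly.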
Figure 3 shows two representative spectra of identified acetylated peptides. MENDK@GQLVELYVPR is an N-terminal peptide from Ribosomal Protein of the Small Subunit (YJL136C) with a single lysine-acetylated site. The peptide FEK@ISNIMK@NFK is from Small GTPase (putative) (YML121W) with two lysine-acetylated sites. As mentioned above, lysine-acetylated peptides are expected to shift to more acidic FFE fractions compared to their unmodified counterparts. As shown in Table 1, all the adjusted, predicted pI values of acetylated peptides (pI@ column in Table 1) are lower than the pI values predicted without adjustment for acetylation (pI column in Table 1). Comparing the adjusted and nonadjusted predicted pI values for peptides shown in Table 1, the average decrease in predicted pI value was 1.5 units when adjusting for acetylated lysine residues and using the previously described algorithm by Shimura.33 In some cases, both the unmodified and acetylated forms of the same peptide sequence were identified, as shown in the column labeled SMu|SM@ in Table 1. For example, the nonacetylated peptide sequence VETGVIKPGMVVTFAPAGVTTEVK from the EF-1 alpha protein (YBR118W) had a predicted pI value of 6.2 which closely matched the pH of the FFE fraction (6.2) in which it was collected; the acetylated version of this peptide was also identified from an FFE fraction with a measured pH of 4.7, and the adjusted pI prediction algorithm calculated a pI@ value of 4.5 for this peptide. These results nicely illustrate the ability of our adjusted pI prediction algorithm to accurately account for the effect of lysine acetylation. The 35 acetylated peptides are derived from 33 distinct proteins. Of these, 25 proteins also had other nonmodified peptides from within their sequence identified in addition to the acetylated sequences shown in Table 1. The other eight proteins (shown in bold in Table 1), identified from only a single sequence match to an acetylated peptide, are of low abundance (CBI less than 0.2), which may explain why other nonmodified peptides were not identified from these proteins. Generally, the acetylated peptides were derived from high-abundance proteins, which should be expected given that we did not enrich for the modified peptides from within the mixture, as others have done for proteomic studies of these modifications.36 Although the objective of this study was not to assign functional significance of lysine acetylations within the identified proteins, some interesting conclusions can be drawn from the catalog of modified proteins. For one, a majority of the proteins identified are known to be non-nuclear in their localization (see Table 1), confirming other studies36 which have shown that lysine acetylation occurs on a diverse group of proteins, not limited to nuclear proteins, which are the classic targets of acetylation. A number of the proteins in Table 1 are involved in carbohydrate metabolism, indicating the possible role of lysine acetylation in the regulation of this cellular pathway. Furthermore, lysine acetylation of a number of the proteins shown in Table 1 (triosephosphate isomerase, aldehyde dehydrogenase, and thioredoxin, to name a few) has also been identified in a recent study profiling these modifications in mouse samples,36 indicating potential conserved mechanisms of acetylation within eukaryotes.
Conclusion
In conclusion, our results demonstrate the following points: (1) including differential mass shift parameters in database searches for PTMs on peptides increases the potential for false-positive matches; (2) for the case of lysine acetylation, we were able to effectively adjust for the charge-negating effect of this modification and accurately predict the pI of these modified peptides; (3) combined with preparative IEF of complex peptide mixtures, the use of adjusted peptide pI predictions enables more confident identification of lysine-acetylated peptides. Collectively, these results show the utility of preparative IEF fractionation for improving the identification of modifications that alter the intrinsic pI of peptides, providing a potentially powerful tool for proteomic studies of protein PTMs.
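Point (3) can be pictured as a simple consistency check between a peptide's predicted pI and the measured pH of the FFE fraction in which it was identified. The sketch below is a minimal illustration under stated assumptions: the `PeptideMatch` type, the `passes_pi_filter` function, and the 0.5 pH unit tolerance are all hypothetical and are not the filtering criteria used in this study. The numeric values are those quoted in the text for the EF-1 alpha peptide VETGVIKPGMVVTFAPAGVTTEVK.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    label: str           # free-text description of the sequence match
    predicted_pi: float  # pI predicted for the matched peptide form
    fraction_ph: float   # measured pH of the FFE fraction


def passes_pi_filter(match: PeptideMatch, tolerance: float = 0.5) -> bool:
    """Accept a match only if predicted pI and fraction pH agree.

    The 0.5 unit tolerance is a hypothetical value chosen for illustration.
    """
    return abs(match.predicted_pi - match.fraction_ph) <= tolerance


matches = [
    # Unmodified form: predicted pI 6.2, collected at pH 6.2.
    PeptideMatch("unmodified, pI", 6.2, 6.2),
    # Acetylated form with the adjusted prediction (pI@ 4.5), collected at pH 4.7.
    PeptideMatch("acetylated, adjusted pI@", 4.5, 4.7),
    # Same acetylated match scored with the non-adjusted prediction (6.2).
    PeptideMatch("acetylated, non-adjusted pI", 6.2, 4.7),
]

for m in matches:
    print(m.label, passes_pi_filter(m))
```

With these values, the acetylated match is consistent with its fraction only when the adjusted pI@ is used, which is why the adjustment matters for a pI-based filter.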
Acknowledgment. We thank Dr. Bruce Witthuhn and the Center for Mass Spectrometry and Proteomics at the University of Minnesota for instrumental assistance, and the Minnesota Supercomputing Institute for computational hardware support and maintenance of the Sequest cluster. This work was supported in part by a grant by the Minnesota Medical Foundation, NIH grants AG25371 and DK073731, and a research award to T.J.G. from Eli Lilly and Company.
References
(1) Rohrbough, J. G.; et al. Verification of single-peptide protein identifications by the application of complementary database search algorithms. J. Biomol. Tech. 2006, 17 (5), 327-32.
(2) Qian, W. J.; et al. Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications. Mol. Cell. Proteomics 2006, 5 (10), 1727-44.
(3) Qian, W. J.; Camp, D. G., II; Smith, R. D. High-throughput proteomics using Fourier transform ion cyclotron resonance mass spectrometry. Expert Rev. Proteomics 2004, 1 (1), 87-95.
(4) Olsen, J. V.; Mann, M. Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (37), 13417-22.
(5) Dieguez-Acuna, F. J.; et al. Characterization of mouse spleen cells by subtractive proteomics. Mol. Cell. Proteomics 2005, 4 (10), 1459-70.
(6) Petritis, K.; et al. Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses. Anal. Chem. 2003, 75 (5), 1039-48.
(7) Palmblad, M.; et al. Prediction of chromatographic retention and protein identification in liquid chromatography/mass spectrometry. Anal. Chem. 2002, 74 (22), 5826-30.
(8) Shen, Y.; et al. Packed capillary reversed-phase liquid chromatography with high-performance electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry for proteomics. Anal. Chem. 2001, 73 (8), 1766-75.
(9) Qian, W. J.; et al. Quantitative proteome analysis of human plasma following in vivo lipopolysaccharide administration using 16O/18O labeling and the accurate mass and time tag approach. Mol. Cell. Proteomics 2005, 4 (5), 700-9.
(10) Zimmer, J. S.; et al. Advances in proteomics data analysis and display using an accurate mass and time tag approach. Mass Spectrom. Rev. 2006, 25 (3), 450-82.
(11) Cargile, B. J.; Bundy, J. L.; Stephenson, J. L., Jr. Potential for false positive identifications from large databases through tandem mass spectrometry. J. Proteome Res. 2004, 3 (5), 1082-5.
(12) Cargile, B. J.; Talley, D. L.; Stephenson, J. L., Jr. Immobilized pH gradients as a first dimension in shotgun proteomics and analysis of the accuracy of pI predictability of peptides. Electrophoresis 2004, 25 (6), 936-45.
(13) Cargile, B. J.; et al. Gel based isoelectric focusing of peptides and the utility of isoelectric point in protein identification. J. Proteome Res. 2004, 3 (1), 112-9.
(14) Krijgsveld, J.; et al. In-gel isoelectric focusing of peptides as a tool for improved protein identification. J. Proteome Res. 2006, 5 (7), 1721-30.
(15) Xie, H.; et al. A catalogue of human saliva proteins identified by free flow electrophoresis-based peptide separation and tandem mass spectrometry. Mol. Cell. Proteomics 2005, 4 (11), 1826-30.
(16) Xie, H.; Bandhakavi, S.; Griffin, T. J. Evaluating preparative isoelectric focusing of complex peptide mixtures for tandem mass spectrometry-based proteomics: a case study in profiling chromatin-enriched subcellular fractions in Saccharomyces cerevisiae. Anal. Chem. 2005, 77 (10), 3198-207.
(17) Xie, H.; Griffin, T. J. Trade-off between high sensitivity and increased potential for false positive peptide sequence matches using a two-dimensional linear ion trap for tandem mass spectrometry-based proteomics. J. Proteome Res. 2006, 5 (4), 1003-9.
(18) An, Y.; et al. Solution isoelectric focusing for peptide analysis: comparative investigation of an insoluble nuclear protein fraction. J. Proteome Res. 2005, 4 (6), 2126-32.
(19) Heller, M.; et al. Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J. Proteome Res. 2005, 4 (6), 2273-82.
(20) Horth, P.; et al. Efficient fractionation and improved protein identification by peptide OFFGEL electrophoresis. Mol. Cell. Proteomics 2006, 5 (10), 1968-74.
Journal of Proteome Research • Vol. 6, No. 5, 2007
(21) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. ModifiComb, a new proteomic tool for mapping substoichiometric post-translational modifications, finding novel types of modifications, and fingerprinting complex protein mixtures. Mol. Cell. Proteomics 2006, 5 (5), 935-48.
(22) Jensen, O. N. Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Curr. Opin. Chem. Biol. 2004, 8 (1), 33-41.
(23) MacCoss, M. J.; et al. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc. Natl. Acad. Sci. U.S.A. 2002, 99 (12), 7900-5.
(24) Kim, J. Y.; et al. Probing lysine acetylation with a modification-specific marker ion using high-performance liquid chromatography/electrospray-mass spectrometry with collision-induced dissociation. Anal. Chem. 2002, 74 (21), 5443-9.
(25) Beausoleil, S. A.; et al. Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (33), 12130-5.
(26) Kouzarides, T. Acetylation: a regulatory modification to rival phosphorylation? EMBO J. 2000, 19 (6), 1176-9.
(27) Moseley, M. A.; et al. Nanoscale packed-capillary liquid chromatography coupled with mass spectrometry using a coaxial continuous-flow fast atom bombardment interface. Anal. Chem. 1991, 63 (14), 1467-73.
(28) Gatlin, C. L.; et al. Protein identification at the low femtomole level from silver-stained gels using a new fritless electrospray interface for liquid chromatography-microspray and nanospray mass spectrometry. Anal. Biochem. 1998, 263 (1), 93-101.
(29) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976-89.
(30) Peng, J.; et al. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2003, 2 (1), 43-50.
(31) Keller, A.; et al. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383-92.
(32) Han, D. K.; et al. Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 2001, 19 (10), 946-51.
(33) Shimura, K.; et al. Fluorescence-labeled peptide pI markers for capillary isoelectric focusing. Anal. Chem. 2002, 74 (5), 1046-53.
(34) Malmstrom, J.; et al. Optimized peptide separation and identification for mass spectrometry based proteomics via free-flow electrophoresis. J. Proteome Res. 2006, 5 (9), 2241-9.
(35) Ong, S. E.; Mittler, G.; Mann, M. Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 2004, 1 (2), 119-26.
(36) Kim, S. C.; et al. Substrate and functional diversity of lysine acetylation revealed by a proteomics survey. Mol. Cell 2006, 23 (4), 607-18.
PR060691J