Gel Based Isoelectric Focusing of Peptides and the Utility of Isoelectric Point in Protein Identification

Benjamin J. Cargile, Jonathan L. Bundy, Thaddeus W. Freeman, and James L. Stephenson, Jr.*

Mass Spectrometry Program, Research Triangle Institute, 3040 Cornwallis Road, Research Triangle Park, North Carolina 27709

Received June 12, 2003
Here we present the theoretical and experimental evaluation of peptide isoelectric point as a method to aid in the identification of peptides from complex mixtures. Predicted pI values were found to match the experimentally obtained data closely, resulting in the development of a unique filter that lowers the effective false positive rate for peptide identification. Because the false positive rate is reduced, the cross-correlation parameters XCorr and ∆Cn from the SEQUEST program can be lowered, resulting in 25% more peptide identifications. This approach was successfully applied to analysis of the soluble fraction of the E. coli proteome, where 417 proteins were identified from 1022 peptides using just 20 µg of material.

Keywords: isoelectric focusing • mass spectrometry • protein identification • proteomics • peptides
* To whom correspondence should be addressed. Phone: (919) 844-0462. Fax: (919) 541-7208. Email: [email protected].

Journal of Proteome Research 2004, 3, 112-119. Published on Web 09/17/2003. 10.1021/pr0340431 CCC: $27.50. © 2004 American Chemical Society.

Introduction

The field of high-throughput protein identification via whole proteome digestion and multidimensional separation, often referred to as “shotgun” proteomics,1 has recently matured through several major advances. The recent development of biphasic nanocolumns2 that combine strong cation exchange material and reverse phase media in the same 75 µm i.d. capillary has received major attention. Although the approach was initially used for the analysis of protein complexes, a yeast lysate was also digested after an initial crude fractionation, resulting in the identification of 749 peptides from 189 proteins.2 Further advances in this technology, which include the use of volatile salts and heptafluorobutyric acid, have allowed the identification of more than 1200 proteins.1,3 A large number of other variations of this multidimensional separation paradigm have also been employed for shotgun proteomics, including capillary electrophoresis,4,5 offline cation exchange,6 and more complex schemes7 involving more than two separation phases prior to mass spectrometry. Most of these methods have enabled the identification of hundreds to thousands of proteins from thousands of peptides. However, to load enough digested material for the detection and identification of low abundance proteins, the first dimension of chromatography is often significantly overloaded and a loss of resolution occurs.

One potentially useful tool for shotgun proteomics that has received little examination is the use of the physicochemical properties and amino acid composition of the separated peptides as an aid in the identification process. In a paper1 describing multidimensional protein identification technology applied to the yeast proteome, it was shown using standards that the average isoelectric point of the peptides eluting off the cation exchange material increases with increasing salt concentration. This information was never utilized in the identification process, though, probably because the isoelectric point of individual peptides within each fraction varies greatly. More recently, it was shown using offline cation exchange that there is a significant correlation between solution phase charge and the general elution time of peptides.6 Although this provides a significant piece of information about the peptide, no attempt was made to predict the relative order of the eluting peptides for any given charge state or to use this information during identification. Another advance has been the ability to partially predict the elution time of peptides off reverse phase columns and the use of this information in the identification process.8,9 This does allow some discrimination between peptides, but the ±10% deviation needed to “accurately” predict retention time for at least 95% of the peptides leaves a large time window (a 20 min window for a 100 min gradient) during which a peptide could theoretically elute.8,9 This problem is further compounded by the fact that peptides of similar size often elute at the same retention time during the chromatographic run.10

Another potentially informative technique is the combination of peptide isoelectric focusing with mass spectrometry, because the isoelectric point of the peptide can be determined experimentally. Several laboratories have demonstrated that the isoelectric point of peptides and proteins can be accurately estimated.11-13 For example, capillary isoelectric focusing has been employed to predict the isoelectric point to approximately ±0.1 standard deviations with repeated measurements. One recent advance in this direction that appears promising has been the introduction of capillary isoelectric focusing coupled to mass spectrometry via reverse-phase liquid chromatography (RPLC), as detailed in two recent reports by C. S. Lee and colleagues.14,15 Using this separation strategy, the authors tentatively identified 1132 proteins14,15 from ca. 1 µg loaded in the first dimension. However, a complex plumbing system using a large number of capillary trap cartridges was needed to couple the first dimension electrophoresis step with RPLC. In addition, capillary electrophoresis is currently limited in the absolute amount of sample that can be loaded, making it difficult to scale up this type of separation for organisms with larger genomes than Saccharomyces cerevisiae. The authors also reported the use of manual validation for proteins identified by a single peptide, which constituted a significant number of their reported identifications. Finally, the pI of the peptides was not used to aid in the validation of protein identifications.

Here we examine the use of carrier ampholyte slab gels as a first dimension in shotgun proteomics. We show theoretically that resolving peptides by their isoelectric point before mass analysis provides a means to resolve complex samples, in this case the E. coli proteome. Although a completely orthogonal separation system would be ideal, general trends between mass and isoelectric point show that this is not completely the case in this situation. Nevertheless, a significant amount of resolving power is still theoretically achieved. In addition, the potential benefits for identification of peptides in the E. coli proteome are shown as a function of the predictability of pI and/or the size of the fraction collected from the IEF step. The application of this technique through the use of slab IEF gels is also demonstrated for the soluble fraction of an E. coli lysate. Initial experiments illustrate that the pI values of the peptides in each fraction correlate well with the theoretical pI and that the lower pH IEF fractions show a relatively small standard deviation around the average pI.
The high predictability of pI allows an extra filter to be used during the identification process and lowers the false positive rate, as would be expected if false positives are truly random peptides and thus have random pI values. In turn, the XCorr or ∆Cn cutoffs can be lowered, allowing 25% more peptides to be identified than with XCorr and ∆Cn cutoff values alone. Although in this report the same sample is loaded in multiple lanes, the amount of material used for the MS analysis (10% of the total starting material) suggests that a single lane of the gel could be used for the whole analysis, thus allowing several samples to be processed simultaneously. The resolving power and new identification constraint demonstrate the utility of IEF of peptides in gels for proteomics projects, without the need for, and added complexity of, directly coupling capillary electrophoretic separations to an LC-MS platform.
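The reasoning above, that a pI window removes randomly distributed false positives and so permits a lower XCorr cutoff at a fixed false positive rate, can be illustrated with a small sketch. The `Hit` record, the `lowest_cutoff` function, and the toy scores are hypothetical and are not the filtering software used in this work. The toy set is far too small for a 1% rate, so a rate of zero is assumed purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    xcorr: float      # cross-correlation score of the match
    pi: float         # predicted pI of the matched peptide
    is_reverse: bool  # True if the match came from the reversed database


def lowest_cutoff(hits, max_fp_rate=0.01, pi_window=None):
    """Return (cutoff, n_forward) for the lowest XCorr cutoff whose
    reversed-to-forward hit ratio does not exceed max_fp_rate."""
    if pi_window is not None:
        low, high = pi_window
        hits = [h for h in hits if low <= h.pi <= high]
    best = None
    # Try every observed score as a candidate cutoff, highest first.
    for cutoff in sorted({h.xcorr for h in hits}, reverse=True):
        kept = [h for h in hits if h.xcorr >= cutoff]
        n_rev = sum(h.is_reverse for h in kept)
        n_fwd = len(kept) - n_rev
        if n_fwd and n_rev / n_fwd <= max_fp_rate:
            best = (cutoff, n_fwd)
    return best


# Toy data (invented): forward hits cluster near pI 4, while reversed
# hits have unrelated pI values.
TOY_HITS = [
    Hit(3.0, 4.0, False), Hit(2.6, 4.1, False), Hit(2.2, 3.9, False),
    Hit(1.8, 4.2, False), Hit(1.4, 4.0, False),
    Hit(2.0, 8.5, True), Hit(1.6, 9.1, True), Hit(1.2, 4.1, True),
]

print(lowest_cutoff(TOY_HITS, max_fp_rate=0.0))                        # (2.2, 3)
print(lowest_cutoff(TOY_HITS, max_fp_rate=0.0, pi_window=(3.5, 4.5)))  # (1.4, 5)
```

In this toy case the pI window discards two of the three reversed hits, so the cutoff can fall from 2.2 to 1.4 and two more forward hits are accepted.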
Experimental Section

Reagents. Acetonitrile (ACN) and water were purchased from Burdick and Jackson (Muskegon, MI). All other chemicals were obtained from Sigma-Aldrich (St. Louis, MO) and were the highest purity and quality available.

Sample Preparation. E. coli strain K-12 bacteria obtained from American Type Culture Collection (ATCC, Rockville, MD) were cultured in M9 minimal media supplemented with glucose and MgSO4 and grown to an OD600 of 0.4. The cells were pelleted at 5000g. Protein was extracted by suspending the pellet in a buffer consisting of 8 M urea in 25 mM Tris, pH 8.0, and vortexing for 10 min followed by three freeze-thaw cycles. The concentration of protein was determined using the
BCA assay (Pierce, Rockford, IL). Five hundred micrograms of sample was reduced with 10 mM DTT at 37 °C for 1 h. The urea was then diluted to 1 M using 25 mM Tris, pH 7.6, in 1 mM CaCl2. Sequencing grade trypsin (Promega, Madison, WI) was added at a 1:50 ratio, and the protein samples were placed in a water bath at 37 °C for 18 h. The digested peptides were desalted with a C18 Sep-Pak (Waters, Milford, MA) and dried in a Speedvac (ThermoSavant, Holbrook, NY). The peptides were then resuspended in 50 µL of 1× IEF Sample Buffer (Invitrogen, Carlsbad, CA).

Isoelectric Focusing. Isoelectric focusing was performed using a carrier ampholyte slab gel of dimensions 10 cm × 7 cm (Invitrogen, Carlsbad, CA) and a buffer system that allowed separation over a pH range of 3-10. The buffer system consisted of 7 mM phosphoric acid for the anode chamber and 20 mM lysine and arginine for the cathode chamber. Five micrograms of pI standards 3.6-9.3 (Sigma-Aldrich, St. Louis, MO) were loaded into the outer lanes as a reference for pI calculations. Fifty micrograms of the E. coli digest was loaded into each of the 10 internal wells to avoid overloading any single lane. The gel was run at constant voltage according to the manufacturer’s protocol as follows: 100 V for 1 h, 200 V for 1 h, 500 V for 30 min. Following focusing, the pI marker lanes were excised and stained using Coomassie Blue. The rest of the gel was cut into two sections, with one section containing 4 of the 10 lanes that contained peptides. This section of the gel was cut into ∼2.5 mm slices for a total of 27 bands/fractions. A second gel was loaded and run exactly as the first but was frozen with liquid nitrogen before cutting, to minimize distortion and peptide diffusion, and cut into 16 bands/fractions.

Extraction of Peptides. Each of the gel slices was placed in a 600 µL Eppendorf tube. To each tube, 150 µL of 1% trifluoroacetic acid (TFA) was added to initiate peptide extraction.
The gel slice was then vortexed for 10 min and sonicated for 10 min. The tube was then spun at 13000g for 1 min and the supernatant was collected. Next, 200 µL of 50:50 (v:v) ACN:0.1% aqueous TFA was used to extract the peptides. The aforementioned extraction step was then repeated with 200 µL of an ACN 0.1% TFA solution. For each gel slice, the supernatant fractions were combined and evaporated to ∼100 µL using a Speedvac. In the first experiment, the samples were desalted with a Sep-Pak according to the manufacturer’s protocol. In a second set of experiments, 0.2 µm spin filters (Pall Life Sciences, East Hills, NY) were loaded with a large particle size (>30 µm) C18 media (Alltech, State College, PA) to serve as a rapid and inexpensive alternative to C18 Sep-Pak cartridges. These spin columns were loaded and washed in the same fashion as the Sep-Paks, except that centrifugation was employed in the wash steps instead of disposable syringes. These samples were evaporated to dryness in the Speedvac and resuspended in 50 µL of 1% TFA.

LC-MS/MS Analysis. An LC Packings Ultimate pump, Switchos column switching device, and Famos autosampler (Dionex Corporation, Sunnyvale, CA) were interfaced to an LCQ DECA XP ion trap mass spectrometer (ThermoFinnigan, San Jose, CA). A 100 µm i.d. × 360 µm o.d. reverse phase column was made as described by Martin et al.,16 except that the same material was used to make the frit/plug as was used to pack the column. The material used in the column was a monodisperse 5 µm polymeric small bead RPC medium (Source 5RPC, gift from Amersham BioSciences, Piscataway, NJ). A capillary trap was made out of the same packing material and was used to facilitate the loading of samples
injected via the Famos onto the 100 µm column. Five microliters of each fraction was loaded onto the capillary trap and washed briefly with 0.1% aqueous formic acid (5 min) before the trap was switched inline with the analytical column. An 80 min gradient was used from 15% to 50% B (A: water with 0.1% formic acid; B: 70% ACN with 0.1% formic acid) at a flow rate of 250 nL/min. The mass spectrometer was set up to take one full scan MS over the range m/z 400-1500 followed by three MS/MS spectra of the top three peaks. The Dynamic Exclusion parameters were set to a repeat count of two, with a ±3 Da window set around the precursor mass and an exclusion time of 5 min.

Data Analysis. The MS/MS spectra were analyzed using the SEQUEST program supplied with the BioWorks 3.0 software from ThermoFinnigan. Searches were run considering tryptic peptides with three missed cleavages and differentially modified methionines (oxidized, +15.99). The cutoffs for XCorr and ∆Cn were evaluated in two different ways. Initially, a reversed database was used to find XCorr levels that gave ∼1% false positives, as described by Jeng et al.6 From these identified peptides, a pI range was established equal to 2 standard deviations around the average pI. The data were again filtered as described above, but now only peptides falling within the pI range were accepted. The filtering program was written in Visual Basic and allows various criteria, including isoelectric point, XCorr, and ∆Cn, to be used as minimum and maximum cutoff values. The program’s output could be saved in text files for use with other programs, and customizable reports could be generated in Microsoft Access. The program and the advantages offered over similar software will be described in detail in a future publication.

Computer Simulations. All programs for theoretical calculations described in the text were written in C using LabWindows/CVI version 6.0. The E. coli protein database from NCBI was used during the simulations and calculations.
All pI calculations were performed using the algorithm and values described by Bjellqvist et al.13
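As a rough illustration of how a peptide pI can be computed from sequence, the sketch below sums Henderson-Hasselbalch charge terms and finds the pH of zero net charge by bisection. It is written in Python rather than the C used for the simulations. The pKa values are generic illustrative numbers assumed for this example, not the values of Bjellqvist et al., so the results will differ from the pI values reported in this work. The function names are hypothetical.

```python
# Illustrative pKa values (assumed for this sketch; NOT the Bjellqvist table).
PKA_POS = {"nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEG = {"cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    pos_groups = ["nterm"] + [aa for aa in seq if aa in PKA_POS]
    neg_groups = ["cterm"] + [aa for aa in seq if aa in PKA_NEG]
    pos = sum(1.0 / (1.0 + 10 ** (ph - PKA_POS[g])) for g in pos_groups)
    neg = sum(1.0 / (1.0 + 10 ** (PKA_NEG[g] - ph)) for g in neg_groups)
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """pH at which the net charge crosses zero, found by bisection.

    The net charge decreases monotonically with pH, so the search
    interval can simply be halved until it is narrower than tol."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies above mid
        else:
            high = mid  # neutral or negative: pI lies at or below mid
    return (low + high) / 2


# An acidic tryptic peptide and a short, strongly basic peptide.
for peptide in ("DPDVVLLADK", "AKRK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection is used here only because it is simple and the charge curve is monotonic; any root finder would serve.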
Results & Discussion

Theoretical Considerations. The ability to separate peptides based on their isoelectric points and to accurately predict the range of pI values in a specific fraction provides a powerful tool for shotgun proteomics analyses. A plot of pI vs molecular weight for the whole E. coli proteome digested with trypsin in silico is shown in Figure 1 for differing numbers of missed cleavages. Figure 1a considers no missed cleavages (a perfect tryptic digest), and Figure 1b considers one missed cleavage. Figure 1b shows not only an increase in the absolute number of peptides but also a large increase in the medium to high molecular weight range of the middle to high pI (6.0-9.5) fractions. Although there are observable correlative trends between pI and mass of the peptides, the data are spread out significantly over the entire mass/pI plot. This feature is important because it divides the peptides over a broad pI range for any given mass and thus greatly reduces the theoretical number of peptides one needs to consider for protein identification if the approximate pI of the peptide is known. Therefore, if a database search algorithm were written that incorporated this information, a dramatic decrease in search time would be expected; search time is a current limitation noted with the cross-correlation based SEQUEST program. In addition, there should be a significant increase in the specificity of the identifications, because the measured pI value provides an orthogonal method complementary to tandem mass spectrometry for protein identification.

Figure 1. (a) pI plotted as a function of peptide mass for a theoretical digest of the E. coli proteome assuming no missed cleavages. (b) pI plotted as a function of peptide mass for a theoretical digest of the E. coli proteome with one missed cleavage. (c) Plot of the average pI for peptides of a given mass.

Table 1: Distribution of Peptides Across pI Range as a Function of the Number of Missed Cleavages

pI unit   missed      average number   pI range with highest   pI range with lowest
width     cleavages   of peptides      number of peptides      number of peptides
1.00      0           16957            38401 (4.0-5.0)         329 (12.0-13.0)
0.50      0           8478             23082 (4.0-4.5)         0 (12.5-13.0)
0.25      0           4239             18424 (6.0-6.25)        0 (several)
0.10      0           1695             13629 (8.7-8.8)         0 (several)
1.00      1           7894             20955 (4.0-5.0)         9 (12.0-13.0)
0.50      1           3947             15356 (4.0-4.5)         0 (12.5-13.0)
0.25      1           1973             10272 (5.75-6.0)        0 (several)
0.10      1           789              7573 (5.8-5.9)          0 (several)
1.00      3           26314            52608 (8.0-9.0)         1934 (7.0-8.0)
0.50      3           13157            33483 (8.5-9.0)         13 (12.5-13.0)
0.25      3           6578             25611 (6.0-6.25)        0 (several)
0.10      3           2631             17209 (8.7-8.8)         0 (several)

Examining the average number of peptides in a given pI range clearly demonstrates this principle, as shown in Table 1. As the size of the pI range is decreased, the total number of peptides observed in the pI range also decreases. This trend is
shown for several different missed cleavage rates to examine the effect of adding additional lysine and arginine residues per peptide as well as increasing the total number of theoretical peptides. What is also apparent is that increasing the number of positively charged groups per peptide also increases the pI of the fraction that contains the most peptides. For example, for no missed cleavages and considering 1 pI unit steps, the fraction that contains the most peptides is between 4.0 and 5.0, whereas for three missed cleavages, the 8.0-9.0 fraction contains the most peptides.

As mentioned previously, there are some specific trends between the isoelectric point and the molecular weight of the peptides. Figure 1c shows the average pI of polypeptides that have the same nominal mass. The lines generated for the various numbers of missed cleavages, 0-3, are a 10-point running average at any given mass rather than the real points, to aid in visualization of the correlations. The four lines show how the average isoelectric point is affected when missed cleavages are considered in the theoretical digest. The first major trend is that as the peptides increase in mass, the average pI decreases. This phenomenon arises because each peptide has primarily one positively charged amino acid, located at the C-terminus (for the line with no missed cleavages). Therefore, as the peptides become longer, the probability of containing more negatively charged amino acids (i.e., glutamic acid and aspartic acid) greatly increases. Although any given peptide may contain a histidine as well as an arginine or lysine before a proline, these peptides represent only a small fraction of the total number at any mass and will easily average out. This also holds true for missed cleavages, although for each missed cleavage one additional arginine or lysine will be present in approximately half the peptides.
These additional charged groups explain the second major trend in this comparison: the average isoelectric point for any given nominal mass increases as the number of missed cleavages increases. As more lysines or arginines are added to a peptide without increasing the mass, the pI of that peptide must increase, since these are very basic residues. This effect is more pronounced for peptides with lower pI and thus is seen more clearly at higher mass. Although these trends do exist, the use of pI still spreads the peptides out significantly over a wide pI range, which is demonstrated to be of use during the identification process.

Use of Carrier Ampholyte IEF for Peptide Fractionation. Implementation of carrier ampholyte isoelectric focusing as part of a multidimensional separation system is routine due to commercially available IEF slab gels. An example of the separation achieved and corresponding peptide identification
from this system is depicted in Figure 2a and 2b. For each fraction, only 20 µg of sample was analyzed by this method. Even with this relatively low amount of protein, 700 peptides from 303 proteins were identified using just XCorr and ∆Cn as cutoff criteria. Most of the identified peptides appear to be found in the high pI region (7.5-9.5) of the gel rather than in the lower pI range. This finding is fairly surprising because most of the larger peptides are expected to be found in the lower pI range, and the experimental setup favors identification of these larger peptides for several reasons. First, the ion trap was scanned only from m/z 400-1500, and since most peptides have a charge of at least +2 or +3 when using a nanospray ionization source, the minimal mass of an identified peptide would be 800 or 1200 Da, depending on the charge state, and could range as high as 4500 Da. As shown in Figures 1a and 1b, most of the larger peptides have a pI between 3 and 6. Second, there are fewer peptides at higher mass per nominal mass unit, which decreases the specificity required from a tandem mass spectrometry experiment (i.e., fragmentation) to uniquely identify a peptide. The most probable reason that such a high proportion of peptides were found in the high pI region is precipitation of the peptides upon entering the gel matrix. The observed distribution of peptides in this region, with its large standard deviation of ±2 pI units, suggests that poor solubility is the likely cause. Although the large pI standard deviation could be caused by poor predictability of the peptide isoelectric points, this possibility is unlikely because the standard deviation of the pI values observed in the gel decreases significantly, to less than 0.2 pI units, for the three fractions at the acidic end of the gel.
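The mass limits quoted above follow from the scan range and the charge state. The short sketch below reproduces them, assuming protonated ions of the form [M + zH]z+ and a proton mass of 1.00728 Da; the function name is hypothetical. The figures in the text (800, 1200, and 4500 Da) are the corresponding nominal values with the proton mass neglected.

```python
PROTON = 1.00728  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral peptide mass for an [M + zH]z+ ion observed at a given m/z."""
    return charge * (mz - PROTON)


SCAN_RANGE = (400.0, 1500.0)  # m/z limits of the full scan

for z in (2, 3):
    low, high = (neutral_mass(mz, z) for mz in SCAN_RANGE)
    print(f"+{z}: {low:.0f} to {high:.0f} Da")
# +2: 798 to 2998 Da
# +3: 1197 to 4497 Da
```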
One of the reasons that isoelectric focusing was chosen for the first dimension of separation is the high resolving power of this technique. Proteins resolved in IEF gels typically show equal if not greater resolution than those observed in SDS-PAGE and have peak capacities greater than 200, as determined by Coomassie staining (data not shown). There is little reason to suspect that the peak capacity would decrease when peptides are used. Most peptides were found in only 1 or 2 gel slices, with those observed in two slices likely due to diffusion after the focusing voltage was turned off or to uneven cutting during band excision across 4 lanes of the gel. In addition, the few peptides that were seen in more than 2 fractions could be abundant peptides that stick to the capillary trap even after washing between fractions. Another possible explanation is that the peptides could have precipitated out of solution during migration from the well to the appropriate lower pH in the gel.
When calculating total peak capacity for this experiment, the maximum capacity for the isoelectric focusing is limited to the number of gel slices taken, which is 27 in this experiment. Combining this with a peak capacity of ∼100 for the nanocolumn (peak widths of ∼42 s for intense peaks over a 70 min range) gives a peak capacity of 2700. For the ion trap, the peak capacity is equal to the total m/z range used divided by the nominal mass resolving power. For our experiments, the scan range is 1100 m/z units wide (scanning m/z 400 to m/z 1500), with 1 Da nominal mass resolving power, resulting in a peak capacity of 1100. Therefore, the total peak capacity for this method is 2.97 × 10⁶, which is more than enough to cover most of the E. coli proteome. Although this is the theoretical peak capacity, the working peak capacity is significantly lower due to the use of dynamic exclusion. The dynamic exclusion mass window was set to ±3 Da, which limits the peak capacity of the mass spectrometer to 183 (1100 m/z divided by a 6 Da window). Dynamic exclusion also negates the peak width of the peptides on the column and replaces it with the time that a particular m/z is on the exclusion list (5 min in this case), thus reducing the overall peak capacity for the column to 14. Combining these numbers (27 × 14 × 183) gives a peak capacity of 69 174, which is still enough to cover much of the E. coli proteome.

pI as an Identification Criterion. To use pI to filter the data, a program was written that allowed SEQUEST result files to be filtered by XCorr, ∆Cn, and/or pI. Using the reverse database method to estimate the false positive rate, a minimal level for XCorr at a ∆Cn value of 0.08 was determined for each charge state to limit misidentifications to approximately 1%. Using this method, a total of 700 peptides were identified from 303 different proteins. The identified peptides from fraction 1 are listed in Table 2.
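The peak capacity figures quoted earlier in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are chosen for this example.

```python
n_fractions = 27    # gel slices from the IEF dimension
gradient_min = 70   # usable elution range of the nanocolumn, in minutes
peak_width_s = 42   # approximate width of an intense peak, in seconds
mz_low, mz_high = 400, 1500

# Theoretical capacity: each dimension taken at its full resolution.
column_capacity = round(gradient_min * 60 / peak_width_s)   # 100
ms_capacity = mz_high - mz_low                              # 1100 at 1 Da resolution
theoretical = n_fractions * column_capacity * ms_capacity

# Working capacity: dynamic exclusion coarsens the MS and LC dimensions.
exclusion_window_da = 2 * 3   # +/-3 Da around the precursor
exclusion_time_min = 5
ms_working = (mz_high - mz_low) // exclusion_window_da      # 183
column_working = gradient_min // exclusion_time_min         # 14
working = n_fractions * column_working * ms_working

print(theoretical)  # 2970000
print(working)      # 69174
```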
To use pI as a filter for protein identification, an average pI and a standard deviation were determined for each fraction from the above peptide set (see Figure 3a). The average and standard deviation were used to define a working pI range, set at two times the standard deviation. To alleviate the problem of cross contamination and precipitation of peptides onto the gel, so that an accurate pI range could be determined, several different data filters were employed. The first removed any peptide that was present in four or more fractions and is labeled in Figure 3a as “Filtered Peptides”. The second used only the highest half of the peptides’ pI values from each fraction, on the assumption that the lower pI peptides could come from cross contamination from previous fractions. To see if peptides could be identified based on the estimated pI of each fraction, a similar test was done using a pI range derived from the estimate of the pI based on the markers with an error of 0.5 pI units. In all cases, the XCorr value for each charge state was lowered until a false positive rate of 1% was reached. The number of peptides identified in each fraction is shown in Figure 3b. These peptides were then accepted as real hits, since there is no reason to assume that the use of pI would eliminate only the reversed database false positives and not those from the forward orientation of the proteins. These new filter criteria allowed the identification of 900 peptides and 404 proteins. This was primarily because the XCorr cutoffs could be lowered from 2.4, 2.5, and 2.8 to 1.1, 1.6, and 1.9 for the +1, +2, and +3 charge states, respectively, while the false positive rate remained at 1%. By combining the two data sets obtained with the different filtering criteria, a total of 1022 peptides and 417 proteins were identified from only 20 µg of material, as shown in Figure 3c.

Table 2: Peptide Sequences and Their Corresponding Isoelectric Points Identified from the First IEF Fraction

peptide                            isoelectric point
EGVITVEDGTGLQDELDVVEGMQFDR         3.578
DYLDGVDVAEGELVVLENVR               3.713
AEAGDVANAILDGTDAVMLSGESAK          3.769
GGDGNYGYNAATEEYGNMIDMGILDPTK       3.769
VALQDAGLSVSDIDDVILVGGQTR           3.769
ELASEVGSLLTYEATADLETEK             3.831
TTDVTGTIELPEGVEMVMPGDNIK           3.834
VEDATLVLSVGDEVEAK                  3.834
VGDTVIEFDLPLLEEK                   3.834
LYTTNADGELITIDTADNK                3.839
AAGAELVGMEDLADQIK                  3.916
DTTTIIDGVGEEAAIQGR                 3.916
EAGVQEADFLANVDK                    3.916
EM*LIADGIDPNELLNSLAAVK             3.916
SLYEADLVDEAK                       3.916
DPDVVLLADK                         3.930
DQLLENLQEGMEVK                     4.002
EGVQEDILEILLNLK                    4.002
GMNTAVGDEGGYAPNLGSNAEALAVIAEAVK    4.002
LQTLGLTQGTVVTISAEGEDEQK            4.002
DRVEDATLVLSVGDEVEAK                4.017
ANDAAGDGTTTATVLAQAIITEGLK          4.027
DEFADGASYLQGK                      4.027
DIADAVTAAGVEVAK                    4.027
GGDTVTLNETDLTQIPK                  4.027
IMIDLDGTENK                        4.027
AGYAEDEVVAVSK                      4.137
AYEDAETVTGVINGK                    4.137
GATVELADGVEGYLR                    4.137
ILSIDTEGLTAEQIR                    4.137
SLDDFLIKQ                          4.207
DAGFQAFADK                         4.208
AEAEQTLAALTEK                      4.253
VGEEVEIVGIK                        4.253
AAEVLVVDTR                         4.371

* Oxidized methionine.

Here we would have expected the change
in the number of identifications to be more significant than 28%, but this lower than expected increase can easily be explained. Because all the peptides from the E. coli database were searched by SEQUEST, many ∆Cn values were artificially low, because peptides with pI values outside the pI range of the fraction were often the second best match. If all peptides outside the pI range were removed from the results list, then a peptide with the appropriate pI could be unmasked even though its XCorr and/or ∆Cn value would be much lower than normally accepted (or considered valid). However, if the false positive rate was also lowered concurrently, then these new identifications could be validated to some degree. Although this method would work, it would be better to write an original search algorithm that filters by pI during the actual database search. To this end, we are considering implementing a program for this type of searching, in which an initial unfiltered peptide set is searched to obtain the pI range of each fraction and a second search is then run using only peptides from the appropriate range.

Minimizing Cross Contamination Between IEF Fractions. As can be seen from Table 1, if the pI range of an IEF fraction can be minimized, then a decrease in the XCorr and ∆Cn cutoff values should be possible, since this also minimizes the false positive rate. Three changes to the original method were made concurrently to try to limit peptides to a unique IEF fraction. The first was freezing the IEF gel in liquid nitrogen immediately after focusing. The rationale behind freezing the gel was that peptides have the potential to diffuse easily through the gel matrix. Second, the acrylamide of the IEF gel was cross-linked at 4% T rather than 20% T as found in Tris-Tricine gels (gels normally used to resolve peptides), making it very fragile and prone to mechanical damage. Treatment with liquid nitrogen made the gel rigid and eliminated this difficulty. However, one problem with freezing the gel was that the plastic backing has a tendency to shatter if cooled too quickly. Another potential solution is to construct a device that cuts all the bands from the gel at once, thus minimizing the time the peptides have to diffuse. Efforts are currently underway to build such a device.

Figure 2. (a) Number of peptides identified in each fraction plotted as a function of average pI. Twenty micrograms was loaded on column for LC-MS analysis. (b) The total ion chromatogram of fraction 1. (c) The mass spectrum at ∼66.5 min is shown, from which several precursors were selected for fragmentation. (d) The tandem mass spectrum of m/z 1426.6 is shown. The isoelectric point of the identified peptide, EGVITVEDGTGLQDELDVVEGMQFDR, calculated as described in the text, is 3.58 and is one of the lowest pI values in this dataset.

The second place where peptides were likely able to cross contaminate the samples was the desalting step just before loading into the autosampler. Initially, this cleanup was performed using C18 Sep-Paks. Because of the expense of this item, a single Sep-Pak was employed to desalt eight or more samples consecutively. Although the Sep-Pak was thoroughly washed between samples, the majority of the cross contamination is thought to have occurred during this cleanup step, because when a new Sep-Pak was used with IEF fraction 8, the standard deviation of the average pI value was significantly lowered. As an alternative to C18 Sep-Paks, we designed an inexpensive spin column to desalt the IEF peptide fractions. These spin columns consisted of a spin filter that had a 50 µL
volume of a large particle size (∼30 µm) C18 reverse phase column packing material placed on the filter surface. Because the spin filters can be rinsed and reused with the addition of new media, these spin columns are very inexpensive when bulk C18 media is used. These spin columns can also be processed in parallel, without the use of a vacuum manifold. This resulted in a significant reduction of the standard deviation of the average pI value for each fraction in subsequent experiments. Finally, between each injection of the IEF fractions, the nanocolumn was extensively washed by quickly running the gradient elution twice and then equilibrating before the next IEF fraction was injected. An alternative to this extra wash step would be to alternate fractions from each end of the IEF gel with those from the middle. This would make it easy to subtract out peptides with drastically different pI values before filtering the data. When these three methods were combined, an improvement was seen in the relative standard deviation of the IEF fractions, as shown in Figure 4a. To better illustrate how the peptides tend to cluster in pI groups, a plot of all the pI values of the peptides in each fraction is also shown (see Figure 4b). For most fractions, the standard deviations barely overlap with more than one adjacent fraction, which significantly improves the use of pI as a filtering criterion. The exceptions
Figure 4. (a) Graph of the average pI and standard deviation from IEF fractions after minimization of cross contamination effects. The larger standard deviation (black box) comes from the use of all peptides in the calculation. The smaller standard deviation (red box) represents removal of statistical outliers from the data. Statistical outliers are thought to originate from cross contamination of the peptide trap and precipitation of the peptides as they entered the gel matrix. This removal is partially justified because removal of less than ∼5% of the peptides in a fraction reduces the standard deviation by a factor of 2. (b) The original data plotted as the pI of each peptide versus the number of the fraction in which it was found. The expected pI based on the isoelectric markers has been fit to a straight line (red line), and the average pI of each fraction calculated from the data is shown as the second line (blue line).
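The effect of outlier removal described in the caption can be illustrated with a short calculation. This is a sketch under stated assumptions: the outlier rule used here (discard peptides more than k standard deviations from the fraction mean) is assumed for illustration, since the caption does not state the criterion, and the pI values below are invented example data rather than measurements.

```python
from statistics import mean, stdev


def fraction_stats(pi_values, k=2.0):
    """Return (mean, sd, trimmed_mean, trimmed_sd, fraction_removed).

    ASSUMED outlier rule: a peptide is an outlier if its pI lies more than
    k sample standard deviations from the mean of its IEF fraction.
    """
    avg, sd = mean(pi_values), stdev(pi_values)
    kept = [p for p in pi_values if abs(p - avg) <= k * sd]
    removed = 1.0 - len(kept) / len(pi_values)
    return avg, sd, mean(kept), stdev(kept), removed


# Invented example: 20 peptides clustered near pI 4.0 plus one contaminant.
example_fraction = [4.00, 4.05, 3.95, 4.10, 3.90] * 4 + [9.0]
avg, sd, trimmed_avg, trimmed_sd, removed = fraction_stats(example_fraction)
```

In this invented fraction a single stray peptide (under 5% of the total) dominates the standard deviation, which is the behaviour the black and red boxes in Figure 4a are meant to contrast.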
Figure 3. (a) Graph of the average peptide pI versus IEF fraction number. Error bars indicate the standard deviation for each fraction. Peptide filters were employed to limit cross contamination and peptide precipitation artifacts. (b) The number of peptides identified from each fraction is shown using three different data filters. The different filtering criteria are described in the text, with XCorr meaning XCorr cutoff only, Experimental pI + XCorr meaning the average pI of a fraction ± two standard deviations and the XCorr, and theoretical pI + XCorr meaning the estimated pI based on the stained markers ± 0.5 pI units and the XCorr. (c) The total number of unique peptides found from each data filter.
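The three data filters named in the caption can be written compactly. The sketch below is illustrative only: the `Hit` record, the function names, the single XCorr cutoff, and all numerical values are assumptions introduced for the example and are not taken from the experimental data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    peptide: str
    fraction: int   # IEF fraction in which the spectrum was acquired
    xcorr: float    # SEQUEST cross-correlation score
    pi: float       # predicted pI of the matched peptide


def xcorr_filter(hits, cutoff):
    """'XCorr': keep hits at or above a single XCorr cutoff (simplification)."""
    return [h for h in hits if h.xcorr >= cutoff]


def experimental_pi_filter(hits, cutoff, frac_mean, frac_sd):
    """'Experimental pI + XCorr': fraction average pI +/- two SDs, plus XCorr."""
    return [h for h in xcorr_filter(hits, cutoff)
            if abs(h.pi - frac_mean[h.fraction]) <= 2 * frac_sd[h.fraction]]


def theoretical_pi_filter(hits, cutoff, marker_pi, window=0.5):
    """'Theoretical pI + XCorr': marker-estimated pI +/- 0.5 pI units, plus XCorr."""
    return [h for h in xcorr_filter(hits, cutoff)
            if abs(h.pi - marker_pi[h.fraction]) <= window]


# Invented example for a single fraction.
hits = [
    Hit("PEPTIDEA", 1, 2.5, 3.8),   # good score, pI consistent with fraction
    Hit("PEPTIDEB", 1, 2.5, 8.9),   # good score, pI far from fraction
    Hit("PEPTIDEC", 1, 1.0, 3.9),   # weak score, pI consistent with fraction
]
frac_mean, frac_sd, marker_pi = {1: 3.9}, {1: 0.2}, {1: 3.7}

strict = experimental_pi_filter(hits, 2.0, frac_mean, frac_sd)    # PEPTIDEA only
relaxed = experimental_pi_filter(hits, 0.9, frac_mean, frac_sd)   # adds PEPTIDEC
```

The last two lines show the intended trade: once the pI window removes implausible matches, the score cutoff can be relaxed and a weaker but pI-consistent match is retained.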
were around a pI of 6.0, where the lack of peptides in the pI range of ∼7.0-8.0 likely caused some distortion due to the nature of the carrier ampholyte system, and at high pH around the well, where we believe that peptides simply precipitated out of solution. Although there is some improvement observed, a significant amount of work remains to be done to further minimize the cross contamination and reduce peptide precipitation in the gel.
Conclusions
The use of isoelectric focusing to fractionate peptides as an aid in the identification of proteins has been demonstrated. With improvement of peptide solubility in the gel matrix, IEF-RPLC-MS could become the method of choice for multidimensional proteome separations and analysis, due to the high resolving power of this technique. Furthermore, by using immobilized pH gradients,17 which have a higher load capacity, better resolution, and greater reproducibility, very thin band fractions can be obtained that could significantly increase the dynamic range of the analysis, thus increasing the probability of identifying low abundance peptides. The additional benefit of providing criteria to limit false positives, and thus lower the XCorr and ∆Cn cutoff values, further increases the appeal of this technique for use in proteomics. If a search algorithm were designed to take advantage of the pI criteria during the database matching, then a further increase in identification specificity is expected, which should increase the number of identifications by more than the 28% observed here. Even without the development of such an algorithm, this combination of bioanalytical methods provides a significant advantage over similar separation schemes such as cation or anion exchange reverse phase LC-MS because of the increased separating power and the database search constraint of isoelectric point. There are several questions that still need to be addressed to increase the utility of this technique, the most important being how well the pI of peptides can be predicted. This will significantly affect the useful size of the IEF fraction that can be obtained from the gels. If pI is predicted accurately, then very small bands can be sliced from the gel to increase resolution between peptides and further restrict the pI during the database search. However, if pI is poorly predicted, then large bands, such as those used in this paper, will be optimal for use in filtering protein identification data. On the basis of the observations from these experiments, it seems likely that pI prediction algorithms are fairly accurate for peptides.
This leads to the question of whether combining this approach with accurate mass measurements and limiting the number of peptides considered would eliminate the need for MS/MS on instruments with less resolving power than an FTICR. We are currently investigating this as an alternative high-throughput proteomics strategy for proteome profiling.
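The pI-constrained search discussed above can be sketched for a single spectrum as follows. This is an illustration under stated assumptions, not an implementation of SEQUEST or of the proposed algorithm: ∆Cn is taken here as the normalized difference between the best and second-best XCorr, the candidate tuples and scores are invented, and the function names are hypothetical.

```python
def delta_cn(xcorrs):
    """Normalized XCorr gap between the best and second-best candidates."""
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        return None  # undefined without a runner-up or a positive top score
    return (ranked[0] - ranked[1]) / ranked[0]


def rerank_with_pi(candidates, pi_low, pi_high):
    """Drop candidates outside the fraction's pI range, then rescore the gap.

    candidates: list of (peptide, xcorr, predicted_pi) for one spectrum.
    Returns (best_candidate_or_None, delta_cn_or_None).
    """
    in_range = [c for c in candidates if pi_low <= c[2] <= pi_high]
    in_range.sort(key=lambda c: c[1], reverse=True)
    best = in_range[0] if in_range else None
    return best, delta_cn([c[1] for c in in_range])


# Invented candidates for a spectrum acquired from an acidic IEF fraction.
candidates = [
    ("PEPTIDEA", 2.0, 4.1),   # top hit, pI consistent with the fraction
    ("PEPTIDEB", 1.9, 9.3),   # close runner-up, but pI far outside the fraction
    ("PEPTIDEC", 1.2, 4.3),   # distant third, pI consistent with the fraction
]

unfiltered_gap = delta_cn([c[1] for c in candidates])           # about 0.05
best, filtered_gap = rerank_with_pi(candidates, 3.5, 4.5)       # about 0.40
```

In this invented case the runner-up that depressed ∆Cn has an implausible pI for the fraction; removing it before the gap is computed is what a search that filters by pI during database matching would achieve directly.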
Acknowledgment. The authors wish to acknowledge the Internal Research and Development Program from the Research Triangle Institute for funding of this research. In addition, we wish to recognize James E.H. Powell from Amersham Biosciences for providing the monodisperse polymeric small bead RPC medium column packing (Source 5RPC) for the reverse phase separation work.
References
(1) Wolters, D. A.; Washburn, M. P.; Yates, J. R., 3rd Anal. Chem. 2001, 73, 5683-5690.
(2) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., 3rd Nat. Biotechnol. 1999, 17, 676-682.
(3) Washburn, M. P.; Wolters, D.; Yates, J. R., 3rd Nat. Biotechnol. 2001, 19, 242-247.
(4) Tong, W.; Link, A.; Eng, J. K.; Yates, J. R., 3rd Anal. Chem. 1999, 71, 2270-2278.
(5) Figeys, D.; Ducret, A.; Yates, J. R., 3rd; Aebersold, R. Nat. Biotechnol. 1996, 14, 1579-1583.
(6) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43-50.
(7) VerBerkmoes, N. C.; Bundy, J. L.; Hauser, L.; Asano, K. G.; Razumovskaya, J.; Larimer, F.; Hettich, R. L.; Stephenson, J. L., Jr. J. Proteome Res. 2002, 1, 239-252.
(8) Palmblad, M.; Ramstrom, M.; Markides, K. E.; Hakansson, P.; Bergquist, J. Anal. Chem. 2002, 74, 5826-5830.
(9) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039-1048.
(10) Shen, Y.; Zhao, R.; Belov, M. E.; Conrads, T. P.; Anderson, G. A.; Tang, K.; Pasa-Tolic, L.; Veenstra, T. D.; Lipton, M. S.; Udseth, H. R.; Smith, R. D. Anal. Chem. 2001, 73, 1766-1775.
(11) Shimura, K.; Kamiya, K.; Matsumoto, H.; Kasai, K. Anal. Chem. 2002, 74, 1046-1053.
(12) Shimura, K.; Zhi, W.; Matsumoto, H.; Kasai, K. Anal. Chem. 2000, 72, 4747-4757.
(13) Bjellqvist, B.; Hughes, G. J.; Pasquali, C.; Paquet, N.; Ravier, F.; Sanchez, J. C.; Frutiger, S.; Hochstrasser, D. Electrophoresis 1993, 14, 1023-1031.
(14) Chen, J.; Lee, C. S.; Shen, Y.; Smith, R. D.; Baehrecke, E. H. Electrophoresis 2002, 23, 3143-3148.
(15) Chen, J.; Bagley, B. M.; DeVoe, D. L.; Lee, C. S. Anal. Chem. 2003, In Press.
(16) Martin, S. E.; Shabanowitz, J.; Hunt, D. F.; Marto, J. A. Anal. Chem. 2000, 72, 4266-4274.
(17) Bjellqvist, B.; Ek, K.; Righetti, P. G.; Gianazza, E.; Gorg, A.; Westermeier, R.; Postel, W. J. Biochem. Biophys. Methods 1982, 6, 317-339.
PR0340431
Journal of Proteome Research • Vol. 3, No. 1, 2004 119