pICarver: A Software Tool and Strategy for Peptides Isoelectric

pICarver: A Software Tool and Strategy for Peptides Isoelectric Focusing

Ali R. Vaezzadeh,† Céline Hernandez,*,‡ Oscar Vadas,§ Jacques J. M. Deshusses,† Pierre Lescuyer,†,| Frédérique Lisacek,‡ and Denis F. Hochstrasser†,|

Biomedical Proteomics Research Group, Department of Bioinformatics and Structural Biology, Geneva University, 1 Michel Servet, 1211 Geneva, Switzerland, Swiss Institute of Bioinformatics, Geneva University, 1 Michel Servet, 1211 Geneva, Switzerland, Department of Structural Biology and Bioinformatics, Geneva University, 1 Michel Servet, 1211 Geneva, Switzerland, and Clinical Proteomics Laboratory, Department of Genetics and Laboratory Medicine, Geneva University Hospitals, Geneva, Switzerland

Received April 9, 2008

The use of isoelectric focusing as the first dimension of separation is a new trend in shotgun proteomics. In all applications using this approach, peptides are separated into fractions of equal size, whereas the theoretical distribution of peptides according to pI is heterogeneous. We present the development of a new tool and strategy that generates a fractionation scheme resulting in an almost even distribution of peptides per fraction, based on theoretical and experimental data. The “pICarver” software tool also increases the throughput of the approach by reducing the number of fractions and merging the peptide-poor regions. A set of fluorescent peptide isoelectric point markers was also developed in combination with the pICarver program to calibrate the pH gradient of commercially available strips. These markers enhanced the precision of pICarver predictions. The overall strategy allowed the detection of false positive identifications and post-translational modifications. The software tool is freely available at www.expasy.org/tools/pICarver.

Keywords: Isoelectric Focusing • Shotgun Proteomics • Tandem Mass Spectrometry • Isoelectric Point • Fluorescent pI Markers

* Corresponding author: Céline Hernandez, Swiss Institute of Bioinformatics, Geneva University, Medical Center (CMU), 1 rue Michel Servet, CH-1211 Geneva 4, Switzerland. Tel, +41 22 379 5838; fax, +41 22 379 5858; e-mail, [email protected].
† Department of Bioinformatics and Structural Biology, Geneva University.
‡ Swiss Institute of Bioinformatics, Geneva University.
§ Department of Structural Biology and Bioinformatics, Geneva University.
| Department of Genetics and Laboratory Medicine, Geneva University Hospitals.

Introduction

Landmark developments in mass spectrometry (MS)-based proteomics have promoted this field to play a major role in life science research.1,2 In combination with powerful bioinformatics tools, proteomics now allows the identification, characterization and even quantitation of thousands of proteins directly from a variety of complex biological samples in different states. Current proteomics techniques employ highly developed separation technologies and sophisticated mass spectrometers. One of the most widely used proteomics approaches is the so-called “Bottom-Up” or “Shotgun” proteomics analysis, which involves proteolytic digestion of the proteins immediately after their isolation from a cell or a tissue.3 Given the complexity of the peptide mixture in shotgun approaches, a single reverse-phase LC (RPLC) does not provide enough peak capacity to resolve hundreds of thousands of peptides. Therefore sample fractionation prior to tandem mass spectrometry (MS/MS) analysis is beneficial. The concept of using isoelectric focusing (IEF), where molecules are separated based

4336 Journal of Proteome Research 2008, 7, 4336–4345 Published on Web 09/11/2008

10.1021/pr8002672 CCC: $40.75 © 2008 American Chemical Society

on their isoelectric points (pI), for the separation of peptides is an increasing trend. Among the advantages of combining IEF and shotgun proteomics, one can mention the high resolving power and reproducibility of IEF, as well as the high sensitivity and broad analytical range of the shotgun technology. The use of IEF also provides the “added value” of pI as a validation and filtering criterion to identify false positives and post-translational modifications (PTM).4-6 Different IEF techniques have been employed to replace the gold standard Strong Cation Exchange (SCX) as the first dimension of separation for shotgun proteomics. They include immobilized pH gradient (IPG)-IEF,7-9 Capillary IEF (CIEF),10,11 Solution IEF,12,13 Free Flow Electrophoresis (FFE)14-17 and Off-Gel electrophoresis.18,19

In the first reported use of IPG as the first dimension of separation for shotgun proteomics, Cargile and colleagues separated peptides from the cytosolic fraction of Escherichia coli on an 18 cm 3-10 IPG strip.20 The IPG strip was cut into equidistant fractions. Peptides were extracted and submitted to LC-MS/MS analysis. More than 6000 peptides (corresponding to more than 1200 proteins) were identified. While they were able to identify a considerable number of peptides in some fractions (more than 350), only a limited number of peptides (fewer than 10) could be identified in several other fractions. Horth and co-workers used an Off-Gel electrophoresis device to separate E. coli peptides and reported similar heterogeneity in the number of identified peptides per fraction.18 They identified 3454 peptides from 670 proteins. However, up to a 210-fold difference was observed in the number of peptides identified between two different fractions. Four fractions in their experiment contained fewer than 5 identified peptides. In a study by Malmström et al. on the analysis of Drosophila melanogaster by FFE, while some fractions contained up to 1400 unique peptides, 8 fractions contained fewer than 50 peptides.15 There are other examples of such heterogeneous numbers of peptides identified in each fraction when IEF is used in a similar way as the first dimension of separation for shotgun-produced peptides, with various samples and technologies.5,7,9

The theoretical pI distributions of the in silico peptidome of most species show that the tryptic peptides are not spread evenly along the pH gradient. Stephenson’s group demonstrated that around 80% of the E. coli proteome is represented by at least one peptide within a pI range of 3.5-4.5, whereas the peptide content in areas around pI 7 is very low. The theoretical pI distribution map of Staphylococcus aureus tryptic peptides shows that 45% (corresponding to 16 101 peptides) are in the pI zone 3.5-4.5 (Figure 5). These observations point to the need to take such information into account before launching a “Shotgun IEF” experiment. The theoretical distribution of peptides or data obtained previously could be used to plan the fractionation scheme, in terms of fraction sizes and peptide content, and to obtain an almost even distribution of peptides per fraction. The advantages of such an approach would be to (i) increase the throughput by avoiding analysis of empty fractions, (ii) save MS instrument time, (iii) adapt the LC gradient to the expected number of peptides, (iv) optimize the use of pI as a validation tool, and (v) avoid the suppression effect due to overloading.
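As a minimal sketch of how such a theoretical distribution could be inspected before an experiment, the following Python snippet bins a list of predicted peptide pI values and reports the share falling in a given pI window. The function names, the 0.5 pH bin width and the toy pI values are illustrative assumptions; they are not taken from pICarver.

```python
def share_in_pi_window(pi_values, low, high):
    """Return the fraction of peptides whose pI lies in [low, high)."""
    if not pi_values:
        raise ValueError("pi_values must not be empty")
    inside = sum(1 for pi in pi_values if low <= pi < high)
    return inside / len(pi_values)


def pi_histogram(pi_values, ph_min=3.0, ph_max=10.0, bin_width=0.5):
    """Count peptides per pI bin between ph_min and ph_max.

    Values outside the range are ignored; a value equal to ph_max is
    counted in the last bin.
    """
    n_bins = int(round((ph_max - ph_min) / bin_width))
    counts = [0] * n_bins
    for pi in pi_values:
        if ph_min <= pi <= ph_max:
            index = min(int((pi - ph_min) / bin_width), n_bins - 1)
            counts[index] += 1
    return counts


# Toy input (hypothetical pI values, for illustration only).
toy_pis = [3.6, 4.0, 4.4, 7.0]
acidic_share = share_in_pi_window(toy_pis, 3.5, 4.5)
histogram = pi_histogram(toy_pis)
```

With a real in silico digest as input, a strongly uneven histogram of this kind is the situation in which equal-size fractions are expected to give very different peptide counts.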
Here, we present the “pICarver” software program, which is designed to provide an optimal fractionation scheme for IEF separation, in order to obtain similar numbers of peptides per fraction. All experiments were performed using the IPG-IEF approach. The pICarver program takes as input a list containing the pI information of theoretical or experimentally obtained peptides. The user can select different pH gradients, separation lengths and fractionation schemes. The program then generates a scheme for fractionating the corresponding IPG strip or other liquid or gel media. Although the gradient of commercially available IPG strips is usually highly reproducible, some irregularities are often observed in the exact position of the pH gradient of the gel on the plastic backing. The purity and the concentration of the sample can also affect the quality of IEF. To account for these irregularities and to provide a quality control measure, we developed a set of fluorescent peptidic pI markers. The markers are used to reconcile the theoretical gradient with the actual gradient of the IPG strip according to the position of the markers on the strip. The developed software and pI markers are then used for the analysis of the S. aureus proteome.

Materials and Methods

Reagents and Chemicals. All chemicals purchased were of the highest purity grade, unless otherwise stated. LiChrosolv water (Merck, Darmstadt, Germany) was used for the preparation of all buffers and solvents. Acetonitrile (AcN) was purchased from Biosolve (Westford, MA). Trifluoroacetic acid (TFA), α-cyano-4-hydroxycinnamic acid, 1,4-dithioerythritol (DTE), ammonium bicarbonate, iodoacetamide, glycine, porcine trypsin, Tris, Bovine Serum Albumin (BSA), Rabbit Phosphorylase b (Phos b), Chicken Ovalbumin and Bovine β-Casein were from Sigma-Aldrich (St. Louis, MO). IPG strips and ampholines were purchased from GE Healthcare (Piscataway, NJ). SDS-PAGE precast gels and molecular mass markers were purchased from Bio-Rad (Hercules, CA). Amino acids were from Novabiochem (Switzerland). Peptide synthesis grade DMF, DIEA and TFA were purchased from Biosolve. HBTU was from Iris Biotech.

S. aureus Strain Growth Conditions. S. aureus strain N315 was used as a model system for the development of the tool and strategy. The bacteria were grown with agitation at 37 °C in 200 mL of Mueller Hinton Broth (MHB) in a 1000-mL flask. At stationary phase (OD540nm = 6, corresponding to 2-3 × 10^9 cells/mL), cells were chilled on ice and harvested by centrifugation at 8000g for 5 min at 4 °C. For total protein extracts, cells were lysed with 20 mg/mL of the hydrolytic enzyme lysostaphin (Ambicin, Applied Microbiology, Tarrytown, NY) for 15 min at 37 °C, in Tris-EDTA (TE) buffer. Insoluble material was removed by centrifugation at 5000g for 10 min. For preparation of crude membrane extracts, 20 mL culture aliquots were washed in 1.1 M saccharose-containing buffer,21 then suspended in 2 mL aliquots of the same buffer containing 50 µg/mL lysostaphin for 10 min at 37 °C. Protoplasts were recovered after centrifugation (30 min at 8000g) and hypo-osmotic shock was applied in the presence of 10 µg/mL DNase I (Fluka, Buchs, Switzerland) to decrease the viscosity of the medium. Crude membrane pellets were obtained after ultracentrifugation at 50 000g for 50 min in a Beckman Optima TLX (Beckman Coulter Intl SA, Nyon, Switzerland).

Synthesis of Fluorescent Peptidic pI Markers. Peptides were prepared manually by solid-phase peptide synthesis using the standard Fmoc/tBu strategy.22 An amount of 0.25 mmol of Fmoc-protected amino acids activated with 0.24 mmol HBTU was coupled with 0.05 mmol Rink amide 4-methylbenzhydrylamine resin (Fluka) in the presence of 0.3 mmol DIEA.
Couplings were allowed to react for 60 min with occasional stirring. Fmoc-amino acids were protected by the following groups: Arg(Pbf), Asp(OtBu), Cys(Trt), Glu(OtBu), His(Trt), Lys(Boc). Removal of the Fmoc protecting group was done with a 20% piperidine solution in DMF for 5 and 15 min, followed by a DMF wash. Acylation of the peptide was done with 0.5 mmol acetic anhydride in the presence of 0.6 mmol DIEA. Peptides were cleaved from the resin with 4 mL of TFA solution containing 3% water and 3% triisopropylsilane (Aldrich, Switzerland) as scavengers. After 4 h, the reaction mixture was filtered and the resin was rinsed twice with 2 mL of TFA. The solution was concentrated by TFA evaporation and the peptides were precipitated and washed with cold diethyl ether before lyophilization. Peptide sequences after deprotection were DDEHACG-NH2, Ac-DHHACG-NH2 and RKHGCA-NH2, which will be referred to as peptide markers 1, 2, and 3, respectively. The purity of each peptide was verified by analytical HPLC and MALDI-TOF MS. All markers were soluble in water and had low GRAVY values of -1.4, -1.0, and -1.28 for markers 1, 2, and 3, respectively. The peptides were dissolved in water at concentrations of 2 or 3 mg/mL. The pH was controlled by addition of 3 µL of 1 M triethanolamine-bicarbonate. A 10 mg/mL solution of iodoacetamido fluorescein in DMF was added in 20% molar excess. The solution was left at room temperature for 10 min and subjected to microwave heating. The tube was placed in a beaker with 500 mL of water and subjected to irradiation in a kitchen microwave (FUNAI, Hamburg, Germany) with a maximum output power of 850 W and a frequency of 2.45 GHz. Irradiation lasted 6 min at the microwave’s reduced


Figure 1. pICarver software tool with its different functions and windows. The program can be used to determine a fractionation scheme for Shotgun IEF experiments.

power of 175 W. The temperature of the bath rose to 57-59 °C. After cooling, the peptides were subjected to purification. Purification of fluorescent peptides was done by reverse-phase HPLC on Waters equipment using a Macherey-Nagel C8 column (4 × 250 mm, 300 Å, 5 µm particle size) at 0.6 mL/min. Solvent A was 0.1% TFA in HPLC grade water. Solvent B was 90% acetonitrile with 0.1% TFA. Elution was done with a 60 min linear gradient of 20-80% solvent B. The purified peptides were subjected to an extra level of purification using preparative thin layer chromatography (TLC). TLC was performed on Silica gel 60 devoid of fluorescent indicator. The solution of fluorescent mixture corresponding to 150 µg of peptide was distributed on a 9 cm line. A first migration was performed with a 2:1 mixture of CHCl3/methanol in order to remove nonreacted fluorescein derivative. The fluorescent peptides remained at the origin. A second migration was performed with the following solvent mixtures: AcN/Me2CO/AcOH/H2O, 30:10:2:20 for marker 1; Me2CO/H2O/NH4OHcon, 60:12:1.5 for marker 2; and Me2CO/H2O/NH4OHcon, 51:22:7.5 for marker 3. The silica containing the fluorescent peptide was scraped and extracted with 50% trifluoroethanol supplemented, according to the nature of the peptide, with acetic acid for markers 1 and 2 and with ammonia for marker 3. The pI values of the peptides were determined by repetitive focusing of S. aureus peptides and the markers on IPG strips of different sizes and pH gradients. Markers were visualized using a 9400 Typhoon scanner (GE Healthcare) with a laser at 527


Figure 2. Fluorescent peptidic markers 1-3 mixed with 100 µg of S. aureus peptides and focused on 13 cm 3-10 linear IPG strips.

nm and an emission filter at 532 nm. The pI values were calculated according to the position of the markers on the IPG strip’s gradient. These values were validated using the predicted pI of the identified background S. aureus peptides, obtained by extraction and analyzed by LC-MS/MS. The pI values of the identified peptides were estimated and compared to the IEF obtained values. The following pI values were calculated: Marker 1, 3.82; Marker 2, 5.01; and Marker 3, 8.56. Sample Preparation. Protein samples (standard protein mixture or S. aureus protein extracts) were solubilized in 300 µL of 50 mM ammonium bicarbonate at pH 8.5 containing 0.05% SDS. The samples were then reduced by 45 mM DTE and alkylated with 100 mM iodoacetamide and digested by trypsin at a protease-to-protein ratio of 1:25 using a microwave. A domestic microwave oven (FUNAI, Hamburg, Germany) with a maximum output power of 850 W and a frequency of 2.45 GHz was set on reduced power (175 W). Samples were placed

Table 1. Percentage of Peptides Correctly Predicted by the pICarver Software(a)

Fraction   Strip A   Strip B   Strip C   Strip D
1          100       100       97        96
2          83        86        90        74
3          55        61        76        74
4          74        63        56        75
5          24        74        64        87
6          33        -         -         -
7          89        -         -         -
Average    66        77        76        81

(a) The best match was obtained when the gradient was adjusted with the pI markers.

in Eppendorf tubes in a holder placed in a beaker containing 500 mL of 25 °C water and irradiated for 6 min. After digestion, the vials were removed from the microwave oven and the reaction was immediately quenched with 1 M formic acid. The final water bath temperature was measured to be 55 °C immediately after microwave irradiation. IPG-IEF. After digestion, peptides were concentrated and desalted using an Oasis HLB 1 cc 10 mg solid-phase extraction cartridge according to the manufacturer’s protocol (Waters, Milford, MA). A total of 100 µg of purified peptides was resuspended in 100 µL (for 7 cm strips) or 200 µL (for 13 cm strips) of IEF buffer containing 4 M urea and 0.2% 3-10 pH ampholines. IPG strips were rehydrated overnight with the peptide solution. IEF was performed on an Ettan IPGphor II system (GE Healthcare) with the following conditions: for 7 cm linear 3-10 strip, 30 min step at 500 V, linear gradient from 500 to 6000 V in 2 h and 6000 V up to 9 kVh, and for 13 cm linear 3-10 strip, 30 min step at 500 V, linear gradient from 500 to 8000 V in 3 h and step at 8000 V up to 30 kVh. Peptide Extraction. After isoelectric focusing, IPG strips were washed 3 times for 10 s in 3 distinct high-boiling point petroleum ether baths in order to remove the paraffin oil. Each strip was then manually cut with a scalpel, and gel pieces were placed in polypropylene tubes containing 80 µL of 0.1% TFA. After 3 times of 30 min incubation with 0.1% TFA, all extracts were pooled. The samples were then cleaned on Oasis HLB µ-Elution 96-Well Plate according to the manufacturer’s protocol. Purified sample was then evaporated to dryness, resuspended in 25 µL of HPLC buffer A (0.1% formic acid in 3% AcN) and stored at -20 °C. Mass Spectrometry. Five microliters of peptide solution of each fraction was loaded on a 10 cm long homemade column with an i.d. of 100 µm, packed with C18 reverse phase (5 Å, YMS-ODS-AQ200, Michrom BioResource, CA). 
The elution gradient ranged from 4% to 38% solvent B (0.1% formic acid in 80% AcN) in 40 min and samples were eluted directly on a MALDI target using a homemade spotting robot. Matrix (5 mg/mL ACCA in 50% AcN, 0.1% TFA, 10 mM NH4H2PO4) was applied and dried. Peptides were analyzed in MS and MS/MS mode using a 4800 MALDI-TOF/TOF tandem mass spectrometer (Applied Biosystems) with a Nd:YAG laser at 355 nm operating at a 200 Hz repetition rate. In total, 800 and 1500 consecutive laser shots were accumulated for MS and MS/MS spectra, respectively. Argon was used for CID at a gas pressure of (2-4) × 10^-6. Data-dependent MS/MS analysis was performed automatically on the 15 most intense ions from MS spectra. External calibration with Lysozyme C was performed in MS and MS/MS (precursor at 1753.6 m/z).

After MS/MS analysis, peak lists from each fraction were created with embedded software (4800 explorer 3.0 peak-to-Mascot) with the following settings: peptide mass range from 60 to precursor minus 20, minimum S/N 0.5 and maximum 200 peaks per precursor.

Database Searching. Peak lists of all the fractions of the same strip were merged together before database searching with Phenyx23 (GeneBio, Switzerland). When the known protein mixture was used, searching was performed against UniProtKB/Swiss-Prot Release 54.1 of 21-Aug-2007. For S. aureus samples, searching was performed against a homemade database containing all predicted ORFs from S. aureus genome-sequenced strain N31524 and proteins from other strains with less than 90% identity (11 470 entries extracted from UniProtKB Release 8.5 of 22-Aug-2006, UniProtKB/Swiss-Prot Release 50.5 of 22-Aug-2006, UniProtKB/TrEMBL Release 33.5 of 22-Aug-2006). On the Phenyx submission Web page, MALDI-TOF/TOF was selected as instrument type. Two search rounds were selected, both with trypsin selected as enzyme, oxidized methionine as variable modification and carbamidomethylation of cysteine as fixed modification. In the second round, deamidation was also selected as variable modification. In the first round, one missed cleavage with normal cleavage mode was selected, whereas in the second round, three missed cleavages with half-cleaved mode were selected. “Turbo” was only selected in the first round. The minimum peptide length allowed was 6 amino acids. Parent ion tolerance was 0.4 Da. The acceptance criteria were slightly lowered in the second round search (round 1, AC score 8.0, peptide Z-score 6.5, p value 1 × 10^-7; round 2, AC score 8.0, peptide Z-score 6.0, p value 1 × 10^-7).
The peptide false positive rate was estimated to be 3.5% for single-peptide hits by plotting false positive rates obtained from the entire Swiss-Prot TrEMBL database without species restriction against true positive high-score peptides on a ROC-like curve. Single-peptide hits were manually checked to contain a nonredundant “High scoring peptide” with p value < 1 × 10^-7 and Z-score > 8.

Software Development. The pICarver program’s interface, designed by the SIB, is shown in Figure 1. The program receives a text file (.txt) as input that contains the list of all proteins and their corresponding peptides with their predicted pI values in three adjacent columns. The user can then select the pH gradient from a scroll-down menu. The gradient can be visualized in a separate window. In the gradient window, two functions calculate the position of a particular pI on the strip and vice versa. This makes it possible to calculate the pI of a peptide according to its position on the pH gradient or to locate the position of a specific pI. The length of the IPG strip should be entered in millimeters. As an extra option, known pI markers can be added to calibrate the theoretical gradient to the experimental conditions. Three fractionation schemes are defined in the program:

Exact: fractions of the same size. The size is determined by the user. The function for setting the desired number of peptides per fraction is deactivated in this mode.

Multiple: fractions whose sizes are multiples of a specific size determined by the user. This mode is especially useful when fractionation is robotized or when Off-Gel technology is used, where the fraction size is fixed. For larger fractions, extracts of several wells should be merged. In this mode, the user can determine how many peptides per fraction are desirable.

Minimum: fractions can be of any size but not smaller than the entered minimum fraction size. The user can change the total number of fractions by changing the number of desired peptides for each fraction.
Values smaller

Figure 3. (A) Strip A cut in 7 fractions of 10 mm. The number of expected peptides per each fraction and the peptides obtained experimentally are shown. (B) Strip B cut according to the pICarver using a theoretical input list. (C) Strip C cut according to pICarver using an input list obtained experimentally. (D) Strip D cut according to the pICarver with a calibrated pH gradient by the pI markers using an input list obtained experimentally.

than 1 mm are not accepted because such fractions are physically difficult to cut. The last parameter to enter is the desired number of peptides per fraction. This number can be determined according to the usual observed peak capacity of the LC-MS/MS platform. Taking into account the minimal fraction size, the program will inform the user if the number of peptides in one or several fractions exceeds the entered value.
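The Minimum mode described above can be illustrated with a short Python sketch. The actual pICarver algorithm is not detailed in this section, so the code below is only one plausible reading: it assumes a linear pH gradient, counts peptides in 1 mm slices, and greedily closes a fraction once it is at least the minimum size and holds at least the desired number of peptides. The function names (`pi_to_position`, `carve_minimum_mode`) and the trailing-region merge rule are hypothetical.

```python
def pi_to_position(pi, ph_start, ph_end, length_mm):
    """Map a pI to a position (mm) on a strip with a linear pH gradient."""
    return (pi - ph_start) / (ph_end - ph_start) * length_mm


def carve_minimum_mode(pi_values, ph_start, ph_end, length_mm,
                       min_size_mm, target_per_fraction):
    """Greedy fractionation sketch (assumed logic, not the published one).

    Returns a list of (start_mm, end_mm, peptide_count) tuples covering
    the whole strip. length_mm and min_size_mm are whole millimeters.
    """
    n_slices = int(length_mm)
    counts = [0] * n_slices
    for pi in pi_values:
        if ph_start <= pi <= ph_end:
            pos = pi_to_position(pi, ph_start, ph_end, length_mm)
            counts[min(int(pos), n_slices - 1)] += 1

    fractions = []
    start, accumulated = 0, 0
    for i, count in enumerate(counts):
        accumulated += count
        width = i + 1 - start
        # Close the fraction only when both constraints are satisfied.
        if width >= min_size_mm and accumulated >= target_per_fraction:
            fractions.append((start, i + 1, accumulated))
            start, accumulated = i + 1, 0

    # Merge any trailing peptide-poor region into the last fraction.
    if start < n_slices:
        if fractions:
            last_start, _, last_count = fractions.pop()
            fractions.append((last_start, n_slices, last_count + accumulated))
        else:
            fractions.append((0, n_slices, accumulated))
    return fractions


# Toy example: 7 cm strip, pH 3-10, three clusters of 10 peptides each.
toy_pis = [3.25] * 10 + [4.05] * 10 + [8.05] * 10
scheme = carve_minimum_mode(toy_pis, 3.0, 10.0, 70, 4, 10)
```

In this toy case the sparse neutral-to-basic region ends up in one wide fraction while the crowded acidic region is cut into narrow ones, which is the qualitative behavior the text describes.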


Results Fluorescent Peptidic pI Markers. Before selecting 5-iodoacetamido fluorescein, various fluorophores (N-(1-pyrene) maleimide and Eosin-5-maleimide) were tested (data not shown). Fluorescein was chosen because of the high purity of the available reagent, its high quantum yield and its visibility both


Figure 4. (A) Same size fractionation scheme of Strip A. Number of peptides per fractions is predicted. (B) pICarver fractionation scheme of Strip B. A similar number of peptides per fraction is predicted. (C) Number of S. aureus peptides identified in 13 fractions of the same size. (D) Peptides identified in 10 fractions cut according to the pICarver. Nonredundant (NR) peptides are in blue and redundant peptides are in red. The cumulative curve of the total number of unique peptides (NR Cumul.) is shown in green.

by eye and by the laser scanner. Different peptidic pI markers were synthesized and tagged with 5-iodoacetamido fluorescein. These peptides were tested by focusing on IPG strips of different sizes and pH gradients to determine their experimental pI. The attachment of the fluorescein group to the peptides resulted in pI shifts. Three peptides that showed good focusing ability and high solubility were retained. Peptide DDEHACG-NH2 was named Marker 1, peptide Ac-DHHACG-NH2 was named Marker 2 and peptide RKHGCA-NH2 was named Marker 3. Markers 1 and 3 had free N-termini and marker 2 had an acetylated N-terminus. Acetylation resulted in more acidic peptides than their nonacetylated form. The predicted pI values of the nonfluorescent markers were 4.06, 6.64, and 9.49 for markers 1, 2, and 3, respectively. The experimentally obtained pI values of fluorescein-labeled markers were 3.82, 5.01, and 8.56 for markers 1, 2, and 3. All fluorescein-coupled peptides had an m/z value greater than 1000 and ionized well in the MALDI-TOF instrument. The fluorophore added 387 Da to the mass of the peptides. No sign of nonspecific conjugation of the fluorophore was observed. The markers focused in sharp yellow bands visible by eye at high concentrations. They could also easily be detected with the 9400 Typhoon scanner. The reproducibility of the markers focused on pH gradient 3-10 is demonstrated in Figure 2.

Application to Standard Protein Mixture. The goal of using the pICarver software and related approach was to reduce the number of fractions and consequently the MS instrument time without any loss of information. Besides the throughput increase, an almost even distribution of peptides per fraction was desirable. One of the main issues was which input peptide list should be used. Therefore, a model experiment was planned

using four standard proteins: Bovine serum albumin, Rabbit phosphorylase b, Chicken ovalbumin and Bovine beta-casein. The mixture was digested and separated on four 7 cm linear 3-10 IPG strips. Four fractionation scenarios were designed to assess the performance of pICarver, relying on: (A) the theoretical list of peptides with seven fractions of even size (1 cm), (B) pICarver using the theoretical tryptic peptide list of the protein mixture with five fractions of various sizes, (C) pICarver with a list of peptides obtained experimentally (derived from two archived experiments using the same technology on the standard mixture) in five fractions of various sizes, and finally (D) pICarver with the same conditions as C but with a gradient adjusted using the fluorescent peptidic pI markers. The exact mode with a 10 mm fraction size was selected for strip A and the minimum mode was selected for the other strips. A minimum fraction size of 4 mm was used for strips B, C, and D. For the desired number of peptides per fraction, values of 100, 66, and 66 were selected for strips B, C, and D, respectively. In the scheme proposed for fractionation of strips C and D, a 4-fold difference in the fraction sizes was observed. The volume of solvent used for extraction of peptides was adapted to the fraction sizes. The positions of the three pI markers with known pI values of 3.82, 5.01, and 8.56 were entered in the program to calibrate the commercial IPG strip gradient for strip D. The theoretical and experimental lists are provided in Supplementary Data 1 in Supporting Information. A similar total number of peptides was identified in all strips. Strip A had a heterogeneous peptide distribution with a 6-fold difference between fractions 1 and 6. The peptide distribution was more homogeneous when pICarver was used. The maximum differences in the number of peptides per

fraction were 2.7, 1.3, and 1.4, for strips B, C, and D, respectively. It was observed that 70% of the identified peptides were common to the four strips. On average, 85% of the identified peptides existed in the experimentally obtained list, whereas only 24% of the identified peptides matched those in the theoretical list. A precision value was calculated by averaging the percentage of peptides identified correctly in the predicted pICarver fractions. This value measures how accurately pICarver predicts the fraction to which each identified peptide belongs. As demonstrated in Table 1, the lowest precision value was obtained for strip A and the highest value was obtained for strip D, where pI markers were used. The complete list of identified peptides for each strip is available in Supplementary Data 2 in Supporting Information.

Application to S. aureus Proteome. According to the results obtained with standard proteins, fractionation scenario D (pICarver and an experimental list of peptides) was the best. The calibration of the pH gradient improved the matching between the pICarver predicted distribution and the one obtained experimentally (Table 1). To create a practical list of S. aureus peptides, results from 15 Shotgun IPG-IEF experiments analyzed by different LC-MS/MS platforms were pooled together. The merged list contained a total of 16 184 peptides corresponding to 1968 proteins, which represents around 76% of the S. aureus proteome. Only peptides identified in at least three experiments were selected for the input list of pICarver. The input list contained 4503 tryptic and semitryptic peptides with one to two missed cleavages corresponding to 893 proteins. This list is provided in Supplementary Data 3 in Supporting Information. S. aureus peptides were separated on two 13 cm linear 3-10 IPG strips in parallel. The first strip (A) was cut in 13 fractions of 10 mm.
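The precision value reported in Table 1 can be illustrated with a small Python sketch. It assumes, as one reading of the definition given for the standard mixture, that a peptide counts as correct when it is identified in the fraction where it was predicted, that a percentage is computed per fraction, and that the precision is the unweighted mean of those percentages. The function name and the dictionary-based inputs are hypothetical.

```python
def precision_value(observed, predicted):
    """Average per-fraction percentage of correctly predicted peptides.

    observed:  dict mapping peptide -> fraction in which it was identified
    predicted: dict mapping peptide -> fraction predicted for it
    Returns (per_fraction_percent, mean_percent).
    """
    hits, totals = {}, {}
    for peptide, fraction in observed.items():
        totals[fraction] = totals.get(fraction, 0) + 1
        if predicted.get(peptide) == fraction:
            hits[fraction] = hits.get(fraction, 0) + 1
    if not totals:
        raise ValueError("no identified peptides")
    percents = {f: 100.0 * hits.get(f, 0) / n for f, n in totals.items()}
    return percents, sum(percents.values()) / len(percents)


# Toy example with four hypothetical peptides in two fractions.
observed = {"PEP_A": 1, "PEP_B": 1, "PEP_C": 2, "PEP_D": 2}
predicted = {"PEP_A": 1, "PEP_B": 2, "PEP_C": 2, "PEP_D": 2}
per_fraction, precision = precision_value(observed, predicted)
```

Under this reading, the five Strip B percentages listed in Table 1 (100, 86, 61, 63, 74) average to 76.8, in line with the reported value of 77.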
The second strip (B) was cut according to pICarver by selecting the minimum mode with a smallest fraction size of 4 mm. A value of 416 was entered for the desired number of peptides per fraction in order to obtain a similar distribution of peptides among all fractions. After the focusing, the strips were scanned, the positions of the three pI markers were entered and the pH gradient was adjusted. As illustrated in Figure 4, fractions of different sizes were obtained. For example, fraction 9 was almost 10 times the size of fraction 6. The results shown in Figure 4 indicate that although slightly more peptides were identified in Strip A, the total number of proteins in both strips was similar (371 in Strip A and 369 in Strip B) with 85% similarity between the two. However, a better homogeneity in peptide distribution per fraction was achieved using pICarver compared with same-size fractions. Although almost 300 peptides were identified in fraction 2 of Strip A, fraction 10 contained no peptides. The cumulative curves in Figure 4 also show a constant and linear increase in the number of total peptides from fraction to fraction in Strip B compared with Strip A. The use of pI markers in Strip B confirmed the results with the standard proteins model and improved the percentage of correctly predicted peptides. The complete list of identified peptides for each strip is available in Supplementary Data 4 in Supporting Information.
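The gradient adjustment with the three markers can be sketched as a piecewise-linear calibration. This is an assumption for illustration only: the text does not specify how pICarver interpolates, and the marker positions used below are hypothetical. Only the marker pI values (3.82, 5.01, 8.56) come from the paper.

```python
from bisect import bisect_right

# Experimentally determined pI values of Markers 1-3 (from the paper).
MARKER_PIS = (3.82, 5.01, 8.56)


def calibrated_pi(position_mm, marker_positions_mm, marker_pis=MARKER_PIS):
    """Estimate the pI at a strip position from observed marker positions.

    Uses linear interpolation between neighboring markers and linear
    extrapolation from the nearest marker pair outside their range
    (an assumed scheme, not necessarily the one used by pICarver).
    """
    points = sorted(zip(marker_positions_mm, marker_pis))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if len(xs) < 2:
        raise ValueError("at least two markers are required")
    # Pick the segment containing the position, clamped to valid segments.
    i = bisect_right(xs, position_mm) - 1
    i = max(0, min(i, len(xs) - 2))
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (position_mm - xs[i])


# Hypothetical marker positions (mm from the acidic end of the strip).
marker_positions = (10.0, 25.0, 70.0)
pi_between_markers = calibrated_pi(17.5, marker_positions)
```

Fraction boundaries expressed in pI can then be converted to cutting positions with the calibrated mapping instead of the nominal linear gradient.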

Discussion

pI Markers. Although the commercially available IPG strips are usually reproducible, the position of the commercial gel gradient on the plastic backing may vary and the quality of the focusing can differ depending on the nature and purity

Journal of Proteome Research • Vol. 7, No. 10, 2008

Vaezzadeh et al. of the samples. Therefore, internal quality control elements are essential. For this purpose, proteins, for which pI values were established, have been used in slab gel IEF as standard substances, that is, pI markers.25 The number of naturally fluorescent proteins that can be excited with visible lasers typically used is limited. Additionally, the labeling of proteins at their amino groups with a fluorescent dye would likely produce heterogeneous products as a result of the variation of the number and the sites of the label. Shimura and coworkers previously reported the use of fluorescent-dye labeling of commercially available peptides at their N-termini as fluorescent pI markers.26 Even though their labeled peptides allowed calibration of the pH gradient, the utility of their peptides was limited by the narrow pH range covered and their poor solubility. The development of fluorescent pI markers was also reported by two separate groups. One used dansylated peptides and some dansylated components of commercially available carrier ampholytes.27 The other used low molecular-mass compounds that can be excited with UV light, but their chemical structure was not reported.28 Nakhleh et al. introduced low molecular mass amphoteric dyes but encountered problems ranging from marker precipitation to covalent interaction with other molecules present in the gel.29 Development of azo dye compounds was also reported by Stastna et al.30 We initiated a project to develop a set of fluorescent pI markers by labeling synthetic peptides that were specifically designed for the purpose of being used with the Shotgun IEF technology. A good pI marker should have a good focusing ability, low hydrophobicity, sufficient purity, high stability and good detectability by MS. 
Besides their use as an internal control to check the quality and reproducibility of focusing, the pI markers help to fine-tune the theoretical pH gradient of the commercial IPG strip, which may be irregular because of fabrication or focusing. The pI markers were required to behave like the sample peptides during focusing, with a similar retention time in the LC and comparable ionization in the MS instrument. In this way, the markers could be used in all steps of the Shotgun IEF pipeline. We initially intended to develop markers in the sparsely populated pI regions of the peptide 2D map of S. aureus and other species (Figure 5). The major hindrance to this goal was a pI shift induced by the addition of the fluorophore group to the peptide. This shift was inconsistent and depended on the peptide sequence and the focusing conditions. It also depended on the pH gradient of the IPG strip. For example, in the acidic range (3-5), most of the native and basic sites can be expected to be fully protonated, so that any shift in pI should be minor. Fluorescein was chosen as the fluorophore. Fluorescein has an absorption maximum at 494 nm and an emission maximum at 521 nm. We obtained three pI markers, two at the extremities of the gradient (Markers 1 and 3) and one almost in the middle (Marker 2). These fluorescent pI markers were highly sensitive. The fluorescence was easily detected with a Typhoon scanner at amounts as low as 50 ng. Their stability and nonhazardous nature compared with radioactive material, as well as their low price, make them excellent markers for use throughout the Shotgun IEF pipeline.

Figure 5. Theoretical peptide 2D maps of (A) S. aureus and (B) human proteomes.

pICarver Tool. To date, all applications using IEF as the first dimension of shotgun proteomics separate peptides into fractions of equal size. This situation persists despite the pronounced heterogeneity of the theoretical pI distribution of the peptides, which is an intrinsic physicochemical property that can be predicted. Peptides naturally tend to cluster in 4 distinct zones with gaps at around pI 3.0, 5.0, 7.5, and 9.4. This observation is consistent in most species, as shown for the human and S. aureus proteomes in Figure 5. In addition, previous experiments showed an uneven number of peptides per fraction. The pICarver program is designed to take this information into account in order to increase the throughput and decrease the analysis time by merging peptide-poor regions and cutting smaller fractions in peptide-rich regions. This generates an even distribution of peptides per fraction and increases the resolution of the separation. The program also provides the list of peptides in each predicted fraction, which can be used to validate the identified peptides, to study possible outliers further, and to detect false positives and PTMs.

The pICarver software uses an input list provided by the user to learn the distribution of the peptides. The input list can contain the theoretical tryptic peptides of a given sample. This list is generated by in silico proteolysis of the proteins derived from genome information. However, only a small portion of these peptides will be detected in reality. The main reason may be that in silico digestion does not account for incomplete digestion, proteome dynamic range, and ionization efficiency. Recently, some groups have developed tools to predict proteotypic peptides, that is, the peptides usually identified on specific platforms.31,32 Mallick and co-workers used more than 600 000 peptide identifications generated by four proteomic platforms to derive more than 16 000 proteotypic peptides for more than 4000 yeast proteins. Characteristics of these peptides were used to develop a computational tool to predict proteotypic peptides. Yet, in-solution digestion usually yields many semitryptic and missed-cleavage peptides, which these prediction tools do not consider. The comparison of strips B and C in Figure 3 clearly shows that the input list used for strip C, which was acquired experimentally, is much closer to reality than the purely theoretical list of tryptic peptides used for strip B. Although 85% of the peptides identified in all strips were present in the experimentally obtained list, only 24% matched the purely theoretical list. The same phenomenon was observed for S. aureus. The experimental list was obtained by merging results from 15 experiments using the same technology. To reduce the number of random peptides, only peptides detected 3 times or more were selected for the input list. It is interesting to note that repeating this kind of shotgun experiment can result in high coverage of the proteome.33 A 76% coverage of the S. aureus proteome (1968 proteins of 2575 ORFs) was obtained by pooling data from 15 distinct experiments. This list contains all the semitryptic and missed-cleavage peptides obtained previously. The semitryptic peptides cannot be predicted in silico. Nonetheless, the copy number of all peptides is assumed to be similar. Information from mRNA microarrays can add an extra level of information, but the correlation between these data is not straightforward.34

The use of pICarver, as shown in Figures 3 and 4, allowed the number of fractions to be reduced without any significant loss of information. Fewer fractions mean less sample preparation, less LC time, and less MS instrument time. Additionally, prior knowledge of the approximate number of expected peptides would allow the LC gradient to be adapted for fractions that exceed the desired number of peptides. We aimed to obtain a more homogeneous distribution of the peptides and to reduce the number of fractions without loss of information. However, if the goal is to identify more peptides, the LC gradient should be adapted according to the pICarver information.
Then, the pICarver fractionation would result in even better coverage of the proteome. Figure 3 shows that, as expected, same-size fractionation (strip A) resulted in an uneven distribution of peptides. The use of pICarver eliminated this problem, as shown in Figure 3 for strips B-D. However, some differences remained, probably because of random sampling or inaccuracy of the input list. Alternatively, if the fractions are not cut into clean cuboid pieces, two adjacent fractions may share redundant peptides. In addition, some abundant peptides may overload their pI section of the gel and be found in other fractions. Further tests should be done to determine the criteria by which the peptides of the input list should be chosen.

It is important to note that peptides with any kind of PTM were excluded from the distribution analysis. The reason is that most PTMs induce a shift in the pI, so that the peptides are found in fractions other than those predicted by pICarver. However, this shift can be used to detect modified peptides and even false positives. Carbamidomethylation of cysteines, oxidation of methionines, and deamidation of asparagine and glutamine were often detected. Most of these PTMs result in a cathodic shift in the pI of the peptide. Modifications such as carbamidomethylation do not contribute a negative charge but neutralize the positive charge of other residues. Deamidation results in a cathodic shift as well, because of the generation of negatively charged aspartic or glutamic acid residues. Many peptide sequences were matched as deamidations of Asn (N) or Gln (Q) residues. This common modification usually arises from sample preparation. In effect, the neutral N and Q residues are converted into acidic Asp and Glu residues, resulting in a cathodic pI shift. Half-cleaved peptides also show a cathodic pI shift compared with their fully tryptic counterparts. The reason is the loss of positive arginine or lysine residues. Given an error value, peptides generated by pICarver could be compared with the theoretical list in order to detect false positives and PTMs. The error value should depend on the pI range of the strip. It should be higher in the basic part, since the low focusing quality in this zone is well-known. For example, peptide HHLNQVDTIFQR, with a calculated pI value of 6.65, was predicted to be found in fraction 8 of strip A (Figure 3), whereas it was identified in fraction 6. The reason was the deamidation of Q, which resulted in a cathodic shift of two fractions. Peptide pI prediction algorithms that take PTMs into account in their calculations should facilitate the prediction of the correct position of this kind of peptide.

As shown in Figure 3, the comparison of strips C and D demonstrates the usefulness of pI markers for correcting the pH gradient and decreasing the deviation of the peptides from their predicted fractions. Strip D was cut by pICarver with the same parameters as strip C. In addition, the positions of the three pI markers were entered in the software before the fractionation scheme was created. Another advantage of the fluorescein pI markers is that they are visible by eye. This makes it possible to match the fluorescent bands on the gel directly to the fractionation scheme. Experience shows that the beginning of the gel on the plastic backing is not necessarily the beginning of the theoretical gradient. Although the deviation is higher for the S. aureus samples than for the standard proteins mixture, it is still significantly lower than for the strip that was cut into conventional same-size fractions.
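
The effect of deamidation on the calculated pI, discussed above for peptide HHLNQVDTIFQR, can be illustrated with a minimal sketch. It is not the pI algorithm used by pICarver: it assumes a generic Henderson-Hasselbalch charge model with an illustrative set of textbook-style pKa values (an assumption, so the result is not expected to reproduce the value of 6.65 quoted above), and it arbitrarily deamidates the first Q, since the text does not state which Q was modified.

```python
# Illustrative pKa values (assumption; published pKa sets differ and give different pI values).
PKA = {"N_term": 8.0, "C_term": 3.1,
       "K": 10.5, "R": 12.5, "H": 6.0,            # groups that carry a positive charge
       "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # groups that carry a negative charge


def net_charge(seq, ph):
    """Net charge of a linear, unmodified peptide at a given pH."""
    positive = [PKA["N_term"]] + [PKA[a] for a in seq if a in "KRH"]
    negative = [PKA["C_term"]] + [PKA[a] for a in seq if a in "DECY"]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pk)) for pk in positive)
    charge -= sum(1.0 / (1.0 + 10 ** (pk - ph)) for pk in negative)
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: the net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


native = "HHLNQVDTIFQR"
deamidated = "HHLNEVDTIFQR"  # first Q converted to E (arbitrary choice of site)
print(f"native pI:     {isoelectric_point(native):.2f}")
print(f"deamidated pI: {isoelectric_point(deamidated):.2f}")
```

Under this model the deamidated form always has the lower calculated pI, because the new Glu residue adds a negative term to the net charge at every pH. Comparing such a shifted pI with the fraction boundaries is one way the observed displacement of a peptide could be checked against a proposed modification.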

Conclusion

Here, the development of the pICarver software tool was presented. This tool performs the IEF fractionation of shotgun-produced peptides in a rational manner using prior experimental and theoretical knowledge. The use of pICarver and synthesized fluorescent pI markers in Shotgun IEF experiments generates an almost even number of peptides per fraction and saves the user and instrument time otherwise spent analyzing peptide-poor fractions. Additionally, prior knowledge of the number of peptides can help to optimize the resolution of the two-dimensional separation and thus lead to the identification of low-abundance proteins. Another possible use of pICarver is in targeted experiments. In other words, the program is capable of precisely determining the position of one or several specific peptides. This will radically increase the throughput of the pipeline and shorten the time line by avoiding the need to scan the whole pH gradient. The targeted approach relying on pICarver can be used in Multiple Reaction Monitoring (MRM)35 type experiments. An interesting complementary future development would be to extend the computation to adapt the LC gradient to peptide-rich regions. pICarver would perform better, since prior knowledge of the sample complexity would allow an appropriate LC gradient to be chosen, thereby increasing the resolution and the number of peptides identified.

The development of an accurate pI prediction algorithm is needed to determine precisely the pI shift induced on peptides by PTMs. There is no doubt that the extent of the pI shift upon PTM depends on the size and the native pI of the peptide, as well as on the size and the nature of the modifying group. Such information is available for proteins but not for peptides.36,37 The PTM effect must be more important for peptides because of their small size and low buffer capacity. In our research work using standard proteins and bacterial material, the PTM effect was simply excluded with negligible consequences, but with other samples, such as eukaryotic or human proteins, it should be taken into consideration. Another important issue is the copy number. pICarver considers all peptides to be present in equal concentration. This might result in wrong fraction size predictions or in the redundancy of high-abundance peptides in several fractions. The problem could be tackled by exploiting the correlation between transcriptomics and proteomics data.

The pICarver software tool can also be used in other Shotgun IEF approaches, such as Off-Gel and FFE, to design the fractionation scheme and to adapt the subsequent LC resolving power to the peptide density in each fraction. This work is an example of the usefulness of previously obtained proteomics data in planning and performing new experiments using bioinformatics tools. Recently, a similar approach was also proposed for Gas-Phase fractionation.38 The development of the pICarver software can serve as a model for similar uses of proteomics data, paving the way to high-throughput proteomics. The pICarver program is freely accessible on www.expasy.org/tools/pICarver.
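
The marker-based calibration and the targeted use mentioned above can be sketched together. The sketch below is an assumption about how such a calibration might be done, not a description of pICarver internals: it maps a pI to a strip position by piecewise-linear interpolation between scanned marker bands (extrapolating linearly beyond the outer markers) and then looks up the fraction containing that position. The marker positions, marker pI values, cut positions, and function names are all hypothetical.

```python
from bisect import bisect_right


def ph_to_position(ph, markers):
    """Map a pI to a strip position (mm) by piecewise-linear interpolation.
    markers: list of (position_mm, pI) pairs for the scanned marker bands;
    at least two markers with distinct pI values are required."""
    pts = sorted(markers, key=lambda m: m[1])
    # Select the segment that contains ph; clamp to the end segments,
    # which amounts to linear extrapolation beyond the outer markers.
    k = bisect_right([p for _, p in pts], ph) - 1
    k = max(0, min(k, len(pts) - 2))
    (x0, p0), (x1, p1) = pts[k], pts[k + 1]
    return x0 + (ph - p0) * (x1 - x0) / (p1 - p0)


def fraction_index(position_mm, cuts_mm):
    """Zero-based index of the fraction containing a position.
    cuts_mm: ascending cut positions, including both ends of the strip."""
    i = bisect_right(cuts_mm, position_mm) - 1
    return max(0, min(i, len(cuts_mm) - 2))


# Hypothetical marker bands (position in mm, pI) and a hypothetical cutting scheme.
markers = [(12.0, 3.5), (65.0, 6.4), (120.0, 9.6)]
cuts = [0, 40, 60, 70, 130]

target_pi = 6.65  # calculated pI of the peptide of interest
pos = ph_to_position(target_pi, markers)
print(f"expected position: {pos:.1f} mm, fraction index: {fraction_index(pos, cuts)}")
```

With such a mapping, only the fraction expected to contain the target peptide would need to be analyzed, which is the idea behind the targeted approach; the accuracy of the result depends on the pI prediction and on how linear the gradient really is between markers.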

Supporting Information Available: Supplementary data 1, theoretical and experimental list of peptides derived from standard proteins mixture used as input for pICarver software; Supplementary data 2, list of peptides identified in each fraction of strips A, B, C and D using standard proteins model; Supplementary data 3, list of experimentally obtained S. aureus peptides used as input for pICarver software; Supplementary data 4, list of S. aureus peptides identified in each fraction of strips A and B. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Aebersold, R.; Mann, M. Nature 2003, 422, 198–207.
(2) McCormack, A. L.; Schieltz, D. M.; Goode, B.; Yang, S.; Barnes, G.; Drubin, D.; Yates, J. R. Anal. Chem. 1997, 69, 767–776.
(3) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R. Nat. Biotechnol. 1999, 17, 676–682.
(4) Cargile, B. J.; Talley, D. L.; Stephenson, J. L., Jr. Electrophoresis 2004, 25, 936–945.
(5) Krijgsveld, J.; Gauci, S.; Dormeyer, W.; Heck, A. J. J. Proteome Res. 2006, 5, 1721–1730.
(6) Xie, H.; Bandhakavi, S.; Griffin, T. J. Anal. Chem. 2005, 77, 3198–3207.
(7) Cargile, B. J.; Sevinsky, J. R.; Essader, A. S.; Stephenson, J. L., Jr.; Bundy, J. L. J. Biomol. Tech. 2005, 16, 181–189.
(8) Chick, J. M.; Haynes, P. A.; Molloy, M. P.; Bjellqvist, B.; Baker, M. S.; Len, A. C. J. Proteome Res. 2008, 7, 1036–1045.
(9) Scherl, A.; Francois, P.; Charbonnier, Y.; Deshusses, J. M.; Koessler, T.; Huyghe, A.; Bento, M.; Stahl-Zeng, J.; Fischer, A.; Masselot, A.; Vaezzadeh, A.; Galle, F.; Renzoni, A.; Vaudaux, P.; Lew, D.; Zimmermann-Ivol, C. G.; Binz, P. A.; Sanchez, J. C.; Hochstrasser, D. F.; Schrenzel, J. BMC Genomics 2006, 7, 296.
(10) Mazzeo, J. R.; Martineau, J. A.; Krull, I. S. Anal. Biochem. 1993, 208, 323–329.
(11) Kasicka, V. Electrophoresis 2008, 29, 179–206.
(12) Tan, A.; Pashkova, A.; Zang, L.; Foret, F.; Karger, B. L. Electrophoresis 2002, 23, 3599–3607.
(13) Cantin, G. T.; Venable, J. D.; Cociorva, D.; Yates, J. R., III J. Proteome Res. 2006, 5, 127–134.
(14) Moritz, R. L.; Ji, H.; Schutz, F.; Connolly, L. M.; Kapp, E. A.; Speed, T. P.; Simpson, R. J. Anal. Chem. 2004, 76, 4811–4824.
(15) Malmstrom, J.; Lee, H.; Nesvizhskii, A. I.; Shteynberg, D.; Mohanty, S.; Brunner, E.; Ye, M.; Weber, G.; Eckerskorn, C.; Aebersold, R. J. Proteome Res. 2006, 5, 2241–2249.
(16) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Anal. Chem. 2002, 74, 5383–5392.
(17) Xie, H.; Bandhakavi, S.; Roe, M. R.; Griffin, T. J. J. Proteome Res. 2007, 6, 2019–2026.
(18) Horth, P.; Miller, C. A.; Preckel, T.; Wenz, C. Mol. Cell. Proteomics 2006, 5, 1968–1974.
(19) Heller, M.; Ye, M.; Michel, P. E.; Morier, P.; Stalder, D.; Junger, M. A.; Aebersold, R.; Reymond, F.; Rossier, J. S. J. Proteome Res. 2005, 4, 2273–2282.
(20) Cargile, B. J.; Bundy, J. L.; Freeman, T. W.; Stephenson, J. L., Jr. J. Proteome Res. 2004, 3, 112–119.
(21) Scherl, A.; Francois, P.; Bento, M.; Deshusses, J. M.; Charbonnier, Y.; Converset, V.; Huyghe, A.; Walter, N.; Hoogland, C.; Appel, R. D.; Sanchez, J. C.; Zimmermann-Ivol, C. G.; Corthals, G. L.; Hochstrasser, D. F.; Schrenzel, J. J. Microbiol. Methods 2005, 60, 247–257.
(22) Wellings, D. A.; Atherton, E. Methods Enzymol. 1997, 289, 44–67.
(23) Colinge, J.; Masselot, A.; Giron, M.; Dessingy, T.; Magnin, J. Proteomics 2003, 3, 1454–1463.
(24) Kuroda, M.; Ohta, T.; Uchiyama, I.; Baba, T.; Yuzawa, H.; Kobayashi, I.; Cui, L.; Oguchi, A.; Aoki, K.; Nagai, Y.; Lian, J.; Ito, T.; Kanamori, M.; Matsumaru, H.; Maruyama, A.; Murakami, H.; Hosoyama, A.; Mizutani-Ui, Y.; Takahashi, N. K.; Sawano, T.; Inoue, R.; Kaito, C.; Sekimizu, K.; Hirakawa, H.; Kuhara, S.; Goto, S.; Yabuzaki, J.; Kanehisa, M.; Yamashita, A.; Oshima, K.; Furuya, K.; Yoshino, C.; Shiba, T.; Hattori, M.; Ogasawara, N.; Hayashi, H.; Hiramatsu, K. Lancet 2001, 357, 1225–1240.
(25) Righetti, P. G. Isoelectric Focusing: Theory, Methodology, and Applications; Elsevier Biomedical Press; Amsterdam, Netherlands; New York, 1983.
(26) Shimura, K.; Kamiya, K.; Matsumoto, H.; Kasai, K. Anal. Chem. 2002, 74, 1046–1053.
(27) Kobayashi, N. Tanpakushitsu Kakusan Koso 2004, 49, 1333–1340.
(28) Horka, M.; Willimann, T.; Blum, M.; Nording, P.; Friedl, Z.; Slais, K. J. Chromatogr., A 2001, 916, 65–71.
(29) Nakhleh, E. T.; Samra, S. A.; Awdeh, Z. L. Anal. Biochem. 1972, 49, 218–224.
(30) Stastna, M.; Travnicek, M.; Slais, K. Electrophoresis 2005, 26, 53–59.
(31) Blonder, J.; Veenstra, T. D. Expert Rev. Proteomics 2007, 4, 351–354.
(32) Mallick, P.; Schirle, M.; Chen, S. S.; Flory, M. R.; Lee, H.; Martin, D.; Ranish, J.; Raught, B.; Schmitt, R.; Werner, T.; Kuster, B.; Aebersold, R. Nat. Biotechnol. 2007, 25, 125–131.
(33) Liu, H.; Sadygov, R. G.; Yates, J. R., III Anal. Chem. 2004, 76, 4193–4201.
(34) Gygi, S. P.; Rochon, Y.; Franza, B. R.; Aebersold, R. Mol. Cell. Biol. 1999, 19, 1720–1730.
(35) Anderson, L.; Hunter, C. L. Mol. Cell. Proteomics 2006, 5, 573–588.
(36) Halligan, B. D.; Ruotti, V.; Jin, W.; Laffoon, S.; Twigger, S. N.; Dratz, E. A. Nucleic Acids Res. 2004, 32, W638–644.
(37) Kumar, Y.; Khachane, A.; Belwal, M.; Das, S.; Somsundaram, K.; Tatu, U. Proteomics 2004, 4, 1672–1683.
(38) Scherl, A.; Shaffer, S. A.; Taylor, G. K.; Kulasekara, H. D.; Miller, S. I.; Goodlett, D. R. Anal. Chem. 2008, 80, 1182–1191.

PR8002672
