“Plasmo2D”: An Ancillary Proteomic Tool to Aid Identification of Proteins from Plasmodium falciparum Amit Khachane, Ranjit Kumar, Sanyam Jain, Samta Jain, Gowrishankar Banumathy, Varsha Singh, Saurabh Nagpal, and Utpal Tatu* Department of Biochemistry, Indian Institute of Science, Bangalore 560012, India Received August 30, 2005
Abstract: Bioinformatics tools to aid gene and protein sequence analysis have become an integral part of biology in the post-genomic era. Release of the Plasmodium falciparum genome sequence has allowed biologists to define the gene content and the predicted protein content of the parasite, as well as their sequences. Using pI and molecular weight as characteristics unique to each protein, we have developed a bioinformatics tool to aid identification of proteins from Plasmodium falciparum. The tool makes use of a Virtual 2-DE generated by plotting all of the proteins from the Plasmodium database on a pI versus molecular weight scale. Proteins are identified by comparing the position of migration of desired protein spots on an experimental 2-DE with that on the virtual 2-DE. The procedure has been automated in the form of user-friendly software called “Plasmo2D”. The tool can be downloaded from http://144.16.89.25/Plasmo2D.zip.
Keywords: Plasmo2D • bioinformatics • proteomics • malaria • heat shock proteins
* To whom correspondence should be addressed. Phone: 080 22932823. Fax: 080 23600814. E-mail: [email protected]. 10.1021/pr050289p CCC: $30.25 © 2005 American Chemical Society
Introduction Release of the Plasmodium falciparum genome sequence two years ago has made it possible to identify previously unknown genes and pathways in the malarial parasite.1 The complex task of understanding the biology of the parasite has received a definite framework, which is to examine the expression and functions of 5331 genes present in the parasite genome. Plasmodium gene arrays, developed using the genome data, have facilitated expression profiling of parasite genes during different stages of its life cycle.2-4 Proteomic approaches using mass spectrometric analysis have also been initiated for protein identification and examination of their temporal expressions in the parasite.5,6 Efficient use of genomic and proteomic technologies however depends on the supporting bioinformatics tools.7-13 Using Plasmodium falciparum genome data, we have developed a bioinformatics tool that will facilitate analysis of protein expression and identification in the parasite. By comparing the isoelectric point (pI) and molecular weight (MW) of protein of interest with those reported in the Plasmodium
database, the tool allows tentative but rapid identification of proteins. Out of a total of 5331 genes reported in the Plasmodium falciparum genome, the tool narrows the identity of a protein of interest down to a short list of candidates. In addition to providing rapid identification of unknown protein spots, the tool can serve as an accessory to final, mass spectrometric identification of proteins, and it can be used routinely in an average biochemistry laboratory. The uniqueness of this tool is that it aids in the identification of proteins from a scanned experimental 2-DE gel image by utilizing genome information of the parasite. Inclusion of the genome information adds a new dimension to proteome analysis by 2-DE, whereas a conventional manual search of a protein database for proteins of the required pI and MW is time-consuming and laborious. The approach consists of the following three steps: (1) examination of the migration of the protein of interest on two-dimensional gel electrophoresis (2-DE), (2) superimposition of the experimental gel on a virtual 2-DE generated by plotting all the proteins of P. falciparum from www.plasmodb.org on a pI versus MW scale, and (3) matching the position of migration of the protein on the experimental 2-DE with the theoretical 2-DE to identify the protein. The approach allows tentative identification of proteins of interest by 2-DE.14 We have incorporated steps 2 and 3 of this approach in user-friendly software called “Plasmo2D” that performs gel alignment and protein identification computationally. We examined the validity of “Plasmo2D” using an experimental 2-DE on which parasite lysate was subjected to western blotting with antibodies specific for 3 abundant parasite proteins.
With “Plasmo2D”, we were able to identify spots corresponding to PfHsp70 (70 kDa heat shock protein of Plasmodium falciparum), PfHsp90 (90 kDa heat shock protein of Plasmodium falciparum), PfBiP (72 kDa binding protein of Plasmodium falciparum), Pftubulin, and PfPDI (protein disulfide isomerase of Plasmodium falciparum), among other proteins, and we confirmed these identifications by western blotting and MALDI-TOF mass spectrometry.15,16 The results confirmed the validity and usefulness of “Plasmo2D” for identification of unknown proteins from the malarial parasite. A similar approach can be implemented for analysis of the proteomes of other organisms with sequenced genomes.
Journal of Proteome Research 2005, 4, 2369-2374. Published on Web 10/18/2005.
Experimental Section Development of the Tool. Plasmo2D was developed using Visual Basic version 6.0 (Microsoft Inc.). Differing pI and molecular weight markers were subjected to 2-DE on a 10%
resolving gel. The migration distances along the pI and MW dimensions were noted [Supporting Information Table 1]. pI is plotted on the X-axis and MW on the Y-axis. These values were used as a training set to generate mathematical equations using CURVEEXPERT 1.37 software (D. Hyams, Starkville, MS). The equations derived are as follows.
Molecular weight equation:
$$\text{migration dist. (Y-axis)} = \frac{9.4814173 \cdot \mathrm{MW}^{1.1576522} - 375.104}{33.710706 + \mathrm{MW}^{1.1576522}}$$
where “MW” is the molecular weight.
pI equation:
$$\text{migration dist. (X-axis)} = 1.1319 \cdot \mathrm{pI} - 3.9028$$
The standard error of estimation (SE) was 0.19 cm for the pI equation and ∼0.21 cm for the molecular weight equation. Theoretical pI and molecular weight of all the proteins encoded by the parasite genome were predicted using the algorithm as described.17 Using the above equations, the parasite proteome was plotted theoretically to form the Virtual 2-D. Once the desired protein spot on a scanned experimental gel image is clicked with the mouse cursor, the X and Y coordinates of the spot are matched against the coordinates of the spots in the Virtual 2-D dataset. To take into account the SE in the calculated migration distances and to compensate for shifts due to post-translational modification, a circular radius of 0.4 cm around the protein spot in the gel is incorporated, so that the protein of interest is not missed. All proteins that fall within the 0.4 cm radius of the clicked spot are considered a match. The 0.4 cm radius corresponds to a maximum pI shift of 0.3 units.
Preparation of Parasite Lysate. P. falciparum infected erythrocytes were metabolically labeled with [35S]-cysteine and methionine, washed twice with PBS, and lysed in 10 volumes of NETT buffer (300 mM NaCl, 1 mM EDTA, 10 mM Tris pH 7.5, and 1% Triton X-100) supplemented with protease inhibitors.
The lysate was separated from the pellet by centrifugation at 20 000 × g for 20 min at 4 °C, and the clarified lysate was solubilized in 2D lysis buffer containing 9.5 M urea, 4% CHAPS, 2% pharmalytes of pI range 3-10, and 65 mM DTT.18 A fraction of the total lysate was subjected to 2-DE and phosphorimager analysis. For western blotting19 and MALDI-TOF analysis, the infected cells were lysed using 0.15% saponin and the parasite pellet was lysed directly in 2D lysis buffer.
2-DE, In-Gel Proteolytic Digestion and Peptide Mass Fingerprint Analysis by MALDI-TOF. The sample solubilized in 2D lysis buffer was loaded onto a 7 cm IPG (Immobilized pH Gradient) strip by active rehydration at 50 V for 10 h. The rehydrated IPG strips were then subjected to IEF on an Ettan IPGPhore. After the run, the strips were incubated in 10 mL of equilibration buffer (6 M urea, 125 mM Tris, 65 mM DTT, 30% glycerol, and 2% SDS, pH 8.8) for 15 min. The focused IPG strips were then laid horizontally on top of 10% SDS-polyacrylamide gels and sealed with 1% agarose in SDS-PAGE running buffer (50 mM Tris, 380 mM glycine, and 0.1% SDS). SDS-PAGE was carried out at 110 V for 2 h 10 min. The proteins from 2-DE were transferred onto nitrocellulose membrane and probed with specific antibodies. Alternatively, Coomassie-stained protein spots of interest were cut from 2-DE gels into 1 mm³ pieces and processed for in-gel proteolytic digestion as described.20 In-gel digestion was carried out using trypsin (20 ng/mL) at 37 °C for 12 h. Peptide digests were mixed with an equal volume of matrix (α-cyano-4-hydroxycinnamic acid), and peptide mass spectra were recorded using ‘Ettan MALDI-TOF Pro’ (Amersham Biosciences, Sweden). Protein identification from peptide fragments was performed using the built-in ‘Ettan MALDI Software’ with the ‘proteo Matrics LLC’ search engine. The search criteria were oxidation at methionine residues, carbamidomethylation at cysteine residues, and up to 1 missed cleavage; proteins were searched against the nonredundant database.
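The two migration equations given under Development of the Tool can be sketched in a few lines of Python. This is only an illustration, not the Visual Basic code of Plasmo2D. It assumes that MW is expressed in kDa (the unit used in the training set) and that the constant 375.104 is subtracted in the numerator of the molecular weight equation; both assumptions are marked in the code. The inverse functions and the helper `virtual_2d` are hypothetical additions for illustration and are not described in the text.

```python
# Fitted constants as printed under "Development of the Tool".
MW_NUM_COEF = 9.4814173    # multiplies MW**MW_EXPONENT in the numerator
MW_NUM_CONST = 375.104     # ASSUMPTION: subtracted in the numerator
MW_DEN_CONST = 33.710706   # added to MW**MW_EXPONENT in the denominator
MW_EXPONENT = 1.1576522
PI_SLOPE = 1.1319          # cm of migration per pI unit
PI_OFFSET = 3.9028         # cm


def mw_to_y_cm(mw_kda):
    """Y-axis migration distance (cm) for a positive MW, ASSUMED to be in kDa."""
    m = mw_kda ** MW_EXPONENT
    return (MW_NUM_COEF * m - MW_NUM_CONST) / (MW_DEN_CONST + m)


def pi_to_x_cm(pi):
    """X-axis migration distance (cm) for a given pI."""
    return PI_SLOPE * pi - PI_OFFSET


def y_cm_to_mw(y_cm):
    """Algebraic inverse of mw_to_y_cm (illustrative, not part of the paper)."""
    if not -MW_NUM_CONST / MW_DEN_CONST < y_cm < MW_NUM_COEF:
        raise ValueError("y_cm is outside the range the fitted curve can produce")
    m = (MW_DEN_CONST * y_cm + MW_NUM_CONST) / (MW_NUM_COEF - y_cm)
    return m ** (1.0 / MW_EXPONENT)


def x_cm_to_pi(x_cm):
    """Algebraic inverse of pi_to_x_cm (illustrative, not part of the paper)."""
    return (x_cm + PI_OFFSET) / PI_SLOPE


def virtual_2d(proteins):
    """Map (protein_id, pI, MW in kDa) records to (protein_id, x_cm, y_cm)."""
    return [(pid, pi_to_x_cm(pi), mw_to_y_cm(mw)) for pid, pi, mw in proteins]
```

Applying `virtual_2d` to the predicted pI and MW of every annotated protein would give a coordinate table of the kind the text calls the Virtual 2-D; the inverse functions show how a measured position could be read back as an approximate pI and MW.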
Results “Plasmo2D” analyzes the P. falciparum proteome on the basis of protein pI and molecular weight. By plotting all 5331 proteins present in the P. falciparum database as a function of their pI and molecular weight, we generated a virtual 2D for the parasite proteome. The scale on the X-axis was established from the migration of proteins with known pI values after isoelectric focusing, and the scale on the Y-axis from the migration distance of molecular weight standards on SDS-PAGE (see methods). Figure 1 shows a virtual 2-D in which all the proteins from the Plasmodium database are plotted, with the X-coordinate being the theoretically calculated pI value and the Y-coordinate corresponding to the distance of migration based on molecular weight. The parasite proteome showed a bimodal distribution, with one population in the basic range (pI 8 to 10) and one in the acidic range (pI 4 to 7). Proteins with relatively basic pI were somewhat more abundant than those with acidic pI values. There were very few proteins with pI values below 4.5 and also very few with pI values above 10. Interestingly, the pI range around 7 was also very sparsely populated, as shown in Figure 1 (Table).
Description of the Tool. “Plasmo2D” is a semiautomatic, user-friendly tool that aids in the identification of proteins in the P. falciparum 3D7 proteome from a 2-DE gel image. The tool runs on the Windows operating system. On executing the program, the user provides as input the complete path of the 2D image file in .gif or .jpeg format, and the image is loaded. The window in which the image is loaded has pI and MW markers marked on it, based on the migration distance data. To minimize gel/image distortion, the user can overlap the markers of the gel image on the markers provided by the software using the image resize buttons. The user can then click on the desired protein spot to retrieve information about it.
Background Dataset (Virtual 2D) Correlation. The coordinates of the clicked spot are compared with the coordinates of all the proteins in the dataset. All proteins present within a 0.4 cm radius are considered a match. The stringency of the search can, however, be adjusted by varying the radius.
The Output. The output window of Plasmo2D displays the list of potential candidate proteins corresponding to the spot of interest. The list provides (a) PlasmoDB protein ID, (b) name of the protein, (c) pI, (d) molecular weight, and (e) score. The score reflects the confidence level in prediction of the right protein. The scoring system is taken from the data available from the analysis of protein expression for the malarial parasite.21 Here, the expression level of individual parasite proteins, derived from the spectral count of their peptides, has been taken into account. The scoring system thereby gives a greater score to proteins that are expressed at high levels. In addition, a facility is provided to classify the listed proteins according to their intracellular localization and their proposed function. The classification is based on the information available on the PlasmoDB (www.plasmodb.org) web site. A screenshot of the different interfaces of Plasmo2D is shown in Figure 1 (Input and Output).
Figure 1. Screenshot of the different interfaces of “Plasmo2D”. The input frame displays the 2-DE gel image. On clicking a spot on the image with the mouse cursor, the virtual 2D dataset is queried in the background and the likely set of proteins is displayed in a separate output frame. Theoretical pI and molecular weight of all the annotated P. falciparum protein sequences were plotted (Virtual 2D proteome profile) with pI along the X-axis and molecular weight along the Y-axis corresponding to their migration distances.
Figure 2. Validation of ‘Plasmo2D’ by western blotting and MALDI-TOF mass spectrometry. Plasmodium falciparum-infected erythrocytes were metabolically labeled and lysed in NETT buffer. Panel A shows the labeled protein profile of P. falciparum cell lysate resolved on a 2-DE. Panel B indicates the location of spots in immunoblots probed with anti-PfHsp90 (spot 1), anti-PfHsp70 (spot 2) and anti-PfBiP (spot 3). Panel C shows the lists of proteins identified by ‘Plasmo2D’ corresponding to spots 1, 2, 3, 4, and 5 marked in panel A. It also shows MALDI-TOF spectra and peptide mass fingerprint analysis of the above-mentioned spots. All the mass spectrometry protein identification search results are significant with E-value < 0.5 and Rank 1, where ‘Rank’ is the probability that the candidate protein is the sample protein and ‘E-value’ is the statistical expectation value with a value of representing 1% statistical probability that the protein is a random hit.
Validity of Plasmo2D: The Tool Identifies Proteins of Varied Size and pI in the Parasite Proteome. We used “Plasmo2D” to analyze proteins from Plasmodium falciparum resolved on a 2-DE. We selected three among the abundant
protein spots (labeled as Spots 1, 2, and 3) for analyses using “Plasmo2D”. Cell lysate was prepared from saponin-freed parasites (see methods) by lysing the parasites in 2D lysis buffer and examined by 2-DE and western blotting using anti-PfHsp90, anti-PfHsp70, and anti-PfBiP. The immunoblots were subjected to Plasmo2D analysis. As shown in Figure 2A, several protein spots were visible on the 2-DE of metabolically labeled total parasite lysate. Figure 2B shows the location of spots in immunoblots probed with anti-PfHsp90 (spot 1), anti-PfHsp70 (spot 2) and anti-PfBiP (spot 3). Figure 2C shows the lists of proteins identified by the program for these spots. Indeed, these proteins were included in the lists derived by “Plasmo2D”. We also confirmed the identities of these 3 spots by MALDI-TOF. Tryptic digestion of the protein spots was performed as described under methods. Figure 2C also shows the peptide mass fingerprints of these proteins. Analysis of these peptide mass fingerprints using search engines revealed the identity of these proteins to be PfHsp90 for spot no. 1, PfHsp70 for spot no. 2, and PfBiP for spot no. 3. The results confirmed the validity of the ‘Plasmo2D’ approach. We also examined the efficiency of the program for 2 unknown proteins from 2-DE. MALDI-TOF analysis was done for spots 4 and 5 (labeled in Figure 2A). The identities of spots 4 and 5 were Pftubulin and PfPDI, respectively. These proteins were also included in the lists derived by “Plasmo2D”. In all of the above cases, the correct protein ID was given the highest score by Plasmo2D. The analysis confirmed that “Plasmo2D” is able to provide tentative identifications of proteins from their 2-DE profiles.
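The spot lookup described under Background Dataset (Virtual 2D) Correlation can be sketched as follows. This is a minimal illustration in Python, not the tool's own code: the `VirtualSpot` record, the function name, and the placeholder entries are hypothetical, and sorting the hits by score is an assumption about how the output list is ordered. The only values taken from the text are the Euclidean radius test and the default radius of 0.4 cm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualSpot:
    """One protein of the virtual 2-D (field names are hypothetical)."""
    protein_id: str
    name: str
    x_cm: float    # migration distance along the pI axis
    y_cm: float    # migration distance along the MW axis
    score: float   # expression-based score; higher means more abundant


def find_candidates(spots, click_x_cm, click_y_cm, radius_cm=0.4):
    """Return all spots within radius_cm of the click, highest score first."""
    hits = [
        s for s in spots
        if math.hypot(s.x_cm - click_x_cm, s.y_cm - click_y_cm) <= radius_cm
    ]
    # ASSUMPTION: candidates are presented in order of decreasing score.
    return sorted(hits, key=lambda s: s.score, reverse=True)


# Placeholder entries for illustration only; these are not real PlasmoDB records.
DEMO_SPOTS = [
    VirtualSpot("hypothetical_A", "protein A", 2.0, 5.0, 10.0),
    VirtualSpot("hypothetical_B", "protein B", 2.3, 5.2, 50.0),
    VirtualSpot("hypothetical_C", "protein C", 3.0, 5.0, 99.0),
]

candidates = find_candidates(DEMO_SPOTS, click_x_cm=2.1, click_y_cm=5.0)
```

Passing a smaller `radius_cm` makes the search more stringent, which mirrors the adjustable radius mentioned in the text. A linear scan is adequate for a dataset of a few thousand proteins, so no spatial index is used in this sketch.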
Discussion Developing efficient tools to organize, analyze, and harness the information content in genome sequences has become an important priority in the post-genomic era. Release of the Plasmodium falciparum genome sequence two years ago has triggered a burst of activity among bioinformaticians interested in parasite biology. In addition to purely bioinformatics tools for sequence analysis, alignments, search for homologues, modeling, and prediction of functions, other tools that interface theoretical and experimental approaches have also been developed to aid genomic and proteomic approaches. Identification of low abundance proteins expressed during different stages of the parasite, from protein complexes and from subcellular fractions, is an important part of proteomic studies. Mass spectrometric analysis of peptide mass fingerprints is currently the method of choice for protein identification.22 Several bioinformatics tools have been developed to support analysis of peptide mass fingerprints for identification of the protein being analyzed. There are several commercial tools available for gel-to-gel comparisons and analysis of 2-DE (Melanie, Progenesis, 2Dquest, Decode, Gel-Pro analyzer, Gel compart, Protplot, Flicker, etc.). These tools are more useful for gel warping and spot quantitation. None of these tools, however, allows identification of a protein from a spot on a 2D gel. In this paper, we have described a tool to aid tentative identification of parasite proteins using 2-DE. This is the first effort to use a synthetic 2-D for protein identification and is particularly relevant to experimentally challenging systems such as Plasmodium falciparum. 2-DE resolves proteins based on the unique molecular weight and pI of each protein. We have exploited the same principle to facilitate identification of proteins resolved on a 2-DE.
By theoretically deducing the pI and molecular weight for each protein reported in the Plasmodium database, we have generated a theoretical 2-DE encompassing the entire Plasmodium falciparum proteome. We developed a software called “Plasmo2D”, which allows one to superimpose and link an experimental 2-DE with the theoretical 2-DE. By a click of the cursor on the spot of interest on a 2-DE, “Plasmo2D” analysis provides a list of proteins, which the spot is likely to correspond to. The approach allows one to quickly narrow down the identity of a particular spot on a 2-DE gel to a few proteins. From the knowledge of the sample used for the experimental
2-DE, one can eliminate certain proteins from the list obtained. Thus, starting with an image of a 2-DE showing spots corresponding to different parasite proteins, the tool allows one to get a tentative ID for any spot of interest on the gel. The analysis of total lysates from Plasmodium falciparum by western blotting and MALDI-TOF mass spectrometry presented here validates and confirms the value of “Plasmo2D” in proteomic analysis. The sensitivity of “Plasmo2D” analysis is set to a range of 0.3 pI units. Thereby, all proteins within a range of 0.3 pI units from the spot of interest will be included in the list. While this results in an increase in the number of predicted identities for a given protein spot, it also allows the tool to cover pI shifts due to post-translational modifications that a protein might undergo. For example, if the pI of a protein of interest has shifted to an acidic value due to phosphorylation, the modified protein spot will still be included in the list of proteins identified. Indeed, acidic shifts in pI due to phosphorylation are commonly observed on 2-DE gels.23 “Plasmo2D” analysis is not suitable for absolute identification of protein spots of interest from an experimental 2-DE but is designed to be an ancillary tool. Further confirmation of protein identification becomes easier with approaches such as peptide mass fingerprinting using MALDI-TOF. The main advantage of this tool lies in its ability to provide quick analysis and in its easy adaptability. The current tool is designed to analyze a 10% gel run with 7 cm strips on a standard Ettan IPGPhore (Amersham Biosciences); the system has the flexibility to be adapted to other experimental 2-DE systems. The approach can also be expanded to the analysis of other organisms whose genome sequences are available. “Plasmo2D” will serve as an important aid in rapid characterization of protein complexes, pathways, and compartments in the malarial parasite.
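As a rough check on how the 0.4 cm search radius relates to the pI tolerance discussed above, the linear pI equation from the Experimental Section can be used to convert a horizontal distance on the gel into pI units. The calculation below assumes that the displacement is purely along the X-axis and takes the fitted slope of 1.1319 cm per pI unit at face value:

```latex
\Delta \mathrm{pI} \;=\; \frac{\Delta x}{1.1319\ \mathrm{cm\ per\ pI\ unit}}
\;=\; \frac{0.4\ \mathrm{cm}}{1.1319\ \mathrm{cm\ per\ pI\ unit}}
\;\approx\; 0.35\ \mathrm{pI\ units}
```

Under these assumptions the radius corresponds to roughly 0.35 pI units, which is of the same order as the 0.3 pI units quoted in the text. A displacement that also has a vertical (MW) component leaves correspondingly less room for a pI shift.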
Conclusions Plasmo2D aims to facilitate analysis of parasite proteomes by providing rapid identification of parasite proteins from a standard 2-DE platform. The main advantage of this tool lies in its ability to provide tentative protein identifications even from a radiolabeled protein spot on a 2-DE. Plasmo2D is likely to be of particular value for identification of low abundance protein components in (a) protein complexes, e.g., in a co-immunoprecipitation experiment, and (b) subcellular organellar fractions of the parasite, and (c) for analyzing their stage-specific expression.
Abbreviations. PfHsp70, 70 kDa heat shock protein of Plasmodium falciparum; PfHsp90, 90 kDa heat shock protein of Plasmodium falciparum; PfBiP, 72 kDa binding protein of Plasmodium falciparum; MW, molecular weight.
Acknowledgment. The authors thank Indo-French Center for the Promotion of Advanced Scientific Research (IFCPAR), Department of Biotechnology (DBT), and NMITLI program of Council for Scientific and Industrial Research (CSIR), New Delhi for their financial support.
Supporting Information Available: Supporting Information Table 1. Training set data with migration distance is provided. Panel A lists the proteins, their molecular weights (in kDa) and migration distance (in cm) used for molecular weight training data set. Panel B lists the proteins, their pI and migration distance (in cm) used for pI training data set. This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) Gardner, M. J.; Hall, N.; Fung, E.; White, O.; Berriman, M.; Hyman, R. W.; Carlton, J. M.; Pain, A.; Nelson, K. E.; Bowman, S.; et al. Nature 2002, 419, 498-511.
(2) Ben Mamoun, C.; Gluzman, I. Y.; Hott, C.; MacMillan, S. K.; Amarakone, A. S.; Anderson, D. L.; Carlton, J. M.; Dame, J. B.; Chakrabarti, D.; Martin, R. K.; Brownstein, B. H. Mol. Microbiol. 2001, 39, 26-36.
(3) Bozdech, Z.; Zhu, J.; Joachimiak, M. P.; Cohen, F. E.; Pulliam, B.; DeRisi, J. L. Genome Biol. 2003, 4, R9.
(4) Bozdech, Z.; Llinas, M.; Pulliam, B. L.; Wong, E. D.; Zhu, J.; DeRisi, J. L. PLoS Biol. 2003, 1, E5.
(5) Lasonder, E.; Ishihama, Y.; Andersen, J. S.; Vermunt, A. M.; Pain, A.; Sauerwein, R. W.; Eling, W. M.; Hall, N.; Waters, A. P.; Stunnenberg, H. G.; et al. Nature 2002, 419, 537-542.
(6) Florens, L.; Washburn, M. P.; Raine, J. D.; Anthony, R. M.; Grainger, M.; Haynes, J. D.; Moch, J. K.; Muster, N.; Sacci, J. B.; Tabb, D. L.; et al. Nature 2002, 419, 520-526.
(7) Raman, B.; Cheung, A.; Marten, M. R. Electrophoresis 2002, 23, 2194-2202.
(8) Appel, R. D.; Vargas, J. R.; Palagi, P. M.; Walther, D.; Hochstrasser, D. F. Electrophoresis 1997, 18, 2735-2748.
(9) Cutler, P.; Heald, G.; White, I. R.; Ruan, J. Proteomics 2003, 3, 392-401.
(10) Efrat, A.; Hoffmann, F.; Kriegel, K.; Schultz, C.; Wenk, C. J. Comput. Biol. 2002, 9, 299-315.
(11) Gevaert, K.; Vandekerckhove, J. Electrophoresis 2000, 21, 1145-1154.
(12) Hiller, K.; Schobert, M.; Hundertmark, C.; Jahn, D.; Münch, R. Nucleic Acids Res. 2003, 31, 3862-3865.
(13) Halligan, B. D.; Ruotti, V.; Jin, W.; Laffoon, S.; Twigger, S. N.; Dratz, E. A. Nucleic Acids Res. 2004, 32, 638-344.
(14) O’Farrell, P. H. J. Biol. Chem. 1975, 250, 4007-4021.
(15) Kumar, N.; Koski, G.; Harada, M.; Aikawa, M.; Zheng, H. Mol. Biochem. Parasitol. 1991, 48, 47-58.
(16) Bonnefoy, S.; Attal, G.; Langsley, G.; Tekaia, F.; Puijalon, O. M. Mol. Biochem. Parasitol. 1994, 67, 157-170.
(17) Patrickios, C. S.; Yamasaki, E. N. Anal. Biochem. 1995, 231 (1), 82-91.
(18) Celis, J. E. Cell Biology 1998, A Laboratory Manual, 2nd ed.; Academic Press: California, USA, Vol. 4, 398-404.
(19) Banumathy, G.; Singh, V.; Tatu, U. J. Biol. Chem. 2002, 277, 3902-3912.
(20) Kumar, Y.; Uppuluri, N. R. V.; Babu, K.; Phadke, K.; Kumar, P. P.; Ballal, S.; Tatu, U. Curr. Sci. 2002, 82, 655-663.
(21) Le Roch, K. G.; Johnson, J. R.; Florens, L.; Zhou, Y.; Santrosyan, A.; Grainger, M.; Yan, S. F.; Williamson, K. C.; Holder, A. A.; Carucci, D. J.; Yates, J. R. Genome Res. 2004, 14, 2308-2318.
(22) Kumar, Y.; Tatu, U. Proteomics 2003, 3, 513-526.
(23) Khachane, A.; Kumar, Y.; Belwal, M.; Das, S.; Somsundaram, K.; Tatu, U. Proteomics 2004, 4, 1672-1683.