Proteomic and Bioinformatic Analysis of the Root-Knot Nematode Meloidogyne hapla: The Basis for Plant Parasitism

Flaubert Mbeunkui,† Elizabeth H. Scholl,‡ Charles H. Opperman,‡ Michael B. Goshe,† and David McK. Bird*,‡

Department of Molecular and Structural Biochemistry and Plant Nematode Genomes Group, Department of Plant Pathology, NC State University, Raleigh, North Carolina 27695.

Received June 16, 2010

On the basis of the complete genome sequence of the root-knot nematode Meloidogyne hapla, we have deduced and annotated the entire proteome of this plant-parasite to create a database of 14 420 proteins. We have made this database, termed HapPep3, available from the Superfamily repository of model organism proteomes (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY). To experimentally confirm the HapPep3 assignments using proteomics, we applied a data-independent LC/MSE analysis to M. hapla protein extracts fractionated by SDS-PAGE. A total of 516 nonredundant proteins were identified with an average of 9 unique peptides detected per protein. Some proteins, including examples with complex gene organization, were defined by more than 20 unique peptide matches, thus providing experimental confirmation of computational predictions of intron/exon structures. On the basis of comparisons of the broad physicochemical properties of the experimental and computational proteomes, we conclude that the identified proteins reflect a true and unbiased sampling of HapPep3. Conversely, HapPep3 appears to broadly cover the protein space able to be experimentally sampled. To estimate the false discovery rate, we queried human, plant, and bacterial databases for matches to the LC/MSE-derived peptides, revealing fewer than 1% of matches, most of which were to highly conserved proteins. To provide a functional comparison of the acquired and deduced proteomes, each was subjected to higher order annotation, including comparisons of Gene Ontology, protein domains, signaling, and localization predictions, further indicating concordance, although those proteins that did deviate seem to be highly significant. Approximately 20% of the experimentally sampled proteome was predicted to be secreted, and thus potentially play a role at the host-parasite interface. We examined reference pathways to determine the extent of proteome similarity of M. hapla to that of the free-living nematode, Caenorhabditis elegans, revealing significant similarities and differences. Collectively, the analyzed protein set provides an initial foundation to experimentally dissect the basis of plant parasitism by M. hapla.

Keywords: LC/MSE • plant-parasite • nematode • data-independent acquisition • computational proteome

* To whom correspondence should be addressed. Prof. David McK. Bird, Plant Nematode Genomes Group, NC State University, Raleigh NC 27695-7253. E-mail: [email protected].
† Department of Molecular and Structural Biochemistry.
‡ Plant Nematode Genomes Group, Department of Plant Pathology.

Journal of Proteome Research 2010, 9, 5370–5381. Published on Web 08/30/2010. 10.1021/pr1006069. © 2010 American Chemical Society

Introduction

Plant-parasitic nematodes are among the world’s most devastating pests, causing an estimated $125 billion in annual crop losses worldwide.1 The best studied are the root-knot nematodes (RKN: Meloidogyne spp.) which, as a genus, have a very broad host range that spans all major food and fiber crops. Infective second-stage juveniles (J2s) hatch in the soil and mechanically penetrate the root. Once in the root, J2s execute a stereotypical migration path2 through the intercellular space into the vascular cylinder where they elicit formation of a root gall (root-knot). At the center of the gall are typically 5-7 “giant cells” from which the developing nematode uniquely feeds via an extensible feeding stylet. Migration is accompanied by visible and copious secretion of proteins from the stylet.2 These proteins are derived from the two subventral pharyngeal glands and include enzymes that degrade or modify host tissues (including cellulases and pectinases). Stylet secretions from the dorsal pharyngeal gland include proteins with functions predicted to affect other aspects of host biology as well as secretion products whose roles in parasitism are not so obvious.3 Collectively, genes encoding proteins secreted by plant-parasitic nematodes in planta have been termed “parasitism genes”4 and their products dubbed the “parasitome”5 or the “secretome.”

In an attempt to directly catalogue the RKN secretome, Jaubert et al.6 subjected proteins sampled from the stylet tip to two-dimensional gel electrophoresis (2-DGE), but only seven of the most abundant proteins could be identified by microsequencing. Navas et al.7 used 2-DGE to examine the total protein variation in 18 isolates of Meloidogyne arenaria, Meloidogyne incognita and Meloidogyne javanica and were able to reliably score up to 203 protein positions. Some of these protein “spots” were subjected to MALDI peptide mass fingerprinting, but database searching failed to reveal the identity of any of the peptide maps, likely reflecting the limited size of the RKN gene/protein databases at the time.8 Using a more complete EST database, Bellafiore et al.9 employed multidimensional liquid chromatography-tandem mass spectrometry (LC/MS/MS) to identify 486 proteins secreted by chemically stimulated M. incognita. This sampling of the secretome revealed approximately 2.5% of the estimated protein-coding genes in M. incognita10 and includes proteins with interesting regulatory domains and intriguing postulated biological function.

The completion of full genome sequences for M. incognita10 and Meloidogyne hapla11 has paved the way for generating more comprehensive proteome databases and, in turn, the potential for more comprehensive MS analyses of the proteomes for these species. Studies using 2-DGE coupled to MS analysis have provided reference proteomes from the free-living nematode Caenorhabditis elegans12-15 and the human nematode parasite Brugia malayi,16 which will likely prove useful for comparison with the proteomes of plant-parasitic species. As part of the first-pass automated annotation of the M. hapla genome, we previously reported a computational reckoning of the predicted M. hapla proteome (named HapPep3), and utilizing a number of full-length EST sequences, we were able to demonstrate accuracy of some predictions in this data set. Here, we have taken a proteomic and bioinformatic route to map experimentally confirmed peptide sequences onto the computationally deduced proteome to more thoroughly validate HapPep3.
For proteomic analysis, a number of separation steps are usually employed to reduce sample complexity and increase protein detection, but issues based on the nature of the organism being studied and the MS technique can also affect proteome coverage even when a comprehensive database is available. For example, the use of 2-DGE-MS analysis can be problematic due to the integral variations attributable to experimental conditions with conventional 2-DGE, which hinder accurate spot matching and quantification of spot volume;17 in fact, the thousands of spots observed in the gel maps are actually variants of a few hundred of the most abundant proteins.18 In-solution based fractionation approaches such as multidimensional protein identification technology (MudPIT) offer an alternative by fractionating the peptides from proteome digests prior to MS detection.19-30 Since the proteins are digested prior to any fractionation in the MudPIT approach, protein-specific information is lost, and lower abundance proteins can often elude detection due to the presence of highly abundant peptides, as encountered in the proteomics analysis of serum and other protein mixtures dominated by a few high-abundance proteins.31-34 Fractionation of protein mixtures by one-dimensional SDS-PAGE has been used to improve the detection of low-abundance proteins35,36 and thus offers alternative separation of the sample prior to LC/MS/MS analysis.

Over the past decade, bottom-up LC/MS/MS approaches using data-dependent acquisition (DDA) have been used to study the proteomes of various organisms. Although quite powerful, open MS/MS approaches such as DDA are by nature biased toward detection of the highest abundance peptide components in a given sample tryptic digest; lower abundance peptides are seldom interrogated. In addition, this mode of acquisition results in a loss of data in the MS mode and poor duty cycles. As a result, a high percentage of proteins tend to be identified with a single peptide match, thus lowering the statistical confidence of protein identification.

More recently, data-independent, parallel, multiplex fragmentation approaches have been reported for the analysis of simple and complex peptide mixtures37-42 that enhance qualitative sequence coverage of proteins and improve detection of low-abundance peptides. Unlike MS/MS-based DDA strategies where individual peptide ions are sequentially selected for fragmentation, data-independent acquisition (DIA) approaches do not use a precursor selection step prior to collision-induced dissociation (CID), and thus, all peptides are fragmented simultaneously at any given time in a chromatographic separation. This results in a very complex composite spectrum of product ions from all precursor peptides that are correlated postacquisition via retention time alignment and chromatographic profiling to generate a “pseudo-MS/MS”, or reconstructed, product ion spectrum for each precursor ion detected. The foundation of this approach is the fundamental relationship that a product ion derived from a particular precursor ion must exactly co-elute with its unique precursor ion.37 Data in the low energy scan provides intact precursor ion m/z and intensity data, whereas the elevated energy scan provides product ion data, thus producing high mass measurement accuracy for both intact peptides and their fragments. The data analysis package ProteinLynx Global Server 2.3 (PLGS 2.3) processes this data to generate reconstructed product ion spectra.
Although any database search algorithm may be used, an optimized search algorithm known as Ion Accounting was specifically designed for MSE data and incorporated into PLGS 2.3.43,44 Because alternating low and elevated energy scans are acquired over the full elution profile of every precursor during LC/MSE, the quality of LC/MSE derived product ion spectra often exceeds that of corresponding LC/MS/MS data sets, and we have used it successfully to study the secretome of Arabidopsis thaliana42 and quantify stable isotope labeling in planta.45 More recently, we examined both DDA and DIA raw data and the timing of the MS-to-MS/MS switching events to clearly reveal the fundamental limitations of serial MS/MS interrogation and the advantages of parallel fragmentation by LC/MSE for more comprehensive protein identification and characterization.46

In this study, we used an LC/MSE-based analysis to obtain an experimentally observable M. hapla proteome to assess the quality of HapPep3. To minimize protein losses while providing analysis of as many proteins as possible with extensive peptide coverage, we employed a bottom-up approach combining SDS-PAGE protein separation with a data-independent LC/MSE analysis to maximize the duty cycle of the mass spectrometer to increase peptide detection.36,37,43 Using this approach, the identification of 516 proteins by 4475 unique peptides was assessed via comparisons of gene ontology, protein domains, signaling and localization predictions, as well as broad physicochemical properties. Additional searches of the acquired LC/MSE data against Homo sapiens, A. thaliana, and Escherichia coli were conducted to assess M. hapla specificity. We examined reference pathways to determine the extent of proteome similarity of M. hapla to that of C. elegans and the unique proteins identified in M. hapla that may be distinctive to its adaptations for a parasitic lifestyle. To make our experimentally validated M. hapla proteome database available for unrestricted public download and domain searching, here we report its release into the Superfamily database (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY).
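
The co-elution principle behind data-independent acquisition, described in the Introduction, can be illustrated with a minimal sketch. It assumes that each ion has already been reduced to an m/z value, an apex retention time, and an intensity, and it groups product ions with any precursor whose apex falls within a retention-time window. The `Ion` class, the `assign_products` function, and the 0.05 min tolerance are hypothetical names and values chosen for illustration; this is not the PLGS algorithm, which also uses chromatographic profiling and further criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ion:
    """A detected ion, reduced to three illustrative attributes."""
    mz: float          # mass-to-charge ratio
    apex_rt: float     # retention time at the chromatographic apex, in minutes
    intensity: float   # apex intensity, arbitrary units


def assign_products(precursors, products, rt_tol=0.05):
    """Group product ions with co-eluting precursors.

    A product ion is attached to every precursor whose apex retention
    time lies within rt_tol minutes of its own (rt_tol is an assumed,
    illustrative window). Product ions that co-elute with no precursor
    are left unassigned.
    """
    spectra = {prec: [] for prec in precursors}
    for frag in products:
        for prec in precursors:
            if abs(frag.apex_rt - prec.apex_rt) <= rt_tol:
                spectra[prec].append(frag)
    return spectra


# Two precursors from the low-energy scan (made-up values).
p1 = Ion(mz=650.3, apex_rt=10.00, intensity=1.0e5)
p2 = Ion(mz=720.4, apex_rt=12.50, intensity=8.0e4)

# Four product ions from the elevated-energy scan (made-up values).
f1 = Ion(mz=175.1, apex_rt=10.01, intensity=2.0e3)
f2 = Ion(mz=262.1, apex_rt=10.02, intensity=1.5e3)
f3 = Ion(mz=301.2, apex_rt=12.49, intensity=9.0e2)
f4 = Ion(mz=410.2, apex_rt=30.00, intensity=5.0e2)

spectra = assign_products([p1, p2], [f1, f2, f3, f4])
```

In this toy example the two fragments eluting near 10 min are grouped with the first precursor, the fragment near 12.5 min with the second, and the fragment at 30 min is left unassigned. When two precursors co-elute within the window, the sketch assigns shared fragments to both, which is one reason real processing relies on additional evidence beyond retention time alone.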

Experimental Procedures

Chemicals. Methanol (HPLC grade), BCA Protein Assay kit, sodium carbonate, and iodoacetamide were from Thermo Fisher Scientific (Waltham, MA). Ammonium bicarbonate and ammonium formate were from Fluka (Milwaukee, WI). Acetonitrile (HPLC grade) and formic acid (ACS reagent grade) were purchased from Aldrich (Milwaukee, WI). Sequencing grade modified trypsin was from Promega (Madison, WI). All other chemicals were obtained from Thermo Fisher Scientific or Sigma-Aldrich unless otherwise noted. Water was distilled and purified using a High-Q 103S water purification system (Wilmette, IL).

Construction of the HapPep3 Database. Construction of the initial HapPep database has been previously described.11 Briefly, the M. hapla genomic assembly and contigs from EST assemblies were used as input for PASA (Program to Assemble Spliced Alignments) to create an initial set of 3000 gene models.47 Gene models with at least 100 amino acids and 2 exons were used to train GlimmerHMM,48 which was then run against the genomic assembly. Resulting predictions were returned to PASA to generate additional gene models for FgenesH training. Coordinates for ab initio predictions from PASA, GlimmerHMM, and FgenesH were compared, and gene models that were found in at least two of the three sources were used to create HapPep1. Further manual curation and refinement were used to create the current freeze, which is HapPep3 and which comprises 14 420 protein sequences.

Protein Extraction and Enrichment. M. hapla was maintained and harvested using standard protocols.49 J2 (5000 J2/plant) were inoculated on Rutgers tomato seedlings grown in a controlled growth room.
After 35-60 days, roots were harvested and eggs were recovered using the NaOCl method.49 These eggs were then hatched, and the live J2 were concentrated by centrifugation and then rapidly dispersed by vortexing into 10 vol of ice-cold 6 M guanidine hydrochloride, 10 mM EDTA, 10 mM dithiothreitol (DTT), and 100 mM NH4HCO3, pH 7.8. This suspension was immediately transferred to a chilled French pressure cell, and the nematodes were disrupted at 82 MPa. The lysate was collected on ice and centrifuged at 3000g for 10 min at 4 °C to remove cell debris. Proteins were concentrated 10-fold using a Microcon Centrifugal Filter Device (Millipore, Billerica, MA) with a 10 kDa MWCO as previously described.50 The reservoir was then washed with 50 mM NH4HCO3 and centrifugation was repeated to achieve a final sample volume of 10 µL. Total protein concentration was measured using the BCA Protein Assay Kit (Pierce Biotechnology, Rockford, IL). Samples containing 15 and 25 µg of nematode protein were independently resolved on a 4-12% SDS-polyacrylamide gel (Invitrogen, Carlsbad, CA) and visualized using Colloidal Coomassie Stain (Invitrogen, Carlsbad, CA), revealing a similar staining pattern. After destaining with water, 20 gel slices were excised from each lane, and corresponding slices from each lane were pooled and then subjected to in-gel tryptic digestion.

In-Gel Protein Digestion and Sample Preparation. Protein in-gel reduction, alkylation, and tryptic digestion were similar to a previously published procedure.51 Excised gel pieces were washed with acetonitrile and 100 mM NH4HCO3 pH 8.0 (1:1, v/v) twice for 20 min and then with acetonitrile. After decanting the residual acetonitrile, dehydrated gel pieces were rehydrated with 50 mM NH4HCO3 pH 8.0 containing 10 mM TCEP and incubated at 37 °C for 30 min. Alkylation of cysteinyl residues was performed by adding 10 mM iodoacetamide to the mixture followed by incubation in the dark at room temperature for 30 min. After two cycles of washing and subsequent dehydration using 50 mM NH4HCO3, pH 8.0, and acetonitrile, respectively, the gel pieces were rehydrated with 50 mM NH4HCO3, pH 8.0, containing 10 ng/µL of trypsin. The digestion was allowed to proceed overnight at 37 °C. Following digestion, the peptides were extracted by adding 1% formic acid in 2% acetonitrile followed by vortexing and subsequent 5 min incubation in a sonicating bath. The gel pieces remained in the extraction solution for 30 min with occasional vortexing. Gel debris was removed by centrifugation and the supernatant containing the extracted peptides was removed and filtered prior to LC/MSE analysis.

Liquid Chromatography-Tandem Mass Spectrometry Analysis. Extracted peptides from gel pieces were analyzed by nanoscale capillary LC/MSE using a nanoACQUITY ultraperformance liquid chromatography system coupled with a Q-Tof Premier mass spectrometer (Waters Corporation, Milford, MA). The nano-LC separation was performed using a C18 reverse-phase column (BEH stationary phase, 1.7 µm particle size) with an internal diameter of 75 µm and length of 250 mm (Waters Corporation), with a binary solvent system comprising 99.9% water and 0.1% formic acid (mobile phase A) and 99.9% acetonitrile and 0.1% formic acid (mobile phase B).
Samples were initially preconcentrated and desalted online at a flow rate of 5 µL/min using a Symmetry C18 trapping column (internal diameter 180 µm, length 20 mm) as previously described.45 After each injection, peptides were eluted into the NanoLockSpray ion source (Waters Corporation) at a flow rate of 300 nL/min with the following linear gradients: 2-7% mobile phase B over 1 min, 7-40% B over 90 min, 40-95% B over 1 min, isocratic at 95% B for 6 min, and a return to 2% B over 1 min. Prior to the next injection, the column was re-equilibrated for 12 min at initial conditions (2% mobile phase B). The lockmass calibrant peptide standard, 100 fmol/mL glu-fibrinopeptide B, was infused into the NanoLockSpray ion source at a flow rate of 600 nL/min and was sampled during the acquisition at 30 s intervals. Data were acquired using a data-independent acquisition mode where full scan (m/z 50-1990) LC/MSE data were collected using the ‘expression’ mode of acquisition with a 1 s scan interval for both normal and elevated-energy data channels.38 Data were collected at a constant collision energy setting of 4 V during low-energy MS mode, whereas a step from 15 to 30 V of collision energy was used during the high-energy MSE mode.

Data Processing and Protein Identification. PLGS2.3/IDENTITYE software (Waters Corporation) containing the ion accounting algorithm43,44 was used to search LC/MSE data using search parameters that included the “automatic” setting for mass measurement accuracy (10 ppm for precursor ions and 20 ppm for product ions), a minimum of one peptide match per protein, a minimum of three product ion matches per peptide, and a total of seven product ion matches per protein with the maximum false positive rate (FPR) against the randomized forward database set to 5%. The sole fixed modification was carbamidomethylation (C), and the variable modification parameters were acetylation (protein N-terminus), deamidation (N, Q), and oxidation (M).
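
As an aside on the mass measurement accuracy settings just quoted (10 ppm for precursor ions and 20 ppm for product ions), the following minimal sketch shows how a parts-per-million mass error is conventionally computed and compared against a tolerance. The function names and the example m/z values are hypothetical and chosen only for illustration; this is not a description of how PLGS applies its "automatic" setting internally.

```python
PRECURSOR_TOL_PPM = 10.0  # precursor tolerance quoted in the text
PRODUCT_TOL_PPM = 20.0    # product ion tolerance quoted in the text


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the absolute ppm error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Made-up example: a theoretical m/z of 1000.000 and two observed values.
theoretical = 1000.000
close_match = 1000.008   # roughly 8 ppm away
loose_match = 1000.015   # roughly 15 ppm away

precursor_ok = within_tolerance(close_match, theoretical, PRECURSOR_TOL_PPM)
precursor_rejected = not within_tolerance(loose_match, theoretical, PRECURSOR_TOL_PPM)
product_ok = within_tolerance(loose_match, theoretical, PRODUCT_TOL_PPM)
```

Because the tolerance is relative, the same ppm window corresponds to a narrower absolute m/z window for small ions than for large ones, which is why search engines typically express tolerances in ppm rather than in fixed m/z units.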
The maximum missed trypsin cleavage was set at 2. All searches were conducted using the HapPep3 protein database as well as control databases for H. sapiens, A. thaliana, and E. coli, which were used to assess M. hapla specificity. Only proteins matching at least 2 peptides in the PLGS2.3/IDENTITYE search were automatically considered as identified. For proteins detected by a single peptide in M. hapla, each LC/MSE product ion spectrum was carefully inspected to confirm that the assignment was based on three or more y- or b-series ions; these spectra are included in the Supporting Information. The search results were imported into Microsoft Excel for further analysis and presentation.

GO Analysis. Each protein identified in the experimental proteome was used as a query in a BlastP search against the UniProt (Swiss-Prot + TrEMBL) protein database. The top 10 results with significance