
Article pubs.acs.org/jpr

Cite This: J. Proteome Res. 2019, 18, 2719−2734

Proteomics and Proteogenomics Analysis of Sweetpotato (Ipomoea batatas) Leaf and Root

Thualfeqar Al-Mohanna,† Nagib Ahsan,‡,§ Norbert T. Bokros,† Gizem Dimlioglu,† Kambham R. Reddy,∥ Mark Shankle,⊥ George V. Popescu,*,#,¶ and Sorina C. Popescu*,†




† Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Mississippi State, Mississippi 39759, United States
‡ COBRE Center for Cancer Research Development, Proteomics Core Facility, Rhode Island Hospital, Providence, Rhode Island 02903, United States
§ Division of Biology and Medicine, Brown University, Providence, Rhode Island 02903, United States
∥ Department of Plant and Soil Sciences, Mississippi State University, Mississippi State, Mississippi 39759, United States
⊥ Pontotoc Experimental Station, Mississippi State University, Pontotoc, Mississippi 38863, United States
# Institute for Genomics, Biocomputing, and Biotechnology, Mississippi State University, Mississippi State, Mississippi 39759, United States
¶ The National Institute for Laser, Plasma and Radiation Physics, Bucharest RO-077125, Romania

Supporting Information

ABSTRACT: Two complementary protein extraction methodologies coupled with an automated proteomic platform were employed to analyze tissue-specific proteomes and characterize biological and metabolic processes in sweetpotato. A total of 74 255 peptides corresponding to 4321 nonredundant proteins were successfully identified. Data were compared to predicted protein accessions for Ipomoea species and mapped on the sweetpotato transcriptome and haplotype-resolved genome. The two methodologies exhibited differences in the number and class of the unique proteins extracted. Overall, 39 916 peptides mapped to 3143 unique proteins in leaves, and 34 339 peptides mapped to 2928 unique proteins in roots. Primary metabolism and protein translation processes were enriched in leaves, whereas genetic pathways associated with protein folding, transport, sorting, as well as pathways in the primary carbohydrate metabolism were enriched in storage roots. A proteogenomics analysis successfully mapped 90.4% of the total uniquely identified peptides against the sweetpotato transcriptome and genome, predicted 741 new protein-coding genes, and specified 2056 loci where gene annotations can be further improved. The proteogenomics results provide evidence for the translation of new open reading frames (ORFs), alternative ORFs, exon extensions, and intronic ORF sequences. Data are available via ProteomeXchange with identifier PXD012999.

KEYWORDS: sweetpotato, leaf proteome, storage root proteome, tissue-specific, LC-MS/MS, proteogenomics analysis



INTRODUCTION

Sweetpotato (Ipomoea batatas, Lam.) is a perennial dicotyledonous plant classified in the order Solanales and family Convolvulaceae. The Ipomoea genus includes I. nil (Japanese morning glory), an experimental model plant for the genus, and I. trifida and I. triloba, the wild relatives of sweetpotato. Sweetpotato is a staple food and a valuable annual crop in the world economies. Although still regarded as an underutilized crop, the sweetpotato yield has increased over the last 15 years to cover the demand for human consumption, animal feed, and derivative industrial products such as biofuels.1 The cultivated sweetpotato is a highly polymorphic alloautohexaploid. Sequencing and assembly of the haplotype-resolved I. batatas and full genomes of the diploids I. nil and I. trifida clarified aspects of Ipomoea spp. genome organization and the origin and evolution of sweetpotato.2−4 Nevertheless, unresolved issues in the I. batatas current genome assembly and transcriptome (i.e., reading frame prediction, exon border, and splice junction definition) have indirectly impacted proteomics efforts with the result that few published studies are focused on Ipomoea proteomes. So far, protein-level analyses of I. batatas have mostly relied on gel-based systems to capture differentially expressed proteins in cultivars with distinctly colored flesh,5 in pencil versus storage roots,6 and following mechanical or environmental stress.7,8 A recently published high-throughput comparative study identified 1541 and 1201 proteins, respectively, in two sweetpotato ecotypes; however, the extracted peptides were mapped using a search against Viridiplantae with only 0.01% of the approximately 1700 identified proteins matching accessions in I. batatas or I. nil.9

Received: December 10, 2018
Published: May 22, 2019
© 2019 American Chemical Society

DOI: 10.1021/acs.jproteome.8b00943 J. Proteome Res. 2019, 18, 2719−2734

Article

Journal of Proteome Research

Alongside genome complexity, the chemical composition of sweetpotato has also rendered the mapping of its proteome challenging.10 Sweetpotato storage roots have low protein content compared with other similar crops, whereas leaves have a high content of secondary metabolites, rendering the extraction of proteins of sufficient quality and quantity potentially difficult.11 A survey of the relevant literature5,7,9,12,13 highlights the necessity of method optimization for the maximum achievable protein purity and quantity from sweetpotato. The ideal methodology should allow identification of a large number of proteins with high accuracy and possess high analytical resolution in protein separation.14 Also, extraction methods should reveal tissue-specific mechanisms for growth, development, and adaptation to biotic and abiotic stresses.15,16

Proteogenomics approaches utilize mass spectrometry-generated data in conjunction with genomic information to validate protein-coding loci in a gene-prediction-independent manner.17 Proteogenomics led to revisions of gene annotations in the genomes of experimental model plants with genome sequences completed decades ago. For many crop plants with complex and large genomes, for which gene annotations are rarely available for the entire genome, proteogenomics approaches can offer a practical solution to identify new open reading frames and refine and upgrade gene models.18 Comprehensive high-throughput proteome analyses combined with a thorough search against all annotated Ipomoea genomes are necessary to identify the full proteome in sweetpotato. As such, a high-quality sweetpotato proteome would facilitate the process of amending and refining the current haplotype-resolved genome assembly and transcriptome. It would also constitute a basic platform from which protein-specific analyses (e.g., protein post-translational modifications and interactions in metabolic and signaling pathways) can be addressed and diverse system-level processes explored. In this work, we identify the composition and functionally characterize the leaf and storage root proteomes of sweetpotato using a high-throughput label-free methodology, evaluate the performance of two methods for organ-specific protein extraction, and contribute to sweetpotato genome annotation efforts by applying a proteogenomic pipeline to map peptides on the haplotype-resolved genome and predicted transcriptome.

EXPERIMENTAL SECTION

Plant Material and Tissue Collection

Sweetpotato Beauregard cultivar was obtained from the Pontotoc Ridge-Flatwoods Branch Experiment Station, Pontotoc, MS. Slips of Beauregard sweetpotato were transplanted into polyvinyl chloride (PVC) pots (20 cm diameter and 35 cm height) filled with a potting medium containing topsoil and sand mix (1:3 by volume). Ten pots of Beauregard sweetpotato slips were grown under outdoor pot-culture conditions located at the Environmental Plant Physiology Laboratory, MSU. The plants were watered with Hoagland’s nutrient solution three times a day to supply nutrients and water for plant growth. The leaf samples were harvested from 1-month-old sweetpotato plants and stored at −80 °C until processed. The storage root samples were grown in a dryland environment on Atwood soil (Fine-silty, mixed, thermic Typic Paleudalfs) for 110 days. The samples were harvested and cured at 29 °C for 7 days and stored at 16 °C until processed. Both leaf and roots were lyophilized before protein extraction.

Protein Extraction

The sweetpotato samples analyzed in this study were taken from multiple plants, and the organ-specific samples were processed individually, as follows. Approximately 2.5 kg of sweetpotato storage roots was collected from 5 plants; individual roots were cut in slices before freezing in liquid nitrogen. Mature leaves were collected from 10 plants. Samples from multiple biological replicates (leaves or roots) were mixed before protein extraction; approximately 100 g of lyophilized root tissue in total and 5 g of ground leaf tissue in total were obtained and used for protein purification. The processed tissue was aliquoted. Protein purification was performed in 4 independent technical replicates, each containing the mixture of biological samples for the organ analyzed.

Phenol Procedure (M1). For phenol extraction, we used the protocol described in ref 5, to which we made the following modifications. Phenol was treated with a Tris-HCl buffer to prepare phenol-saturated buffer (pH 8.0) with 0.1% 8-hydroxyquinoline. Subsequently, 200 mg of powdered sample was extracted with 750 μL of phenol-saturated buffer and 750 μL of extraction buffer (100 mM Tris base, 900 mM sucrose, and 10 mM EDTA at pH 8.0), vortexed for 1 h at room temperature (RT), and centrifuged at 13 000 rpm for 10 min at 4 °C. The upper phase (330 μL) was mixed with 5 volumes (1650 μL) of ice-cold precipitation solution (100 mM NH4CH3CO2 in 100% methanol) and incubated at −20 °C overnight (16 h).6 The sample was then centrifuged at 6000 rpm for 10 min at RT, and the pellets were collected and mixed with 1 mL of ice-cold precipitation buffer followed by centrifugation at 13 000 rpm for 5 min at RT. The pellets were washed with 1 mL of 80% cold acetone and centrifuged at 13 000 rpm for 5 min at RT. After a similar final wash with 1 mL of 70% ice-cold ethanol, the pellets were resuspended in 300 μL of 70% ethanol and stored at −20 °C.

Polyethylene Glycol (PEG) Procedure (M2). For the PEG procedure, we combined and modified several previously described protocols.19−21 Briefly, 500 mg of powdered sample was mixed with 5 mL of buffer (500 mM Tris-HCl, 2% NP40, 2% β-mercaptoethanol, 20 mM MgCl2, 1 mM PMSF at pH 8.3) and centrifuged for 15 min at 3000 rpm at 4 °C. The supernatant was filtered through a 2.0 μm filter to remove impurities and insoluble residues, and 50% PEG 4000 was added to the supernatant to a final concentration of 15% PEG 4000. The samples were incubated on ice for 30 min and centrifuged at 12 000 rpm for 15 min at 4 °C.19,21 To precipitate proteins, the supernatant was mixed with cold acetone (at −20 °C, 2 volumes for the root and 4 volumes for the leaf samples), incubated at −20 °C for 30 min, and centrifuged at 13 000 rpm for 5 min.20 The pellets were mixed with 1 mL of 80% cold acetone, centrifuged at 13 000 rpm for 5 min, washed with 1 mL of 70% ethanol, and resuspended in 500 μL of 70% ethanol for storage at −20 °C.

Sample Preparation and LC-MS/MS Analysis

Cell pellets were mixed with lysis buffer (8 M urea, 1 mM sodium orthovanadate, 20 mM HEPES, 2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate, pH 8.0, 20 min, 4 °C), sonicated, and cleared by centrifugation (12 500 rpm, 15 min, 4 °C). Protein concentration was measured (Pierce BCA Protein Assay, Thermo Fisher Scientific, IL, U.S.A.), and 100 μg of protein per sample was subjected to trypsin digestion. It is important to note that the cell lysate was diluted 4-fold with 20 mM HEPES buffer, pH 8.0, prior to protein concentration measurement and trypsin digestion. Tryptic peptides were desalted using C18 Sep-Pak plus cartridges (Waters, Milford, MA) and were lyophilized for 48 h to dryness. The dried eluted peptides were reconstituted in buffer A (0.1 M acetic acid) at a concentration of 1 μg/μL, and 5 μL was injected for each analysis. The LC-MS/MS was performed on a fully automated proteomic technology platform22,23 that includes sample pick up by an Agilent 1200 Series Quaternary HPLC system (Agilent Technologies, Santa Clara, CA), MS/MS analysis by a Q Exactive Plus mass spectrometer (Thermo Fisher Scientific, Waltham, MA) followed by hands-free database search. The LC-MS/MS setup was used as described earlier.24 Briefly, the peptides were separated through a linear reversed-phase 90 min gradient from 0% to 40% buffer B (0.1 M acetic acid in acetonitrile) at a flow rate of 3 μL/min through a 3 μm, 20 cm C18 column. The electrospray voltage of 2.0 kV was applied in a split flow configuration, and spectra were collected using a top-9 data-dependent method. Survey full scan MS spectra (m/z 400−1800) were acquired at a resolution of 70 000 with an AGC target value of 3 × 10^6 ions or a maximum ion injection time of 200 ms. The peptide fragmentation was performed via higher-energy collision dissociation with the energy set at 28 NCE. The MS/MS spectra were acquired at a resolution of 17 500, with a targeted value of 2 × 10^4 ions or maximum integration time of 200 ms. The ion selection abundance threshold was set at 8.0 × 10^2 with charge state exclusion of unassigned and z = 1 or 6−8 ions and dynamic exclusion time of 30 s.

Figure 1. Methodology for sweetpotato protein extraction and identification by LC-MS/MS. (A) Workflow for protein extraction and purification from leaf and root tissue and peptide/protein identification by LC-MS/MS. (B) Representative SDS-PAGE of total protein preparations obtained from roots (R1−R4) and leaves (L1−L4) using the phenol-based (M1) and PEG4000 fractionation (M2) protocols for protein extraction. The SDS-PAGE gels were stained with Coomassie blue to visualize protein bands. The protein marker and MWs (kDa) are shown.

(Supplemental files SF1, SF2, SF3, and SF4) and added as tracks on the sweetpotato genome and transcriptome annotations downloaded from http://public-genomes-ngs.molgen.mpg.de/SweetPotato/DOWNLOADS/.

GO Analysis. Protein coding sequences were retrieved from NCBI for all unique hits identified from mass spectrometry data. The PANTHER HMM Scoring tool30 was used to score protein sequences against the entire PANTHER HMM library, which was last updated 2/8/18. Top HMM hits below an E-value cutoff of 0.001 were kept for further analysis, resulting in 4286/4321 (99.2%) of sequences being mapped to a PANTHER family. PANTHER Generic Mapping files were generated for individual data sets in R for GO term enrichment. Protein classes were retrieved using the PANTHER.db (v1.0.4) package. GOSLIM terms were identified using the PANTHER GO slim OBO file available from http://data.pantherdb.org/PANTHER13.1/ontology/PANTHERGOslim.obo using the orthologyIndex (v2.0) package.

Database Searching

MS/MS spectra from each file were searched against the NCBI Ipomoea taxon (txid4119) proteins data set containing 58 282 proteins (NCBI; downloaded 2/12/2018) using MASCOT v. 2.4 (Matrix Science, Ltd., London, U.K.). A concatenated database containing “target” and “decoy” sequences was employed to estimate the false discovery rate (FDR).25 Msconvert from ProteoWizard (v. 3.0.5047), using default parameters and with the MS2Deisotope filter on, was employed to create peak lists for Mascot. The Mascot database search was performed with the following parameters: trypsin enzyme cleavage specificity, 2 possible missed cleavages, 10 ppm mass tolerance for precursor ions, and 20 mmu mass tolerance for fragment ions. Search parameters permitted variable modification of methionine oxidation (+15.9949 Da) and static modification of carbamidomethylation (+57.0215 Da) on cysteine. The resulting peptide-spectrum matches (PSMs) were reduced to sets of unique PSMs by eliminating lower scoring duplicates. To provide high confidence data, the Mascot results were filtered for Ions Score (>10). Peptide assignments from the database search were filtered down to a 1% FDR by a logistic spectral score as previously described.25,26 The frequency of interference by common contaminants in a database search is lower than 0.01%. A single peptide per protein was considered sufficient for a positive identification. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE27 partner repository with the data set identifier PXD012999. Submission details: Project Name: Proteomics and proteogenomics analysis of sweetpotato (Ipomoea batatas) leaf and root. Project accession: PXD012999.
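As a rough illustration of the target-decoy idea described above, the sketch below picks a score cutoff at which the decoy-to-target ratio among accepted PSMs stays at or below 1%. It is a minimal sketch under stated assumptions: the `PSM` record and both helper functions are hypothetical names, the FDR estimator is the simple decoy/target ratio, and a plain score threshold stands in for the logistic spectral score used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PSM:
    """Hypothetical record for one peptide-spectrum match."""
    peptide: str
    score: float
    is_decoy: bool


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score cutoff whose accepted set keeps
    decoys / targets <= max_fdr, or None if no cutoff qualifies.

    Assumption: FDR is estimated as (#decoys) / (#targets) above the cutoff.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM of a tied score group,
        # so a cutoff never splits PSMs that share a score.
        if i + 1 < len(ranked) and ranked[i + 1].score == psm.score:
            continue
        if targets and decoys / targets <= max_fdr:
            cutoff = psm.score
    return cutoff


def accepted_targets(psms: List[PSM], cutoff: float) -> List[PSM]:
    """Target PSMs scoring at or above the cutoff."""
    return [p for p in psms if not p.is_decoy and p.score >= cutoff]
```

Scanning the whole ranked list, rather than stopping at the first violation, keeps the most permissive cutoff that still satisfies the FDR limit.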



RESULTS

Comparative Analysis of Two Protein Extraction Methods from Sweetpotato Leaf and Root Tissues

Root and leaf tissues were collected from mature plants of the sweetpotato cultivar Beauregard. “Beauregard” is the most common cultivar grown in the United States with a high storage root yield.31 Leaves (4th and 5th from the apical meristem) and mature storage roots were collected and used for protein extraction. Two distinct protocols were optimized for the extraction and solubilization of proteins: a phenol-based method (M1) and a polyethylene glycol (PEG) 4000 fractionation-based method (M2). For M1, we started with the method described in ref 5 and modified it for maximum protein concentration in the final prep. PEG4000 fractionation in M2 is considered to enhance the extraction of low-abundance proteins and was previously used to extract proteins from chemically complex plant tissues.21,32 The workflow for protein extraction is presented in Figure 1A, and the full protocols are detailed in the Experimental Section. Total protein preparations were obtained from leaf and root tissues for four technical replicates per tissue type using M1 and M2. Aliquots of the resulting 16 protein preparations were run on SDS-PAGE gels, and proteins were visualized via Coomassie staining to verify protein quantity; overall, total protein yield was higher when using M1 on both leaf and root tissue (Figure 1B). The complex protein mixtures resulting from M1 and M2 were directly analyzed after proteolysis by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) (Table S1). For the LC-MS/MS analysis, 100 μg of total protein per sample was subjected to trypsin digestion and subsequent downstream proteomic analysis. The LC-MS/MS was performed on a fully automated proteomic technology platform that includes an Agilent 1200 Series Quaternary HPLC system connected to a Q Exactive Plus mass spectrometer. MS/MS spectra from each file were searched against the NCBI Ipomoea taxon (txid4119) proteins data set that contains 58 282 proteins (Table S1). A concatenated database containing “target” and “decoy” sequences was employed to estimate the false discovery rate (FDR).25 The resulting peptide-spectrum matches (PSMs) were reduced to sets of unique PSMs by

Bioinformatics

Proteogenomics Analysis. Unique, MASCOT-identified peptides were queried using tBLASTn against either the sweetpotato genome or transcriptome available at http://public-genomes-ngs.molgen.mpg.de/SweetPotato/DOWNLOADS/. Parameters from previous, similar analyses were used for peptide mapping.28 Composition-based filtering was turned off, the word size was decreased to 2, and the E-value cutoff was raised to 1000 to increase the number of potential hits returned against query peptides; only 50 hits/query were generated to reduce the number of incorrect matches reported per query peptide. tBLASTn results were subsequently parsed to identify perfect and imperfect matches along the sweetpotato genome and transcriptome. Perfect matches are defined as hits for a unique peptide along its entire length with no mismatches or gaps. Imperfect matches are defined as hits for any peptide not previously mapped perfectly along >90% of the length of the query peptide with a sequence identity >80%. Multiple hits are allowed for individual peptides as they might originate from nonunique regions within either predicted genomic sequences or transcript assemblies. Sweetpotato peptides were visualized using the Integrative Genomics Viewer.29 Peptide lists were formatted as ".bed" annotations


Figure 2. Performance of the experimental strategy for the characterization of sweetpotato proteome. (A) The number of sweetpotato peptides and proteins identified from each sample and group analyzed. M1: the phenol-based method; M2: PEG4000 fractionation method; L: leaf tissue, R: root tissue. Numbers (1−4) represent independent replicates for each method used and tissue type analyzed. (B) Differential analysis of method performance for protein extraction. (C) Differential analysis of proteins extracted from sweetpotato leaves and roots. In panels B and C, the numbers represent unique proteins identified, as shown in panel A. (D) A comparison of the current study with literature-derived sweetpotato proteome data. The publications are cited in the main text. (E) The overlap among the sweetpotato (I. batatas) proteome identified in this study and predicted proteomes of other Ipomoea species. The protein accessions present in NCBI for I. batatas (Ib), I. nil (In), I. trifida (It), I. purpurea (Ip), and I. cavalcantei (Ic) were used for comparison to the sweetpotato proteome (P_Ib).
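Panels B and C compare sets of unique proteins. When only set sizes are reported, the size of an overlap follows from inclusion-exclusion, |A ∩ B| = |A| + |B| − |A ∪ B|. The sketch below applies this to the protein counts quoted in the abstract and Results, assuming that the combined total of 4321 nonredundant proteins is the union of the leaf and root sets and that each tissue total is the union of its M1 and M2 sets; the function and variable names are illustrative.

```python
def shared_count(size_a: int, size_b: int, size_union: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return size_a + size_b - size_union


# Counts quoted in the text (leaf, root, combined nonredundant proteins).
LEAF, ROOT, COMBINED = 3143, 2928, 4321

shared_leaf_root = shared_count(LEAF, ROOT, COMBINED)
leaf_only = LEAF - shared_leaf_root
root_only = ROOT - shared_leaf_root
shared_pct = round(100 * shared_leaf_root / COMBINED, 1)

# Same reasoning per tissue for the two extraction methods (M1, M2).
shared_methods_leaf = shared_count(2682, 1589, LEAF)
shared_methods_root = shared_count(2643, 1368, ROOT)

print(f"leaf/root shared: {shared_leaf_root} ({shared_pct}% of {COMBINED})")
print(f"leaf only: {leaf_only}, root only: {root_only}")
print(f"M1/M2 shared, leaf: {shared_methods_leaf}; root: {shared_methods_root}")
```

Under these assumptions the leaf/root overlap comes to roughly 40% of the combined proteome.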


Figure 3. Proteogenomic analysis of sweetpotato. (A) Distribution of perfectly/imperfectly mapped peptides per sweetpotato transcript. (B) Distribution of genomic loci of perfectly/imperfectly mapped peptides. The identity of the chromosome and the number of hits are shown. (C) Categorical analysis of peptide matches in the sweetpotato transcriptome and genome. I-match, imperfect-match peptide; P-match, perfect-match peptide; T, sweetpotato transcriptome; G, sweetpotato haplotype-resolved genome. (D) Examples of sweetpotato peptides which validate or modify current gene annotations. The following events were mapped: current ORF validation (a), novel ORFs with or without transfrag support (b and c), exon extensions or fused exon with transfrag support (d and e), and intron sequence with transfrag support (f). Continuous lines indicate annotated genes (black) or transfrags (blue); thick lines are exons, and thin lines are introns. Interrupted lines represent intergenic sequences. Peptides are shown in red (perfect-match, PM) or pink (imperfect-match, IM) boxes. Additional information is given in Supplemental Figure 1.
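The perfect/imperfect distinction used in Figure 3 follows the criteria stated in the Bioinformatics section: a perfect match covers the whole peptide with no mismatches or gaps, and an imperfect match covers more than 90% of the peptide at more than 80% identity for peptides with no perfect match. The sketch below is one possible way to apply those rules to parsed tBLASTn hits. The `Hit` record and both functions are hypothetical; the field names simply borrow BLAST+ tabular column specifiers (`qlen`, `qstart`, `qend`, `pident`, `mismatch`, `gapopen`), and the authors' actual parsing code is not described in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical parsed tBLASTn hit for one query peptide."""
    qlen: int        # peptide length (residues)
    qstart: int      # 1-based alignment start on the peptide
    qend: int        # 1-based alignment end on the peptide
    pident: float    # percent identity over the alignment
    mismatch: int    # number of mismatches
    gapopen: int     # number of gap openings


def classify_hit(hit: Hit) -> Optional[str]:
    """Label a single hit as 'perfect', 'imperfect', or None."""
    aligned = hit.qend - hit.qstart + 1
    if aligned == hit.qlen and hit.mismatch == 0 and hit.gapopen == 0:
        return "perfect"
    if aligned / hit.qlen > 0.90 and hit.pident > 80.0:
        return "imperfect"
    return None


def classify_peptide(hits: Iterable[Hit]) -> str:
    """A peptide with any perfect hit is 'perfect'; imperfect status
    applies only to peptides that have no perfect hit."""
    labels = {classify_hit(h) for h in hits}
    if "perfect" in labels:
        return "perfect"
    if "imperfect" in labels:
        return "imperfect"
    return "unmapped"
```

Checking the perfect case first mirrors the rule that previously perfectly mapped peptides are excluded from the imperfect category.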

eliminating lower scoring duplicates. Peptide assignments from the database search were filtered down to a 1% FDR by a logistic spectral score as previously described.25,26 Using this workflow, we identified 2682 and 1589 proteins, respectively, in the leaf samples processed by M1 (Data set S1) and M2 (Data set S2). A total of 2643 and 1368 proteins were identified in the root samples processed using the two distinct methods. The similar number of proteins identified in the four technical replicates performed for each method and tissue type illustrates the high replicability of the extraction workflow. Overall, the 3143 leaf and 2928 root proteins identified represent 4321 unique proteins, the largest data set of I. batatas proteins to date (Figure 2A). Regardless of tissue type, when comparing the performance of the two methods, M1 resulted in a 4.4-fold larger number of unique proteins than M2 (Figure 2B). Notably, 40% of all sweetpotato proteins identified in this study were extracted by both methods, pointing to the high confidence value of our protein identification pipeline. No obvious differences were detected in the methods’ performance when comparing tissue-specific data sets. Both M1 and M2 resulted in approximately equal numbers of proteins when applied to leaf or root tissue, suggesting that the differences in physical and chemical characteristics of the plant tissues were not significant factors in the performance of these methods (Figure 2C).

A comparative analysis with four published sweetpotato data sets5,6,9,13 showed only limited overlap with our proteome (Figure 2D and Table S1) and highlighted the difficulty of performing discovery proteomics in polyploid crops with complex genome structure. On the other hand, substantial overlap exists between our proteome and predicted I. nil proteins available in the NCBI database (https://www.ncbi.nlm.nih.gov/), as shown by a differential comparison including predicted protein accessions of several Ipomoea species (Figure 2E). Over 96% of identified sweetpotato accessions are shared with I. nil (white-edged morning glory); 1% or less are shared with I. batatas, I. trifida (wild ancestor), I. purpurea (common morning glory), and I. cavalcantei (red flowered morning glory). Overall, the sweetpotato proteome added 4215 verified proteins to the Ipomoea repertoire, representing 97.5% of the identified sweetpotato proteome.

Proteogenomics Analysis

Further, MASCOT-identified peptides were mapped to the recently sequenced sweetpotato genome and transcriptome.2 A tBLASTn search of identified peptides against the annotated sweetpotato transcriptome resulted in 10 746 (75.3%) peptides mapped with no gaps or mismatches along their entire length to 39 251 regions along 11 841 transcripts within the sweetpotato transcriptome (Figure 3A and Table S2). An additional 1407 peptides were mapped imperfectly to 6175 regions within 3874 transcripts (Table S3). Imperfectly mapped peptides are defined as peptides returning an alignment file with a query peptide coverage >90% and a sequence identity >80%. Peptides which were previously perfectly mapped are excluded from this category. These perfect/imperfect-matched peptides represent 12 153/14 275 (85.1%) of all identified peptides and demonstrate a strong overlap between proteomic data and the predicted transcriptome. The 12 153 peptides were mapped to 45 426 unique regions within the sweetpotato genome with 87.9% of hits localized to one of 15 major chromosomes and the rest localized on scaffolds (Figure 3B and Table S4).

Mapping the MASCOT-identified peptides against the haplotype genome2 returned similar results, although fewer peptides returned as many high-quality hits. As such, 8530 peptides mapped perfectly to 22 612 regions (Table S5), whereas 2316 additional peptides were mapped imperfectly to 9452 regions (Table S6). Together, peptides mapped perfectly or imperfectly against the sweetpotato genome represent 10 846/14 275 (76%) of all identified peptides. Overall, we successfully mapped 12 902 (90.4%) peptides, indicating a very good agreement between the proteome and the haplotype-resolved genome and transcriptome assemblies. A comparative analysis of peptides mapped within the genome and transcriptome annotations (Figure 3C) shows the largest overlap when comparing perfect-match peptides. The identified 7870 perfect-match peptides validate previous sequence predictions within the captured exons. An additional 1823 perfect-match and 233 imperfect-match peptides were mapped only within the transcriptome but not the genome. These peptides are likely covering splice junctions or represent alternative splicing events. Further, 483 perfect-match and 266 imperfect-match peptides were mapped within the genome but not the transcriptome. These peptide sequences might indicate the presence of protein-coding genes in the respective genomic regions which are missing in the current annotations. When considering the 749 perfect- and imperfect-matched peptides mapping solely on the genome, we found only eight that partially overlap (from 18.5% to 93.3%) with predicted exons (Tables S7 and S8). The remaining 741 loci could indicate additional coding regions not currently annotated in the sweetpotato genome. Further, the 1053 peptides with transcriptome perfect-match and genome imperfect-match and the 177 peptides having a genome perfect-match and transcriptome imperfect-match provide experimental validation to the transcriptome assembly and can be leveraged to improve misannotations in the current genome. Finally, the 997 peptides with imperfect-match in both the genome and transcriptome highlight regions where both genome and transcript annotations can be improved.

Figure 4. Gene ontology analysis for the evaluation of protein extraction methods. (A) Differential analysis of the protein classes represented in protein preparations resulting from phenol (M1) or PEG4000-based protein extraction (M2). All displayed protein classes have at least a 2-fold difference between examined data sets. A complete analysis of identified protein classes is available in Supplemental Table S9. (B) A subset of the Gene Ontology (GO) Biological Processes (A) and Cellular Component (B) terms identified in the protein sets extracted by M1 and M2. Dots represent the frequency of uniquely identified proteins from either method. Bars connecting dots represent the difference in the number of proteins extracted by either method with all highlighted bars showing at least a 50% increase in term population. Supplemental Table S10 gives values for all identified GO terms.

Figure 5. Differential analysis of the sweetpotato leaf and root proteomes. (A) Differential analysis of the protein classes represented in leaf and root protein preparations. All displayed protein classes have at least a 2-fold difference between examined data sets. A complete analysis of identified protein classes is available in Supplemental Table S9. (B, C) Gene ontology terms with significant overrepresentation in the leaf (B) and root proteomes (C).

The bed format annotations corresponding to the peptides in Supplemental Tables S5, S6, S7, and S8 were generated (Files SF1 to SF4) and mapped on the sweetpotato genome. Selected examples are illustrated in Figure 3D and Figure S1. A large number of peptides were found to overlap with reported ORFs, offering additional support for the coding potential of these genomic regions. Other types of events found include peptides that support novel ORFs (e.g., chr 3: 12432800−12432830), intron retention or fused exons within annotated genes (e.g., G1367|TU2294 encoding an uncharacterized protein and G14291|T23336 encoding an ABC transporter), and novel coding sequences near the alkaloids gene cluster GC024 (chr 10: 525809−465744) described in ref 2. Additionally, we found peptides matching transcribed DNA fragments without RNA-seq data (transfrags) that support novel ORFs (e.g., chr 3: 6362681−6362715), fused-exon events (e.g., G1367|TU2294 encoding an uncharacterized protein), and intron sequences within annotated genes (e.g., G24368|TU39936 encoding a lipid-transfer DIR1-like protein). Overall, the proteogenomics analysis demonstrates the potential of newly discovered peptides to improve annotations in the complex sweetpotato genome at this early stage of genomics discovery.
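The peptide counts reported in the proteogenomics analysis can be cross-checked by treating the eight categories quoted in the text as a partition of the mapped peptides by their transcriptome (T) and genome (G) status. The sketch below tabulates them and recovers the reported totals. The category labels are an interpretation of the text ("P" perfect, "I" imperfect, `None` unmapped), and the variable names are illustrative.

```python
TOTAL_IDENTIFIED = 14_275  # all uniquely identified peptides

# (transcriptome status, genome status) -> number of peptides
category_counts = {
    ("P", "P"): 7870,    # perfect in both
    ("P", None): 1823,   # perfect, transcriptome only
    ("I", None): 233,    # imperfect, transcriptome only
    (None, "P"): 483,    # perfect, genome only
    (None, "I"): 266,    # imperfect, genome only
    ("P", "I"): 1053,    # transcriptome perfect, genome imperfect
    ("I", "P"): 177,     # transcriptome imperfect, genome perfect
    ("I", "I"): 997,     # imperfect in both
}


def marginal(axis: int, status: str) -> int:
    """Sum peptides whose status on one axis (0 = T, 1 = G) matches."""
    return sum(n for key, n in category_counts.items() if key[axis] == status)


def percent(n: int) -> float:
    """Share of all identified peptides, rounded to one decimal."""
    return round(100 * n / TOTAL_IDENTIFIED, 1)


transcriptome_perfect = marginal(0, "P")
transcriptome_imperfect = marginal(0, "I")
genome_perfect = marginal(1, "P")
genome_imperfect = marginal(1, "I")
mapped_total = sum(category_counts.values())

genome_only = category_counts[(None, "P")] + category_counts[(None, "I")]
transcriptome_only = category_counts[("P", None)] + category_counts[("I", None)]
candidate_new_loci = genome_only - 8  # eight peptides overlap predicted exons

print(transcriptome_perfect, transcriptome_imperfect, genome_perfect, genome_imperfect)
print(mapped_total, percent(mapped_total))
```

Read this way, the categories reproduce the per-database totals (10 746 and 1407 for the transcriptome, 8530 and 2316 for the genome), the 12 902 mapped peptides (90.4%), the 741 candidate loci, and the 2056 transcriptome-only peptides that match the loci count given in the abstract.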

Functional Analysis of the Leaf and Root Sweetpotato Proteomes

To obtain information on the composition of the sweetpotato proteome, we performed a gene ontology (GO) analysis to identify protein classes with significant overrepresentation and differential accumulation across the method or tissue type. Comparing M1 with M2 preparations, we found marked differences regarding both the identity of protein class and the number of members per class (Figure 4A and Table S9). Overall, M1 outperformed M2 in the contribution to members in individual protein classes, with 121 classes having more members identified using M1 (average 275 proteins/class), compared with 20 classes receiving more members from M2 (average 170 proteins/class). Proteins belonging to 152 unique protein classes were identified across the two methods, with 11 classes common between M1 and M2. Cytoskeletal proteins, transferases, transfer/carriers, G-proteins, and signaling molecules were among the protein classes with the highest representation in both M1 and M2 preparations. DNA-binding proteins (e.g., polymerases and centromere-binding factors) and small GTPases were preferentially extracted by M1; on the other hand, extracellular matrix proteins and ribonucleases were present only in M2 preparations. Analysis of GO biological process (BP) terms across methodologies identified 156 unique terms across M1 and M2 (Figure 4B and Table S10). M1 outperformed M2 in the identification of proteins within unique BP terms. Of all examined proteins, M1 and M2 identified all constituents within 52 and 16 BP terms, respectively, with 12 terms equally represented by either method. Further, of the 156 BP terms identified, 154 (99%) were populated using hits identified using M1. This is in contrast to the 90 BP terms (58%) that were at least halfway populated using hits identified using M2. Examining GO cellular component (CC) terms yielded similar observations. Within the 44 identified CC terms, 17 and 1 terms were fully populated using M1 and M2, respectively. All CC terms were at least halfway populated using M1, whereas M2 resulted in at least 50% population within only 15 CC terms (34%). M1 exclusively identified proteins associated with the peroxisome (12 proteins), nucleoplasm (15 proteins), and tubulin (6 proteins). Protein transport, mRNA processing, translation, and vitamin metabolism were terms showing the largest difference between methods, with more proteins in these GOBP terms extracted by M1. Similarly, M1 more efficiently extracted proteins localized in the membrane, cytosol, and nucleus, as well as ribosomal proteins (Figure 4C and Table S10). Both methods performed equally well for the extraction of proteins with phosphatase inhibitor, antioxidant, and receptor activities, carbohydrate phosphatases, and proteins associated with the GO terms “MAPK cascade”, “glycolysis”, “ferredoxin metabolism”, and “cytoskeleton organization”, among others. On the other hand, neither M1 nor M2 effectively extracted ribonucleases, histones, hydrolases, and extracellular matrix glycoproteins; similarly, proteins associated with GO terms such as “defense response to bacteria”, “cell death”, “signal transducer activity”, and “peroxisomal transport” had low representation in M1 and M2 proteomes.

To obtain an overall view of the leaf and root proteomes, we mapped protein classes across organ types. The analysis revealed that roots allowed for both a larger variety and number of protein classes to be extracted from sweetpotato relative to leaves (Figure 5A and Table S9). Overall, 72 protein classes were preferentially extracted from root tissue, compared with 65 classes from leaf tissue; only 15 classes were identified in both tissues. The roots yielded nine unique protein classes, among which enzymes including Ser/Thr receptors, ribonucleases, and polymerases had a good representation. The leaf tissue yielded 13 unique protein classes including G-protein coupled receptors and other nonreceptor tyrosine kinases, proteases, and protease inhibitors.

The comparative analysis of the tissue-specific proteomes uncovered a large percentage of common accessions (40%), including diverse enzyme classes and proteins localized in the ribosomes, organelles, and cytosol (Table S10). Statistical overrepresentation testing of the proteins unique to leaf samples identified terms associated with cellular biosynthetic processes, including protein translation and amino acid biosynthesis, as the most significantly overrepresented categories (Figure 5B). Translation-associated proteins corresponded to ribosomal proteins, aminoacyl-tRNA synthetases, and translation-release factors. Proteins associated with the generation of metabolites and energy included oxidases, metalloproteases, carbohydrate phosphatases, and chloroplastic chlorophyll A-B binding proteins. A similar analysis of the root-specific proteins showed a significant overrepresentation of GO terms associated with protein localization, amino acid biosynthesis, and primary carbohydrate metabolism (Figure 5C). Of note, proteins in lipid metabolism were also enriched in root preparations, highlighting the presence of lipids and fatty acids, compounds rarely considered in the context of sweetpotato nutritional qualities due to their low abundance.33
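The statistical overrepresentation testing mentioned above can be illustrated with a one-sided hypergeometric test, which is one common choice for GO term enrichment. The tool and settings behind Figure 5 are not restated in this section, so the Python sketch below is an illustration under assumptions rather than the authors' procedure: the function name `hypergeom_upper_tail` and every count in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Probability of seeing at least k annotated proteins in a sample.

    N: proteins in the reference set
    K: reference proteins annotated with the GO term
    n: proteins in the tissue-specific list
    k: listed proteins annotated with the GO term
    """
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / total


# Hypothetical counts: 50 of 1000 reference proteins carry the term,
# and 15 of 100 leaf-specific proteins carry it (5 expected by chance).
p_value = hypergeom_upper_tail(k=15, n=100, K=50, N=1000)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the p-values for all tested terms would also be corrected for multiple testing before a term is called significantly overrepresented.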

Metabolic Pathways Associated with Sweetpotato Root Proteomes

Unique root-localized sweetpotato proteins identified were mapped on KEGG metabolic and information processing pathways; mapped accessions and the representative enriched KEGG pathways are shown in Table 1 and Figure 6A. Root


Table 1. Sweetpotato Gene Homologs of Ipomoea nil in Significantly Enriched KEGG Pathways

NCBI gene ID | KEGG gene ID | description

Endocytosis
ncbi-geneid:109152986 | ini:109152986 | phosphatidylinositol 4-phosphate 5-kinase 9
ncbi-geneid:109156821 | ini:109156821 | Ras-related protein RABE1c-like
ncbi-geneid:109157615 | ini:109157615 | ALG-2 interacting protein X-like
ncbi-geneid:109161092 | ini:109161092 | ALG-2 interacting protein X-like
ncbi-geneid:109161467 | ini:109161467 | Ras-related protein Rab7
ncbi-geneid:109168075 | ini:109168075 | Ras-related protein YPT3
ncbi-geneid:109170373 | ini:109170373 | Ras-related protein Rab7
ncbi-geneid:109172049 | ini:109172049 | ADP-ribosylation factor 2
ncbi-geneid:109172658 | ini:109172658 | phospholipase D delta-like
ncbi-geneid:109173713 | ini:109173713 | vacuolar protein sorting-associated protein 2 homolog 3
ncbi-geneid:109174894 | ini:109174894 | vacuolar protein-sorting-associated protein 37 homolog 1-like
ncbi-geneid:109176227 | ini:109176227 | IST1 homolog
ncbi-geneid:109182279 | ini:109182279 | dynamin-related protein 1E
ncbi-geneid:109185388 | ini:109185388 | AP-2 complex subunit mu
ncbi-geneid:109186904 | ini:109186904 | Ras-related protein RABA2b-like
ncbi-geneid:109192356 | ini:109192356 | Ras-related protein Rab7-like
ncbi-geneid:109192482 | ini:109192482 | Ras-related protein RABF1
ncbi-geneid:109193125 | ini:109193125 | AP-2 complex subunit alpha-1-like

Protein Processing in the Endoplasmic Reticulum
ncbi-geneid:109156098 | ini:109156098 | uncharacterized LOC109156098
ncbi-geneid:109156186 | ini:109156186 | SEC12-like protein 2
ncbi-geneid:109160304 | ini:109160304 | dnaJ protein homolog
ncbi-geneid:109161355 | ini:109161355 | dolichyl-diphosphooligosaccharide–protein glycosyltransferase subunit 2-like
ncbi-geneid:109176207 | ini:109176207 | protein disulfide-isomerase 5-2
ncbi-geneid:109176802 | ini:109176802 | protein disulfide-isomerase-like
ncbi-geneid:109180845 | ini:109180845 | protein transport protein SEC31 homolog B-like
ncbi-geneid:109182049 | ini:109182049 | protein transport protein SEC23
ncbi-geneid:109182230 | ini:109182230 | GTP-binding protein SAR1A
ncbi-geneid:109183836 | ini:109183836 | protein transport protein Sec24-like At4g32640
ncbi-geneid:109186699 | ini:109186699 | 17.3 kDa class I heat shock protein-like
ncbi-geneid:109188105 | ini:109188105 | protein disulfide isomerase-like 1-6
ncbi-geneid:109188366 | ini:109188366 | protein disulfide isomerase-like 2-3
ncbi-geneid:109192865 | ini:109192865 | protein disulfide-isomerase 5-1
ncbi-geneid:109193189 | ini:109193189 | 18.2 kDa class I heat shock protein-like

SNARE Interactions in Vesicular Transport
ncbi-geneid:109156159 | ini:109156159 | Golgi SNAP receptor complex member 1-2
ncbi-geneid:109172311 | ini:109172311 | vesicle-associated membrane protein 711-like
ncbi-geneid:109185370 | ini:109185370 | syntaxin-22-like
ncbi-geneid:109189921 | ini:109189921 | syntaxin-132
ncbi-geneid:109192398 | ini:109192398 | 25.3 kDa vesicle transport protein

Biosynthesis of Amino Acids
ncbi-geneid:109148834 | ini:109148834 | pyruvate kinase, cytosolic isozyme
ncbi-geneid:109149365 | ini:109149365 | enolase 1, chloroplastic
ncbi-geneid:109149832 | ini:109149832 | tryptophan synthase beta chain 2
ncbi-geneid:109154093 | ini:109154093 | probable pyruvate kinase, cytosolic isozyme
ncbi-geneid:109155125 | ini:109155125 | enolase-like
ncbi-geneid:109156048 | ini:109156048 | arogenate dehydratase/prephenate dehydratase 1, chloroplastic-like
ncbi-geneid:109156421 | ini:109156421 | cytosolic enolase 3
ncbi-geneid:109157052 | ini:109157052 | probable ribose-5-phosphate isomerase 2
ncbi-geneid:109159594 | ini:109159594 | 2-isopropylmalate synthase A-like
ncbi-geneid:109160064 | ini:109160064 | pyruvate kinase, cytosolic isozyme-like
ncbi-geneid:109171635 | ini:109171635 | bifunctional L-3-cyanoalanine synthase/cysteine synthase 1, mitochondrial
ncbi-geneid:109172544 | ini:109172544 | acetylglutamate kinase, chloroplastic-like
ncbi-geneid:109172635 | ini:109172635 | glutamine synthetase cytosolic isozyme 1-1-like
ncbi-geneid:109172682 | ini:109172682 | 3-dehydroquinate synthase, chloroplastic-like
ncbi-geneid:109172957 | ini:109172957 | arogenate dehydratase/prephenate dehydratase 2, chloroplastic
ncbi-geneid:109174651 | ini:109174651 | homoserine kinase-like
ncbi-geneid:109174794 | ini:109174794 | pyruvate kinase 1, cytosolic
ncbi-geneid:109177688 | ini:109177688 | asparagine synthetase [glutamine-hydrolyzing] 2
ncbi-geneid:109177744 | ini:109177744 | phosphoserine aminotransferase 2, chloroplastic-like
ncbi-geneid:109177825 | ini:109177825 | ATP-dependent 6-phosphofructokinase 5, chloroplastic
ncbi-geneid:109178139 | ini:109178139 | glyceraldehyde-3-phosphate dehydrogenase GAPCP1, chloroplastic-like
ncbi-geneid:109178694 | ini:109178694 | aspartate aminotransferase, cytoplasmic
ncbi-geneid:109181333 | ini:109181333 | fructose-bisphosphate aldolase 6, cytosolic-like
ncbi-geneid:109182826 | ini:109182826 | InTSA; tryptophan synthase alpha chain-like
ncbi-geneid:109182914 | ini:109182914 | anthranilate synthase alpha subunit 2, chloroplastic-like
ncbi-geneid:109183736 | ini:109183736 | aspartokinase 2, chloroplastic-like
ncbi-geneid:109185377 | ini:109185377 | ribose-phosphate pyrophosphokinase 1-like
ncbi-geneid:109186421 | ini:109186421 | homoserine kinase-like
ncbi-geneid:109186666 | ini:109186666 | anthranilate synthase alpha subunit 1, chloroplastic-like
ncbi-geneid:109186864 | ini:109186864 | threonine synthase 1, chloroplastic-like
ncbi-geneid:109187198 | ini:109187198 | citrate synthase, glyoxysomal-like
ncbi-geneid:109190645 | ini:109190645 | ATP-dependent 6-phosphofructokinase 6-like

Glycolysis/Gluconeogenesis
ncbi-geneid:109148834 | ini:109148834 | pyruvate kinase, cytosolic isozyme
ncbi-geneid:109149365 | ini:109149365 | enolase 1, chloroplastic
ncbi-geneid:109154093 | ini:109154093 | probable pyruvate kinase, cytosolic isozyme
ncbi-geneid:109155125 | ini:109155125 | enolase-like
ncbi-geneid:109156421 | ini:109156421 | cytosolic enolase 3
ncbi-geneid:109160064 | ini:109160064 | pyruvate kinase, cytosolic isozyme-like
ncbi-geneid:109169903 | ini:109169903 | alcohol dehydrogenase 3
ncbi-geneid:109169905 | ini:109169905 | alcohol dehydrogenase 1-like
ncbi-geneid:109170029 | ini:109170029 | hexokinase-2, chloroplastic
ncbi-geneid:109171964 | ini:109171964 | aldehyde dehydrogenase family 2 member B7, mitochondrial
ncbi-geneid:109173542 | ini:109173542 | aldehyde dehydrogenase family 7 member A1-like
ncbi-geneid:109174794 | ini:109174794 | pyruvate kinase 1, cytosolic
ncbi-geneid:109176096 | ini:109176096 | acetate/butyrate–CoA ligase AAE7, peroxisomal
ncbi-geneid:109177574 | ini:109177574 | phosphoenolpyruvate carboxykinase [ATP]-like
ncbi-geneid:109177825 | ini:109177825 | ATP-dependent 6-phosphofructokinase 5, chloroplastic
ncbi-geneid:109178139 | ini:109178139 | glyceraldehyde-3-phosphate dehydrogenase GAPCP1, chloroplastic-like
ncbi-geneid:109181333 | ini:109181333 | fructose-bisphosphate aldolase 6, cytosolic-like
ncbi-geneid:109182311 | ini:109182311 | dihydrolipoyl dehydrogenase 2, chloroplastic-like
ncbi-geneid:109186304 | ini:109186304 | hexokinase-1-like
ncbi-geneid:109190645 | ini:109190645 | ATP-dependent 6-phosphofructokinase 6-like
ncbi-geneid:109191870 | ini:109191870 | aldose 1-epimerase-like

Starch and Sucrose Metabolism
ncbi-geneid:109153694 | ini:109153694 | granule-bound starch synthase 2, chloroplastic/amyloplastic
ncbi-geneid:109155915 | ini:109155915 | glucose-1-phosphate adenylyltransferase small subunit 2, chloroplastic
ncbi-geneid:109160151 | ini:109160151 | sucrose synthase 3
ncbi-geneid:109163160 | ini:109163160 | beta-glucosidase 44-like
ncbi-geneid:109170029 | ini:109170029 | hexokinase-2, chloroplastic
ncbi-geneid:109170316 | ini:109170316 | probable starch synthase 4, chloroplastic/amyloplastic
ncbi-geneid:109171891 | ini:109171891 | alpha,alpha-trehalose-phosphate synthase [UDP-forming] 6
ncbi-geneid:109172425 | ini:109172425 | 4-alpha-glucanotransferase, chloroplastic/amyloplastic
ncbi-geneid:109173764 | ini:109173764 | alpha-glucan phosphorylase, H isozyme
ncbi-geneid:109173850 | ini:109173850 | beta-glucosidase BoGH3B-like
ncbi-geneid:109175775 | ini:109175775 | beta-glucosidase 1-like
ncbi-geneid:109178772 | ini:109178772 | probable sucrose-phosphate synthase
ncbi-geneid:109180346 | ini:109180346 | glucan endo-1,3-beta-glucosidase 2-like
ncbi-geneid:109186304 | ini:109186304 | hexokinase-1-like
ncbi-geneid:109186768 | ini:109186768 | ectonucleotide pyrophosphatase/phosphodiesterase family member 3-like
ncbi-geneid:109187211 | ini:109187211 | beta-amylase-like
ncbi-geneid:109193076 | ini:109193076 | beta-glucosidase BoGH3B-like
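Because Table 1 lists some Ipomoea nil genes under more than one pathway, a small set comparison makes the overlap explicit. The Python sketch below is for illustration only: it copies the NCBI gene IDs listed under Glycolysis/Gluconeogenesis and under Starch and Sucrose Metabolism in Table 1 and builds KEGG-style identifiers with the "ini:" prefix shown in the table. The helper `to_kegg_id` and the variable names are hypothetical and are not part of the authors' workflow.

```python
# NCBI gene IDs copied from the Glycolysis/Gluconeogenesis rows of Table 1.
GLYCOLYSIS = {
    109148834, 109149365, 109154093, 109155125, 109156421, 109160064,
    109169903, 109169905, 109170029, 109171964, 109173542, 109174794,
    109176096, 109177574, 109177825, 109178139, 109181333, 109182311,
    109186304, 109190645, 109191870,
}

# NCBI gene IDs copied from the Starch and Sucrose Metabolism rows of Table 1.
STARCH_SUCROSE = {
    109153694, 109155915, 109160151, 109163160, 109170029, 109170316,
    109171891, 109172425, 109173764, 109173850, 109175775, 109178772,
    109180346, 109186304, 109186768, 109187211, 109193076,
}


def to_kegg_id(ncbi_gene_id, organism="ini"):
    """Format an NCBI gene ID as an organism-prefixed KEGG gene ID."""
    return f"{organism}:{ncbi_gene_id}"


# Genes assigned to both carbohydrate pathways.
shared = sorted(GLYCOLYSIS & STARCH_SUCROSE)
shared_kegg = [to_kegg_id(gene_id) for gene_id in shared]

print(len(GLYCOLYSIS), len(STARCH_SUCROSE), shared_kegg)
```

With the IDs as listed, the two shared entries correspond to the hexokinase-2 and hexokinase-1-like rows of the table.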

accessions associated with intracellular trafficking include 18 endocytosis-related proteins (e.g., dynamin 1E motor protein and multiple ras-related RAB proteins); 15 accessions with roles in protein folding, targeting, and export from the endoplasmic reticulum (e.g., multiple disulfide-isomerases and Sec transport proteins); and 5 associated with vesicular transport (e.g., Golgi SNAP receptor and syntaxins) (Figure 6B). A significant number of root-localized proteins mapped on pathways for the biosynthesis of amino acids, with 12 proteins mapping on the pathway for Gly, Ser, and Thr and 8 proteins on pathways for Phe, Tyr, and Trp synthesis. Further, when considering carbohydrate metabolism, root proteins mapped on the Glycolysis/Gluconeogenesis pathway (21 proteins) and the Starch/Sucrose metabolic pathway (17 proteins). Notably, sucrose synthase 3, identified as the most abundant transcript in sweetpotato storage roots,34 was present in our root preparations alongside hexokinases, β-glucosidases, starch synthase, and other enzymes associated with starch and sucrose metabolism (Figure 6C). Previously characterized sweetpotato storage proteins such as amylases35 were also purified from roots, alongside other types of storage proteins such as amyloplastic-localized 1,4-α-glucan-branching enzyme 2−2, U-box domain-containing protein 18, and bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin.

Figure 6. Metabolic and genetic information processing pathways enriched in sweetpotato storage roots. (A) KEGG pathways overrepresented in storage root protein preparations. (B) Diagram of protein folding and sorting pathways. Ipomoea nil homologs of sweetpotato storage root proteins identified (and listed in Table 1) are mapped on pathways for endocytosis, SNARE-mediated vesicular transport, and protein targeting and export from the endoplasmic reticulum. (C) Diagram of starch and sucrose metabolism with Ipomoea nil homologs of sweetpotato storage root proteins (Table 1) mapped on specific reactions. Abbreviations: HXK (hexokinase), SPS (sucrose-phosphate synthase), SUS3 (sucrose synthase 3), B-Glc (beta-glucosidase), EDP3-like (ectonucleotide pyrophosphatase/phosphodiesterase family member 3-like), GPAT (glucose-1-phosphate adenylyltransferase small subunit 2, chloroplastic/amyloplastic), GBSS (granule-bound starch synthase 2, chloroplastic/amyloplastic).



DISCUSSION

The recent sequencing and annotation of cultivated sweetpotato and its wild relatives have brought forward the need to validate predicted gene models and confirm gene expression using complementary approaches. We report here the results of a broad analysis profiling and comparing sweetpotato leaf and storage root proteomes using liquid chromatography combined with tandem mass spectrometry. The resultant proteome, including over 4300 unique protein accessions, provides a much-needed in-depth perspective on this complex crop plant and constitutes a baseline for trait improvement research. We performed a proteogenomics analysis that mapped 90% of uniquely identified peptides against the sweetpotato genome and transcriptome and predicted 741 new coding regions. The analysis revealed 2056 genomic loci that may contain genome/transcriptome misannotations or genomic variants. Although the annotations in the haplotype-resolved genome support the proteome reported herein very well, we demonstrate that a proteogenomics approach could further improve the quality of genomic and proteomic annotations in sweetpotato.

Given the genetic and chemical richness of sweetpotato, two independent but complementary protein extraction methods were developed to characterize the proteomes of leaves and storage roots. The phenol-based methodology allowed the extraction of a wide range of protein classes, including enzymes and cytoskeleton- and chromatin-binding proteins associated with intracellular signal transduction, transcription, translation, and various metabolic processes. Notably, proteins were extracted preferentially from the nucleus and cytosol, while proteins associated with membranes, cell wall, and intracellular organelles such as the ER and peroxisomes were not well represented in the final leaf and root preparations.
On the other hand, while the PEG4000 fractionation extracted approximately 1.8-fold fewer proteins on average than the phenol preparations, the M2 preparations were enriched in DNA-binding, structural, and extracellular matrix proteins. It is not surprising that higher numbers of peptides/proteins were identified by the M1 method in both tissues (Figure 2A,B). It is important to note that access to a high-resolution mass spectrometer coupled with the use of optimized analytical columns could significantly increase the sequencing depth of any complex biological sample even in the absence of a prefractionation step.24,36−39 However, the PEG4000 fractionation contributed 483 additional accessions to the final sweetpotato proteome. Thus, our results highlight the necessity of using multiple protocols to optimize the extraction of various classes of proteins having distinct biochemical characteristics and localizations and are in line with similar proteomics studies in plants and other organisms.40−42

Overall, we identified 74 255 peptides that were mapped to 4321 accessions. For sweetpotato, the gene count has been estimated at 49 063 through de novo transcriptomics.2 We can estimate that proteins extracted and solubilized using the two protocols assayed on mature leaves and fully developed storage roots potentially validate 15% of predicted sweetpotato transcripts. The fraction of the proteome expressed in specific organs at a certain stage during development is difficult to estimate. Nevertheless, gel-based and gel-free proteomic studies identified up to 2−3% of proteins out of the estimated leaf proteome of the model plant Arabidopsis.42,43 Further optimization of protein extraction methodologies, coupled with subcellular fractionation and other cell biology approaches,44 may also improve the coverage of the sweetpotato proteome.

Our analysis has revealed many unique proteins associated with the two different plant organs analyzed. Proteins identified in leaf preparations were associated mainly with primary metabolism and translation, highlighting the roles of actively growing leaves in protein biosynthesis and generation of precursor metabolites and energy for the whole plant and paralleling previous studies of leaf proteomes.42 The storage root proteome, by comparison, was most enriched in proteins with roles in primary metabolism, intracellular transport, and protein localization. The sweetpotato storage root has a dual function in carbohydrate storage and vegetative propagation. The process of storage root formation and molecular mechanisms for storage proteins and nutrient accumulation have been studied extensively.45,46 Information on specific pathways necessary for storage root formation and accumulation of nutrients remains, however, limited. Our proteomic analysis reflects the role of the root tissue as a nutrient sink when considering the types of proteins we identified and the biological processes associated with them. Previous proteomics studies of cassava storage roots and potato tubers identified energy/metabolic proteins alongside transport and storage proteins as the most enriched categories,47,48 supporting a hypothesis wherein similar pathways and proteomes are active in analogous storage organs.
Synthesis and accumulation of carbohydrates is a critical process in the growth and development of plants.49 In particular, the growth and maturation of storage organs require the increased activity of the enzymes for starch synthesis in sweetpotato,34 sugar beet,50 carrots,51 and potato.52 As part of this metabolic pathway, sucrose enters the cytosol of cells of developing storage organs where it is converted to glucose 6-phosphate (G6P), a step catalyzed by sucrose synthase, then to UDP glucose (UDPG) and glucose 1P (G1P). G1P is then utilized for starch synthesis in chloroplasts (in leaf cells) or amyloplasts (in root cells), via the metabolites ADP-glucose and amylose, a process catalyzed by pyrophosphatase, starch synthase, and G1P adenylyltransferase. Aside from starch synthesis, cytosolic hexose metabolites generated by sucrose synthase can enter the pathway for cellulose biosynthesis and subsequent hydrolysis to glucose, where the final step is catalyzed by β-glucosidase. Sweetpotato root preparations were enriched in the enzymes catalyzing both the cytosolic and amyloplastic branches of the sucrose cleavage/starch accumulation pathway, a strong indication of the importance of this process in sweetpotato storage root expansion and maturation.

Further, sweetpotato root preparations were enriched in proteins acting in the endocytic pathway and in ER−Golgi−trans-Golgi network (TGN) vesicle trafficking. While the secretory pathway has been well characterized and proven to be critical in the process of storage protein post-translational processing and accumulation in developing seeds (reviewed in ref 53), the molecular components and mechanisms for storage protein production in storage roots are not well-defined. In seeds, the accumulation of storage substances occurs in fully differentiated cells that have ceased cell division.
Storage proteins are assembled in the ER where they are folded and oligomerized before being transported through the endomembrane system (Golgi and TGN) and entering secretory vesicles that will form the protein storage vacuoles. Identification of sweetpotato proteins associated with clathrin-dependent endocytosis, early and late endosomal marker proteins, and membrane-associated SNAREs involved in vesicle fusion during vesicular transport argues for the importance of these components in sweetpotato root development. Also, our study builds a cellular map of the likely routes for protein and nutrient storage and utilization in storage roots.

One important application of proteogenomics is to validate and improve genomic information. In model and crop plants, missing and incorrect gene models were revised,54−57 and stimulus-responsive alternatively spliced transcripts were identified58 using proteomic data. We show here that a proteogenomics approach is particularly useful for sweetpotato and by extension for crops with complex polyploid genomes. Our analysis provides orthogonal evidence for currently annotated genes and transcripts and for the possible translation of novel ORFs in intergenic sequences and splice variants in annotated genes.

To conclude, the protein extraction workflow alongside the proteogenomics analysis described here allowed a thorough characterization of the sweetpotato leaf and root proteomes. A significant proportion of sweetpotato proteins were identified from matches with estimated proteomes of other Ipomoea species and directly mapped on the sweetpotato transcriptome, thus validating a significant percentage of predicted genes. These results represent the first successful attempt to investigate the sweetpotato proteome and thus can be considered a reference map for further biochemical characterization and biofortification of this crop.





ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.8b00943.

Figure S1. Novel peptides mapped on the sweetpotato genome. Visualization is in the Integrated Genome Browser with the annotation tracks: 1) I. batatas genome sequence, 2) I. batatas annotated transcript, 3) I. batatas transfrags, 4) perfect matched peptides solely against the genome (S7), 5) partially matched peptides solely against the genome (S8), 6) perfect matched peptides (Table S5), and 7) partially matched peptides (Table S6) represented (PDF)
File SF1. Annotation file for the peptides perfectly mapped against the sweetpotato genome (.bed format).
File SF2. Annotation file for the peptides imperfectly mapped against the sweetpotato genome (.bed format).
File SF3. Annotation file for the peptides perfectly mapped against the genomic regions lacking gene predictions (.bed format).
File SF4. Annotation file for the peptides imperfectly mapped against the genomic regions lacking gene predictions (.bed format) (ZIP)
Table S1. Worksheet Proteomic Data: Quantitative results of root and leaf samples proteomic analysis. Worksheet PSM Data: PSM results with matched protein information from the selected source databases. Worksheet Proteome Comparisons: A comparison of the current study with literature-derived sweetpotato proteome data (the table used to generate Figure 2D).
Table S2. List of peptides perfectly mapped against the sweetpotato transcriptome.
Table S3. List of peptides imperfectly mapped against the sweetpotato transcriptome.
Table S4. The table used to generate Figure 3B.
Table S5. List of peptides perfectly mapped against the sweetpotato genome.
Table S6. List of peptides imperfectly mapped against the sweetpotato genome.
Table S7. List of peptides perfectly mapped against the genomic regions lacking gene predictions.
Table S8. List of peptides imperfectly mapped against the genomic regions lacking gene predictions.
Table S9. Gene ontology analysis of identified sweetpotato protein classes across methods and tissue types.
Table S10. Analysis of biological processes (BP), molecular functions (MF), and cellular component (CC) of the identified sweetpotato proteins across methods and tissue types (XLSX)
Data set S1. The complete mass spectrometry data set with peptides and the associated proteins identified using method 1 (M1) (XLSX)
Data set S2. The complete mass spectrometry data set with peptides and the associated proteins identified using method 2 (M2) (XLSX)

AUTHOR INFORMATION

Corresponding Authors

*E-mail for S.C.P.: [email protected]. Tel.: (662) 3257735.
*E-mail for G.V.P.: [email protected]. Tel.: (662) 3257369.

ORCID

Sorina C. Popescu: 0000-0001-5780-8252

Author Contributions

T.A. performed tissue processing and protein extraction methods. N.A. performed mass spectrometry and protein identification. N.T.B. and G.V.P. implemented bioinformatics methods. G.D., N.T.B., K.R.R., and M.S. grew plants and collected the plant material. R.R. and M.S. selected sweetpotato cultivars, supervised the growth conditions, and analyzed plant phenotypes. S.C.P., G.V.P., and N.A. performed the mining and functional analysis of the data and research design. S.C.P. and G.V.P. wrote the manuscript; N.A., N.T.B., T.A., and G.D. edited and contributed to the writing.

Notes

The authors declare no competing financial interest. Data are available via ProteomeXchange with identifier PXD012999.



ACKNOWLEDGMENTS

Funding for this work was provided by the USDA-National Institute of Food and Agriculture Hatch project MIS-145120 and 043040, and the Mississippi Agricultural & Forestry Experiment Station to S.C.P., R.K.R., and M.S. G.V.P. acknowledges the support from the USDA-ARS Genomics and Bioinformatics Research Unit through the Big Data: Biocomputing, Bioinformatics, and Biological Discovery project 6066-21310-004-25-S.



REFERENCES

(1) Islam, S., Sweetpotato (Ipomoea batatas L.) leaf: its potential effect on human health and nutrition. J. Food Sci. 2006, 71 (2).R13 (2) Yang, J.; Moeinzadeh, M.-H.; Kuhl, H.; Helmuth, J.; Xiao, P.; Haas, S.; Liu, G.; Zheng, J.; Sun, Z.; Fan, W.; et al. Haplotype-

2732

DOI: 10.1021/acs.jproteome.8b00943 J. Proteome Res. 2019, 18, 2719−2734

Article

resolved sweet potato genome traces back its hexaploidization history. Nat. Plants 2017, 3 (9), 696.
(3) Hoshino, A.; Jayakumar, V.; Nitasaka, E.; Toyoda, A.; Noguchi, H.; Itoh, T.; Shin-I, T.; Minakuchi, Y.; Koda, Y.; Nagano, A. J.; et al. Genome sequence and analysis of the Japanese morning glory Ipomoea nil. Nat. Commun. 2016, 7, 13295.
(4) Hirakawa, H.; Okada, Y.; Tabuchi, H.; Shirasawa, K.; Watanabe, A.; Tsuruoka, H.; Minami, C.; Nakayama, S.; Sasamoto, S.; Kohara, M.; et al. Survey of genome sequences in a wild sweet potato, Ipomoea trifida (HBK) G. Don. DNA Res. 2015, 22 (2), 171−179.
(5) Lee, J. J.; Park, K. W.; Kwak, Y.-S.; Ahn, J. Y.; Jung, Y. H.; Lee, B.-H.; Jeong, J. C.; Lee, H.-S.; Kwak, S.-S. Comparative proteomic study between tuberous roots of light orange- and purple-fleshed sweetpotato cultivars. Plant Sci. 2012, 193, 120−129.
(6) Lee, J. J.; Kim, Y.-H.; Kwak, Y.-S.; An, J. Y.; Kim, P. J.; Lee, B. H.; Kumar, V.; Park, K. W.; Chang, E. S.; Jeong, J. C.; et al. A comparative study of proteomic differences between pencil and storage roots of sweetpotato (Ipomoea batatas (L.) Lam.). Plant Physiol. Biochem. 2015, 87, 92−101.
(7) Jiang, Y.; Chen, C.; Tao, X.; Wang, J.; Zhang, Y. A proteomic analysis of storage stress responses in Ipomoea batatas (L.) Lam. tuberous root. Mol. Biol. Rep. 2012, 39 (8), 8015−8025.
(8) Urbany, C.; Colby, T.; Stich, B.; Schmidt, L.; Schmidt, J.; Gebhardt, C. Analysis of natural variation of the potato tuber proteome reveals novel candidate genes for tuber bruising. J. Proteome Res. 2012, 11 (2), 703−716.
(9) Shekhar, S.; Mishra, D.; Gayali, S.; Buragohain, A. K.; Chakraborty, S.; Chakraborty, N. Comparison of proteomic and metabolomic profiles of two contrasting ecotypes of sweetpotato (Ipomoea batata L.). J. Proteomics 2016, 143, 306−317.
(10) Ishida, H.; Suzuno, H.; Sugiyama, N.; Innami, S.; Tadokoro, T.; Maekawa, A. Nutritive evaluation on chemical components of leaves, stalks and stems of sweet potatoes (Ipomoea batatas poir). Food Chem. 2000, 68 (3), 359−367.
(11) Shekhar, S.; Mishra, D.; Buragohain, A. K.; Chakraborty, S.; Chakraborty, N. Comparative analysis of phytochemicals and nutrient availability in two contrasting cultivars of sweet potato (Ipomoea batatas L.). Food Chem. 2015, 173, 957−965.
(12) Lee, H. M.; Gupta, R.; Kim, S. H.; Wang, Y.; Rakwal, R.; Agrawal, G. K.; Kim, S. T. Abundant storage protein depletion from tuber proteins using ethanol precipitation method: Suitability to proteomics study. Proteomics 2015, 15 (10), 1765−1769.
(13) Wang, S.; Pan, D.; Lv, X.; Song, X.; Qiu, Z.; Huang, C.; Huang, R.; Chen, W. Proteomic approach reveals that starch degradation contributes to anthocyanin accumulation in tuberous root of purple sweet potato. J. Proteomics 2016, 143, 298−305.
(14) Salekdeh, G. H.; Komatsu, S. Crop proteomics: aim at sustainable agriculture of tomorrow. Proteomics 2007, 7 (16), 2976−2996.
(15) Hashiguchi, A.; Ahsan, N.; Komatsu, S. Proteomics application of crops in the context of climatic changes. Food Res. Int. 2010, 43 (7), 1803−1813.
(16) Ahsan, N.; Donnart, T.; Nouri, M.-Z.; Komatsu, S. Tissue-specific defense and thermo-adaptive mechanisms of soybean seedlings under heat stress revealed by proteomic approach. J. Proteome Res. 2010, 9 (8), 4189−4204.
(17) Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 2014, 11 (11), 1114.
(18) Agrawal, G. K.; Pedreschi, R.; Barkla, B. J.; Bindschedler, L. V.; Cramer, R.; Sarkar, A.; Renaut, J.; Job, D.; Rakwal, R. Translational plant proteomics: A perspective. J. Proteomics 2012, 75 (15), 4588−4601.
(19) Acquadro, A.; Falvo, S.; Mila, S.; Giuliano Albo, A.; Comino, C.; Moglia, A.; Lanteri, S. Proteomics in globe artichoke: protein extraction and sample complexity reduction by PEG fractionation. Electrophoresis 2009, 30 (9), 1594−1602.
(20) Hurkman, W. J.; Tanaka, C. K. Solubilization of plant membrane proteins for analysis by two-dimensional gel electrophoresis. Plant Physiol. 1986, 81 (3), 802−806.

(21) Lee, D. G.; Ahsan, N.; Lee, S. H.; Kang, K. Y.; Bahk, J. D.; Lee, I. J.; Lee, B. H. A proteomic approach in analyzing heat-responsive proteins in rice leaves. Proteomics 2007, 7 (18), 3369−3383.
(22) Yu, K.; Salomon, A. R. PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information. Proteomics 2009, 9 (23), 5350−5358.
(23) Yu, K.; Salomon, A. R. HTAPP: High-throughput autonomous proteomic pipeline. Proteomics 2010, 10 (11), 2113−2122.
(24) Ahsan, N.; Belmont, J.; Chen, Z.; Clifton, J. G.; Salomon, A. R. Highly reproducible improved label-free quantitative analysis of cellular phosphoproteome by optimization of LC-MS/MS gradient and analytical column construction. J. Proteomics 2017, 165, 69−74.
(25) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4 (3), 207.
(26) Yu, K.; Sabelli, A.; DeKeukelaere, L.; Park, R.; Sindi, S.; Gatsonis, C. A.; Salomon, A. Integrated platform for manual and high-throughput statistical validation of tandem mass spectra. Proteomics 2009, 9 (11), 3115−3125.
(27) Perez-Riverol, Y.; Csordas, A.; Bai, J.; Bernal-Llinares, M.; Hewapathirana, S.; Kundu, D. J.; Inuganti, A.; Griss, J.; Mayer, G.; Eisenacher, M. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 2019, 47 (D1), D442−D450.
(28) Zhou, K.; Panisko, E. A.; Magnuson, J. K.; Baker, S. E.; Grigoriev, I. V. Proteomics for validation of automated gene model predictions. In Mass Spectrometry of Proteins and Peptides; Springer: 2009; pp 447−452.
(29) Robinson, J. T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E. S.; Getz, G.; Mesirov, J. P. Integrative genomics viewer. Nat. Biotechnol. 2011, 29 (1), 24.
(30) Mi, H.; Muruganujan, A.; Casagrande, J. T.; Thomas, P. D. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 2013, 8 (8), 1551.
(31) Seem, J. E.; Creamer, N. G.; Monks, D. W. Critical weed-free period for ‘Beauregard’ sweetpotato (Ipomoea batatas). Weed Technol. 2003, 17 (4), 686−695.
(32) Afroz, A.; Khan, M. R.; Ahsan, N.; Komatsu, S. Comparative proteomic analysis of bacterial wilt susceptible and resistant tomato cultivars. Peptides 2009, 30 (9), 1600−1607.
(33) Abu, O. A.; Tewe, O. O.; Losel, D. M.; Onifade, A. A. Changes in lipid, fatty acids and protein composition of sweet potato (Ipomoea batatas) after solid-state fungal fermentation. Bioresour. Technol. 2000, 72 (2), 189−192.
(34) Li, X.-Q.; Zhang, D. Gene Expression Activity and Pathway Selection for Sucrose Metabolism in Developing Storage Root of Sweet Potato. Plant Cell Physiol. 2003, 44 (6), 630−636.
(35) Hagenimana, V.; Vezina, L. P.; Simard, R. E. Distribution of amylases within sweet potato (Ipomoea batatas L.) root tissue. J. Agric. Food Chem. 1992, 40 (10), 1777−1783.
(36) Michalski, A.; Damoc, E.; Hauschild, J.-P.; Lange, O.; Wieghaus, A.; Makarov, A.; Nagaraj, N.; Cox, J.; Mann, M.; Horning, S. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell. Proteomics 2011, 10 (9), M111.011015.
(37) Sun, L.; Zhu, G.; Dovichi, N. J. Comparison of the LTQ-Orbitrap Velos and the Q-Exactive for proteomic analysis of 1−1000 ng RAW 264.7 cell lysate digests. Rapid Commun. Mass Spectrom. 2013, 27 (1), 157−162.
(38) Williamson, J. C.; Edwards, A. V. G.; Verano-Braga, T.; Schwämmle, V.; Kjeldsen, F.; Jensen, O. N.; Larsen, M. R. High-performance hybrid Orbitrap mass spectrometers for quantitative proteome analysis: Observations and implications. Proteomics 2016, 16 (6), 907−914.
(39) Levy, M. J.; Washburn, M. P.; Florens, L. Probing the Sensitivity of the Orbitrap Lumos Mass Spectrometer Using a Standard Reference Protein in a Complex Background. J. Proteome Res. 2018, 17 (10), 3586−3592.

(40) Karthikaichamy, A.; Deore, P.; Rai, V.; Bulach, D.; Beardall, J.; Noronha, S.; Srivastava, S. Time for Multiple Extraction Methods in Proteomics? A Comparison of Three Protein Extraction Methods in the Eustigmatophyte Alga Microchloropsis gaditana CCMP526. OMICS: A Journal of Integrative Biology 2017, 21 (11), 678−683.
(41) Niu, L.; Yuan, H.; Gong, F.; Wu, X.; Wang, W. Protein Extraction Methods Shape Much of the Extracted Proteomes. Front. Plant Sci. 2018, 9 (802). DOI: 10.3389/fpls.2018.00802
(42) Maldonado, A. M.; Echevarría-Zomeño, S.; Jean-Baptiste, S.; Hernández, M.; Jorrín-Novo, J. V. Evaluation of three different protocols of protein extraction for Arabidopsis thaliana leaf proteome analysis by two-dimensional electrophoresis. J. Proteomics 2008, 71 (4), 461−472.
(43) Baerenfaller, K.; Grossmann, J.; Grobei, M. A.; Hull, R.; Hirsch-Hoffmann, M.; Yalovsky, S.; Zimmermann, P.; Grossniklaus, U.; Gruissem, W.; Baginsky, S. Genome-Scale Proteomics Reveals Arabidopsis thaliana Gene Models and Proteome Dynamics. Science 2008, 320 (5878), 938−941.
(44) Takáč, T.; Šamajová, O.; Šamaj, J. Integrating cell biology and proteomic approaches in plants. J. Proteomics 2017, 169, 165−175.
(45) Hattori, T.; Nakagawa, S.; Nakamura, K. High-level expression of tuberous root storage protein genes of sweet potato in stems of plantlets grown in vitro on sucrose medium. Plant Mol. Biol. 1990, 14 (4), 595−604.
(46) Matsuoka, K.; Matsumoto, S.; Hattori, T.; Machida, Y.; Nakamura, K. Vacuolar targeting and posttranslational processing of the precursor to the sweet potato tuberous root storage protein in heterologous plant cells. J. Biol. Chem. 1990, 265 (32), 19750−19757.
(47) Lehesranta, S. J.; Davies, H. V.; Shepherd, L. V.; Nunan, N.; McNicol, J. W.; Auriola, S.; Koistinen, K. M.; Suomalainen, S.; Kokko, H. I.; Kärenlampi, S. O. Comparison of tuber proteomes of potato varieties, landraces, and genetically modified lines. Plant Physiol. 2005, 138 (3), 1690−1699.
(48) Sheffield, J.; Taylor, N.; Fauquet, C.; Chen, S. The cassava (Manihot esculenta Crantz) root proteome: Protein identification and differential expression. Proteomics 2006, 6 (5), 1588−1598.
(49) Koch, K. Sucrose metabolism: regulatory mechanisms and pivotal roles in sugar sensing and plant development. Curr. Opin. Plant Biol. 2004, 7 (3), 235−246.
(50) Fieuw, S.; Willenbrink, J. Sugar transport and sugar-metabolizing enzymes in sugar beet storage roots (Beta vulgaris ssp. altissima). J. Plant Physiol. 1990, 137 (2), 216−223.
(51) Tang, G.-Q.; Lüscher, M.; Sturm, A. Antisense Repression of Vacuolar and Cell Wall Invertase in Transgenic Carrot Alters Early Plant Development and Sucrose Partitioning. Plant Cell 1999, 11 (2), 177−189.
(52) Geigenberger, P.; Hajirezaei, M.; Geiger, M.; Deiting, U.; Sonnewald, U.; Stitt, M. Overexpression of pyrophosphatase leads to increased sucrose degradation and starch synthesis, increased activities of enzymes for sucrose-starch interconversions, and increased levels of nucleotides in growing potato tubers. Planta 1998, 205 (3), 428−437.
(53) Vitale, A.; Denecke, J. The Endoplasmic Reticulum–Gateway of the Secretory Pathway. Plant Cell 1999, 11 (4), 615−628.
(54) Castellana, N. E.; Payne, S. H.; Shen, Z.; Stanke, M.; Bafna, V.; Briggs, S. P. Discovery and revision of Arabidopsis genes by proteogenomics. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (52), 21034−21038.
(55) Helmy, M.; Tomita, M.; Ishihama, Y. OryzaPG-DB: rice proteome database based on shotgun proteogenomics. BMC Plant Biol. 2011, 11 (1), 63.
(56) Chapman, B.; Bellgard, M. Plant Proteogenomics: Improvements to the Grapevine Genome Annotation. Proteomics 2017, 17 (21), 1700197.
(57) Castellana, N. E.; Shen, Z.; He, Y.; Walley, J. W.; Cassidy, C. J.; Briggs, S. P.; Bafna, V. An automated proteogenomic method uses mass spectrometry to reveal novel genes in Zea mays. Mol. Cell. Proteomics 2014, 13 (1), 157−167.
(58) Zhu, F.-Y.; Chen, M.-X.; Ye, N.-H.; Shi, L.; Ma, K.-L.; Yang, J.-F.; Cao, Y.-Y.; Zhang, Y.; Yoshida, T.; Fernie, A. R.; Fan, G.-Y.; Wen, B.; Zhou, R.; Liu, T.-Y.; Fan, T.; Gao, B.; Zhang, D.; Hao, G.-F.; Xiao, S.; Liu, Y.-G.; Zhang, J. Proteogenomic analysis reveals alternative splicing and translation as part of the abscisic acid response in Arabidopsis seedlings. Plant J. 2017, 91 (3), 518−533.

DOI: 10.1021/acs.jproteome.8b00943 J. Proteome Res. 2019, 18, 2719−2734