Article pubs.acs.org/jpr
Quantitative Proteomics Analysis of Camelina sativa Seeds Overexpressing the AGG3 Gene to Identify the Proteomic Basis of Increased Yield and Stress Tolerance
Sophie Alvarez,† Swarup Roy Choudhury,† Kumaran Sivagnanam,‡ Leslie M. Hicks,‡ and Sona Pandey*,†
† Donald Danforth Plant Science Center, 975 North Warson Road, St. Louis, Missouri 63132, United States
‡ Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Supporting Information
ABSTRACT: Camelina sativa, a close relative of Arabidopsis, is an oilseed plant that is emerging as an important biofuel resource. The genome and transcriptome maps of Camelina have become available recently, but its proteome composition remained unexplored. A labeling LC-based quantitative proteomics approach was applied to decipher the Camelina seed proteome, which led to the identification of 1532 proteins. In addition, the effect of overexpression of the Arabidopsis G-protein γ subunit 3 (AGG3) on the Camelina seed proteome was elucidated to identify the proteomic basis of its increased seed size and improved stress tolerance. The comparative analysis showed a significantly higher expression of proteins involved in primary and secondary metabolism, nucleic acid and protein metabolism, and abscisic acid related responses, corroborating the physiological effects of AGG3 overexpression. More importantly, the proteomic data suggested involvement of the AGG3 protein in the regulation of oxidative stress and heavy metal stress tolerance. These observations were confirmed by the physiological and biochemical characterization of AGG3-overexpressing seeds, which exhibit a higher tolerance to exogenous cadmium in a glutathione-dependent manner. The activity of multiple redox-regulating enzymes is higher in seeds expressing enhanced levels of AGG3. Overall, these data provide critical evidence for the role of redox regulation by the AGG3 protein in mediating important seed-related traits.
KEYWORDS: quantitative proteomics, iTRAQ, Camelina sativa, seed proteome, AGG3, redox regulation
Journal of Proteome Research
Received: February 16, 2015
DOI: 10.1021/acs.jproteome.5b00150 J. Proteome Res. XXXX, XXX, XXX−XXX
© XXXX American Chemical Society
■
INTRODUCTION
Global population growth is leading to an ever-increasing demand for food, feed, and fuel. Higher fuel production poses additional challenges because traditional, nonrenewable resources are depleting fast, and rising demand is related not only to population growth but also to lifestyle changes worldwide. Not surprisingly, the development of unconventional and renewable sources of fuel in the forms of solar and wind energy as well as biofuels is more important than ever, and concerted efforts are needed to enable the development of alternative fuels in a timely manner.
Camelina sativa (Camelina, false flax, or gold of pleasure) is an oilseed plant of the Brassicaceae family that is emerging as an important biofuel resource.1 The high oil content of Camelina seeds (35−45%, with >90% unsaturated fatty acids), combined with the plant’s short life cycle, low agricultural input requirement, relative tolerance to a variety of biotic and abiotic stresses, and accessibility to genetic manipulations, makes the plant an attractive crop for large-scale biofuel production on marginal lands.1−3 The generation of genomics and transcriptomics resources and the evaluation of the natural genetic diversity that exists in different populations have led the way in efforts to improve the yield potential of Camelina.1,2,4−6 Moreover, given the high degree of similarity that exists between the Arabidopsis and Camelina genes,3 Camelina may serve as an excellent crop system for translating the knowledge gained from Arabidopsis research over the years to improve its productivity. Many Arabidopsis genes transformed in Camelina have resulted in the expected phenotypes, leading to improved biomass and oil yield.2,7−9
We have previously reported that the overexpression of Arabidopsis G-protein γ subunit 3 (AGG3) results in increased seed and oil production and improved stress tolerance in C. sativa.9 AGG3 is an integral component of the heterotrimeric G-protein signaling complex.10−12 This signaling complex is present in all eukaryotes and regulates a range of fundamental growth and development pathways in each of the species.13−16 Three distinct subunits (Gα, Gβ, and Gγ) constitute the heterotrimer. In metazoan systems, each of the subunits is represented by multiple copies, and the resultant heterotrimeric combinations exhibit enormous diversity.17,18 In plant systems, most of the diversity results from the Gγ proteins because most diploid plants have a single Gα and Gβ.13 The Gγ proteins have been divided into three groups; although the group I and group II proteins are relatively similar to the proteins present in other organisms, group III Gγ proteins, of which AGG3 is the founding member, are plant-specific Gγ proteins.10,11,13,19,20 These proteins are almost twice as large as typical Gγs; the N-terminal half shares the defining characteristics of canonical Gγ proteins, but the extended C-terminal region is plant-specific and is extremely rich in the amino acid cysteine (Cys).11,19 Studies in Arabidopsis have revealed that both N- and C-terminal regions are required for the protein to function.12,21
Group III Gγ proteins control important yield parameters in plants. Multiple naturally occurring mutations in rice homologues of group III Gγ genes have been implicated in the regulation of seed size and branching as well as nitrogen-use efficiency.22−28 The Arabidopsis gene was identified as a regulator of abiotic stress tolerance, especially via controlling abscisic acid mediated responses.10 The overexpression of AGG3 in Camelina resulted in higher biomass production, a significant increase in seed yield (and, consequently, the oil content per plant), as well as improved stress tolerance.9 Transgenic plants grew faster due to an increase in the rate of cell division and improved photosynthetic as well as water-use efficiency.9 However, the molecular basis of these physiological responses, especially those related to increasing seed size and number, remains unclear.
Seeds are the most economically important parts of the Camelina plant. Understanding the gene and protein networks that regulate seed physiology is critically important from the perspective of improving both the oil yield and the oil quality. The transcriptome of C. sativa seed was recently described and offers many insights into the conserved and unique aspects of Camelina seed gene expression patterns during both development and germination.4,5,29 A reference genome sequence of Camelina has confirmed its highly undifferentiated hexaploid nature and suggested that Camelina is more similar to polyploid crop species such as brassica, cotton, and wheat.3 Incidentally, C. sativa has one of the highest numbers of protein coding genes reported to date for any fully sequenced plant genome. A total of 89 421 genes are present in the C. sativa genome (641 Mb), in comparison to 27 416 genes in Arabidopsis thaliana.3
In contrast to the genomic and transcriptomic studies, no proteomic-scale data exist to date for C. sativa. To fill this knowledge gap, we have utilized a quantitative proteomics approach to map the C. sativa seed proteome. Furthermore, in an attempt to elucidate the proteomic basis of yield enhancement due to the overexpression of AGG3, we have compared the seed proteome of wild-type Camelina seeds with three different overexpression lines. The use of 4-plex iTRAQ allowed for the simultaneous labeling and detection of the four samples used in this comparison: empty vector (EV)-containing Camelina seeds labeled with reagent 114 and seeds from three different transgenic lines constitutively overexpressing AGG3 (AGG3OE)9 labeled with reagents 115, 116, and 117, respectively. The quantitation results of proteins from all samples were combined to generate a detailed seed proteome map of C. sativa. In addition, quantitative protein abundance differences between EV and transgenic lines were analyzed for the effect of AGG3 overexpression on the proteome changes in the context of its proposed physiological roles. Comparative analysis of the protein abundance changes in the EV plants versus AGG3OE transgenic plants reveals the overrepresentation of proteins related to primary and secondary metabolism, stress response, and oxidation reduction processes, corroborating previous physiological observations and implying the role of AGG3 in regulating the overall redox status of the plants. Additional proteins related to the regulation of cell division and signaling pathways offer clues to the possible mechanism of yield enhancement by AGG3 protein.
■
MATERIALS AND METHODS
C. sativa Growth
Three independent C. sativa (variety Suneson) lines overexpressing the Arabidopsis AGG3 gene under the control of the constitutive CaMV promoter (AGG3OE) as well as an empty vector (EV) control line were grown under identical conditions. Seeds were harvested at maturity and used for protein extraction; the extracts were used for proteomics experiments and for enzyme activity assays. For the growth assays, mature dried seeds were sterilized as previously described9 and plated on 1% agar containing 0.5× MS medium (control medium, Caisson Laboratories, UT, USA) or control medium supplemented with CdCl2 (100 μM) and buthionine sulfoximine (BSO, 2 mM, Sigma, MO, USA). After being stratified at 4 °C for 48 h, the seeds were germinated, grown in darkness for 2 days, and photographed, and their seedling lengths were measured.
Protein Extraction
Protein extraction experiments were performed with 0.3 g of mature, dry C. sativa seeds using the published protocol.30 The final pellet was vacuum-dried and solubilized in 200−400 μL of buffer containing 7 M (w/v) urea, 2 M (w/v) thiourea, 40 mM DTT, and 1% (v/v) protease inhibitor mixture (Roche, IN, USA) dissolved in 100 mM HEPES (pH 8.5) for 1 h at room temperature. Insoluble material was removed by centrifugation at 20000g for 30 min. Protein concentrations were determined using the CB-X kit (G-Biosciences, MO, USA) according to the manufacturer’s instructions. Proteins were digested in solution immediately or stored at −80 °C until further processing.
Protein Digestion and Labeling
Protein digestion was performed as described previously31 with the following modifications. Briefly, the extracted proteins were reduced with 10 mM DTT at 37 °C for 2 h and alkylated with 55 mM iodoacetamide for 45 min at room temperature in darkness. The samples were then diluted to 1 M urea with 100 mM HEPES (pH 8.5), and proteomics grade trypsin (Sigma) was added to a final enzyme to substrate ratio of 1:20 (w/w). The trypsin digest was incubated at 37 °C for 12 h. After digestion, the peptide mixture was cleaned with a Sep-Pak C18 cartridge (Waters Corporation, MA, USA) according to the manufacturer’s instructions. Isobaric tagging for relative and absolute quantitation (iTRAQ) labeling experiments were performed using 100 μg of each digested sample according to the manufacturer’s instructions (AB Sciex, CA, USA). The EV-containing seed proteins were labeled with reagent 114, and the three AGG3OE line seed proteins were labeled with reagents 115, 116, and 117, respectively. Labeling reactions were quenched with 1% formic acid (FA), and samples from the same group were pooled together. The pooled sample was dried completely and resuspended in 0.5% TFA, followed by cleaning with a Sep-Pak C18 cartridge. The samples were lyophilized and dissolved in 100 μL of 20% ACN with 5 mM ammonium formate (pH 2.7) for separation by strong cation exchange (SCX). For SCX, 100 μL injections of iTRAQ-labeled pooled peptides were fractionated on an HPLC system (Shimadzu, Kyoto, Japan) equipped with a polysulfoethyl column (4.6 mm i.d. × 200 mm, 5 μm, 200 Å, The Nest Group, MA, USA) flowing at 1 mL/min. The gradient ran from 100% solvent A (20% ACN, 5 mM ammonium formate, pH 2.7) to 20% solvent B (20% ACN, 0.5 M ammonium formate, pH 3) in 20 min; it then increased from 20% to 100% B in 5 min and was finally held at 100% B for 5 min. Fractions were collected every 1 min, lyophilized, and dissolved in 20 μL of 5% ACN and 0.1% FA. Fractions from 1 to 25 min were analyzed by nano-LC−MS/MS.
LC−MS/MS
iTRAQ-labeled samples were analyzed using an AB Sciex Triple TOF 5600 mass spectrometer coupled with a Waters nanoAcquity UPLC. Each sample (5 μL) was loaded onto a trap column (nanoAcquity 2G−V/M Trap, 5 μm, symmetry C18, 180 μm × 20 mm, Waters) at a flow rate of 15 μL/min for 1 min. Peptide separation was carried out on a C18 column (nanoAcquity, 1.7 μm, C18, 100 μm × 100 mm, Waters) at a flow rate of 0.3 μL/min. Peptides from the iTRAQ samples were separated using a 120 min linear gradient ranging from 2% to 85% B (mobile phase A, 0.1% FA in water; mobile phase B, 0.1% FA in ACN). A full MS scan analysis and product ion MS/MS analysis were performed using information-dependent acquisition (IDA) experiments in high resolution mode (>30 000). A cycle of one full survey scan (MS) (350−1600 m/z) was followed by multiple tandem mass spectra (MS/MS) applied using a rolling collision energy (RCE) spread, which is relative to the m/z ratio and charge state of the precursor ion. Data were acquired for 120 min with a cycle time of 2.0 s (a total of 9585 cycles). The maximum number of candidate ions monitored per cycle was 20, and the ion tolerance was 100 ppm. The switch criteria were set to exclude former target ions for 8 s and to include isotopes within 2 Da. The raw data files (.wiff) were converted to mascot generic files (.mgf) for protein identification and quantification.
… homology threshold cutoff of 0.0062. Proteins with a fold change 1.2 in all three AGG3OE lines were considered differentially expressed. Because three independent overexpression lines were analyzed, relatively large variations in protein expression levels are expected; therefore, the variation between the replicates was disregarded. The GO annotation of the proteins identified from Camelina seeds was performed using the TAIR GO annotation at https://www.arabidopsis.org/tools/bulk/go/index.jsp. The full set of terms and GO Slim annotations with some modifications were used to functionally categorize the proteins.
Enzyme Activity Assays
Dry seeds (0.25 g) from EV-containing and AGG3OE lines were homogenized in 1 mL of ice-cold 50 mM sodium phosphate buffer (pH 7.5) containing 1 mM polyethylene glycol, 8% (w/v) polyvinylpolypyrrolidone, 1 mM PMSF, and 0.01% (v/v) Triton X-100. Homogenates were centrifuged at 14000g for 30 min at 4 °C, and supernatants were collected for the measurement of the activities of antioxidant enzymes. Glutathione reductase (GR), glutathione S-transferase (GST), ascorbate peroxidase (APX), peroxidase (POX), and catalase (CAT) activities were measured according to previously described methods.32−36
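The differential-expression criterion stated in the methods (a fold-change cutoff of 1.2 that must hold in all three AGG3OE lines relative to the EV control) can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline. The comparison operator is missing from the text, so the sketch assumes that "up" means a reporter-ion ratio of at least 1.2 and "down" means a ratio of at most 1/1.2; the function names, protein names, and ratio values are hypothetical.

```python
from typing import Dict, List

# Assumed cutoff: the text gives "fold change 1.2" without an operator, so
# ">= 1.2" (up) and "<= 1/1.2" (down) are assumptions made for this sketch.
FOLD_CHANGE = 1.2


def classify(ratios: List[float], threshold: float = FOLD_CHANGE) -> str:
    """Classify one protein from its AGG3OE/EV reporter-ion ratios.

    `ratios` holds one ratio per overexpression line (115/114, 116/114,
    117/114). The cutoff must be met in every line.
    """
    if not ratios:
        raise ValueError("at least one ratio is required")
    if all(r >= threshold for r in ratios):
        return "up"
    if all(r <= 1.0 / threshold for r in ratios):
        return "down"
    return "unchanged"


def differential(table: Dict[str, List[float]]) -> Dict[str, str]:
    """Return only the proteins that pass the cutoff in all lines."""
    result: Dict[str, str] = {}
    for protein, ratios in table.items():
        label = classify(ratios)
        if label != "unchanged":
            result[protein] = label
    return result


# Hypothetical ratios, for illustration only.
example = {
    "protein_A": [1.35, 1.28, 1.50],  # passes in all three lines
    "protein_B": [1.40, 1.05, 1.30],  # fails in the second line
    "protein_C": [0.70, 0.80, 0.65],  # lower in all three lines
}

if __name__ == "__main__":
    print(differential(example))
```

Requiring the cutoff in every line, rather than on an average, mirrors the wording "in all three AGG3OE lines": a protein that changes strongly in only one or two transgenic lines is not reported.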
■
RESULTS
Detailed Inventory of the Camelina Seed Proteome
Mature seeds are economically the most significant part of Camelina because of their use as a potential source of biofuel. The genome sequence of Camelina has recently become available, and many studies related to its transcript profiling or fatty acid biosynthesis and regulation have also been reported.4−6,8,29,37,38 However, to date, no information exists on its proteome composition or identity. In this study, mature, dry seeds were used to determine the Camelina proteome composition using a gel-free, labeling-based proteomics approach. Three biological replicates of the overexpression lines and the empty vector (EV) were processed using iTRAQ labeling followed by LC−MS/MS. A total of 1532 proteins were confidently identified using the criteria described in the Materials and Methods (Table S1 in the Supporting Information). Due to the high degree of sequence conservation between Arabidopsis and Camelina genes, the peptide sequences obtained from the Camelina seed samples were matched to the TAIR database using CLUSTALW alignment to annotate the proteins. This analysis shows confident sequence alignment for all the proteins identified with an E value of at least