Integrated Analysis of Protein Abundance, Transcript Level, and


Article

An integrated analysis of protein abundance, transcript level and tissue diversity to reveal developmental regulation of maize Haitao Jia, Wei Sun, Manfei Li, and Zuxin Zhang J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00586 • Publication Date (Web): 18 Dec 2017 Downloaded from http://pubs.acs.org on December 19, 2017



Journal of Proteome Research

An integrated analysis of protein abundance, transcript level and tissue diversity to reveal developmental regulation of maize

Haitao Jia1, Wei Sun1, Manfei Li1 and Zuxin Zhang1,*

1 National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, P.R. China

*To whom correspondence should be addressed: Email: [email protected], Tel: +086 027 8728 2689

ORCID iD Haitao Jia: 0000-0003-1409-6288 Wei Sun: 0000-0001-9375-049X Manfei Li: 0000-0003-3013-183X Zuxin Zhang: 0000-0001-8697-1681


Abstract: The differentiation and subsequent development of plant tissues or organs are tightly regulated at multiple levels, including the transcriptional, posttranscriptional, translational, and posttranslational levels. Transcriptomes define many of the tissue-specific gene expression patterns in maize, and some key genes and their regulatory networks have been established at the transcriptional level. In this study, the sequential window acquisition of all theoretical spectra-mass spectrometry (SWATH-MS) technique was employed as a quantitative proteome assay of four representative maize tissues, and a set of high-confidence proteins was identified. Integrated analysis of the proteome and transcriptome revealed that protein abundance was positively correlated with mRNA level with weak to moderate correlation coefficients, but the abundance of key proteins for function or architecture in a given tissue was closely regulated in time and space at the transcription level. A subset of differentially expressed proteins, specifically tissue-specific highly expressed proteins, was identified, e.g., reproductive structure and flower development-related proteins in tassel and ear, lipid and fatty acid biosynthetic process-related proteins in immature embryo, and inorganic substance and oxidation reduction responsive proteins in root, potentially revealing the physiology, morphology and function of each tissue. Furthermore, in addition to known key factors, we found many new proteins in specific tissues whose abundance was highly correlated with their mRNA levels. These proteome data provide a new perspective for understanding many aspects of maize developmental biology. Raw proteomics data are available via ProteomeXchange with identifier PXD008464.

Keywords: Differentially expressed proteins, Proteome, SWATH-MS, Transcriptome, Zea mays L.



Introduction

Plant tissues and organs are differentiated from stem cells that are located in specialized structures called meristematic tissues (meristems), including the root apical meristem (RAM), shoot apical meristem (SAM), and vascular system 1. The postembryonic development of a plant depends on the maintenance of these meristems. The SAM has the potential to generate organs, such as leaves, shoots, and flowers, throughout the life cycle of a plant. The SAM in the maize plant forms initially during embryogenesis and persists until it transitions into an inflorescence meristem that gives rise to male florets on the tassel or to female florets on the ear 2. The RAM provides the meristematic cells for future root growth. The development of diverse specialized organs is temporally and spatially regulated by the selective expression of specific genes encoded by the same genome in different cell types 3, 4.

Comprehensive and systematic transcriptome profiling provides abundant data for defining organ- and tissue-specific genome expression patterns during the life cycle of maize 5-8. Some key genes in developmental regulatory networks have been established at the transcriptional level. For example, many genes involved in nutrient uptake, hormone transport, and oxidative stress response are related to root architecture and function 9-11; a subset of genes is involved in the hormone response, accumulation of storage reserves, and architecture of the shoot morphology required for embryo development 12, 13; and numerous factors, including transcription factors and plant hormones, play key roles in the regulation of the inflorescence architecture and floral organogenesis by regulating the maintenance of the inflorescence meristem (IM) and the identity, determinants, initiation and outgrowth of the axillary meristems 14, 15. Defining genes and their expression levels in every organ or tissue is extremely important for understanding maize development. However, mRNA levels in diverse tissues or organs, including the primary root, root hair, leaf, kernel and xylem sap, do not always directly correspond to the abundance of the corresponding protein 16-24, which depends on mRNA transcription and on posttranscriptional regulation, including the processing and turnover of the mRNA and the translation and turnover of the protein 25-29. Because proteins are direct contributors to or regulators of biological processes, cellular components and traits, combining the transcriptome and proteome can strengthen our understanding of tissue architecture and organogenesis.

In the present study, we employed the Sequential Window Acquisition of all Theoretical fragment ions (SWATH) technique followed by mass spectrometry (MS), i.e., SWATH-MS, to quantify the proteomes in four maize tissues: immature ears (at the V7 stage), immature tassels (at the V8 stage), embryos at 20 days after pollination (20_DAP), and roots of 14-day-old seedlings. Using an integrated analysis of the proteome and transcriptome, we attempted to identify tissue-specific highly expressed genes and proteins to provide a glimpse of the regulatory mechanisms underlying tissue architecture and organogenesis. These data may provide new clues for studying many aspects of the developmental biology of maize.

Materials and methods

Plant materials

Seeds of maize inbred line B73 were germinated in an incubator. The germinated seeds were then sown and grown in a greenhouse (22 ± 4 °C) under a 12/12-h (day/night) cycle. Four tissues, 14-day-old seedling roots, immature tassels at stage V7, immature ears at stage V8, and embryos from seeds 20 days after pollination (20_DAP), were collected separately with three biological replicates and then frozen and stored at −80 °C until required.

Protein extraction

Samples were ground to powder in liquid nitrogen and then incubated in lysis buffer (7 M urea, 2 M thiourea, 4% SDS, 40 mM Tris-Cl, pH 8.5, 1 mM PMSF (phenylmethanesulfonyl fluoride), 2 mM EDTA (ethylene diamine tetraacetic acid)) for 5 min. The suspension was sonicated in an ice water bath for 15 min and then centrifuged at 13,000 × g for 20 min at 4 °C. The supernatant was mixed with 4 volumes of precooled acetone for overnight protein precipitation. After centrifugation at 13,000 × g for 20 min at 4 °C, the precipitated pellets were collected and air-dried and then resuspended in 8 M urea/100 mM TEAB (tetraethylammonium bromide, pH 8.0).

Protein quantification and digestion

Protein concentrations were determined by the Bradford method (Bio-Rad) 30. For each sample, 100 µg of protein was diluted to 500 µL in TEAB dissolution buffer. The proteins were reduced with 10 mM DTT (DL-dithiothreitol) at 56 °C for 40 min with gentle shaking, cooled to room temperature, and alkylated with 55 mM iodoacetamide for 30 min in the dark. Trypsin (Promega, Madison, WI, USA) digestion was then performed (enzyme/protein 1:50 w/w) at 37 °C overnight. After protein digestion, an equal volume of 0.1% FA (formic acid) was added for acidification. Peptides were loaded on a Strata-X C18 column (Phenomenex Inc., CA, USA) three times, washed with 0.1% FA + 3% ACN (acetonitrile) three times, and then eluted with 1 mL of 0.1% FA + 80% ACN. Eluted peptides were dried in a vacuum concentrator. Peptide samples were stored at −80 °C until required or dissolved in 0.1% FA for LC−MS/MS analysis.

LC-ESI-MS/MS analysis

LC-ESI-MS/MS (liquid chromatography-electrospray ionization-tandem mass spectrometry) analysis was performed on a Triple TOF 5600 plus mass spectrometer (AB SCIEX, Framingham, MA, USA) in two phases: data-dependent acquisition (DDA) was followed by SWATH acquisition on the same sample, with the same gradient conditions and the same amounts of sample used. For DDA, the peptide samples were first loaded on a cHiPLC trap (3 µm, ChromXP C18CL, 120 Å, 0.5 mm × 200 µm) with buffer (0.1% (v/v) formic acid, 2% (v/v) acetonitrile) at 2 µL/min for 10 min. Subsequently, the samples were separated on a cHiPLC column (3 µm, ChromXP C18CL, 120 Å, 20 cm × 75 µm) using an elution gradient of 2–35% acetonitrile at 300 nL/min for 120 min. The trap and column were maintained at 30 °C for retention time stability. The eluent from the column was analyzed using a Triple TOF 5600 plus mass spectrometer (AB SCIEX, USA) in positive ion mode with a nano-ion spray voltage of 2,300 V.

Data-dependent acquisition was performed first to obtain the SWATH-MS spectral ion library. Specifically, a survey scan of 250 ms (TOF-MS) in the range 360–1,460 m/z was performed to collect the MS1 spectra, and the top 30 precursor ions with charge states from +2 to +5 and intensity greater than 150 cps were selected for subsequent fragmentation (MS2) with an accumulation time of 100 ms per MS/MS experiment for a total cycle time of 3.25 s. Mass tolerance for precursor ion selection was set to 50 mDa, and MS/MS spectra were collected in the range 100–1,500 m/z. Selected ions and their isotopes were dynamically excluded from further MS/MS fragmentation for 15 s. Ions were fragmented in the collision cell using rolling collision energy based on their m/z and charge state.

For SWATH, the same HPLC conditions were used as in the DDA run described above. Data were acquired with a 250-ms MS1 scan followed by 90-ms MS2 scans with 32 × 25-a.m.u. isolation windows covering the mass range of 400–1,250 Da (cycle time of 3.25 s). An overlap of 1 Da between SWATH windows was preselected. The collision energy for each window was set independently as defined by CE = 0.06 × m/z + 4, where m/z is the center of each window, with a spread of 15 eV applied linearly across the accumulation time.
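The acquisition arithmetic in this section can be illustrated with a minimal Python sketch. It computes the DDA cycle time from the survey-scan and MS/MS accumulation times given above, and evaluates the stated collision-energy rule, CE = 0.06 × m/z + 4, at a window center. The function names and the example window center of 500 m/z are assumptions chosen for illustration, not instrument settings reported in the study.

```python
def cycle_time_s(survey_ms: float, n_msms: int, msms_ms: float) -> float:
    """Cycle time in seconds: one survey scan plus n_msms MS/MS accumulations."""
    return (survey_ms + n_msms * msms_ms) / 1000.0


def collision_energy(mz_center: float, slope: float = 0.06,
                     intercept: float = 4.0) -> float:
    """Collision energy for a window centered at mz_center (CE = slope * m/z + intercept)."""
    return slope * mz_center + intercept


# DDA settings from the text: 250 ms survey scan, top 30 precursors, 100 ms each.
print(cycle_time_s(250, 30, 100))  # 3.25

# Hypothetical window center, used only to show how the rule is applied.
print(round(collision_energy(500.0), 2))
```

The linear rule means that higher-m/z windows receive proportionally more collision energy; the 15 eV spread mentioned in the text is not modeled here.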

SWATH data analysis

ProteinPilot 4.5 (Sciex) was used to search all the DDA mass spectrometry data thoroughly against the B73 RefGen_V3 5a (https://ftp.maizegdb.org/MaizeGDB/FTP/B73_RefGen_v3/). Raw data for each experimental set were searched in a single batch to create a results file. The outputs of ProteinPilot represent a group of files that were used as the reference spectra library, which contained peptide sequences, charge states, modifications, retention times, confidence scores and the corresponding fragment ions with m/z and intensity. Subsequently, spectral library generation and SWATH data processing were performed using Skyline version 3.5 31. Missing values were excluded from all calculations and statistical analyses.

Before targeted data extraction, a spectral library document was automatically generated according to the following rules. i) Peptides that contained modifications and/or were shared between different protein entries/isoforms were excluded from the selection. ii) Peptides identified by ProteinPilot with a confidence less than 85% were excluded. Peptides covered by at least two spectra were used for peptide and protein quantification. iii) Up to 5 fragment ions ranked by their intensity were chosen. iv) Fragment ions within the SWATH isolation window were excluded from the selection. v) To control the false discovery rate (FDR), a random mass shift Q1 and Q3 m/z strategy was used to create a decoy spectra library.

To extract the targeted peaks, a mass tolerance of up to 10 ppm was allowed for both the peptide precursor and the fragment ions. Under these settings, the extracted ion chromatogram (XIC) of each ion was automatically extracted with a retention time width of 5 minutes, and the area under the XIC curve (AUC) was calculated for each individual ion. Fragment ion areas belonging to one peptide were summed to determine the peptide abundance, and the peptide abundances for a given protein were summed to obtain the protein abundance. To eliminate random errors and sample bias, we normalized all data among samples using a median normalization method 32. To assess the data confidence and control the false discovery rate, the mProphet algorithm 33 was applied to each extracted peak. For absolute quantification within a single biological sample, the iBAQ algorithm 34 was applied to the protein abundance. The SWATH-MS data associated with this manuscript were uploaded to the online database ProteomeXchange (http://www.proteomexchange.org) 35 and are available via ProteomeXchange with identifier PXD008464.
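The quantification roll-up described in this section (fragment areas summed to peptides, peptides summed to proteins, then median normalization across samples) can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not give the exact form of the median normalization, so the sketch scales each sample so that its median protein abundance equals the mean of the sample medians. The function names and the toy protein, peptide and sample labels are hypothetical.

```python
from collections import defaultdict
from statistics import median


def rollup(fragment_areas):
    """Sum fragment XIC areas to peptides, then peptide sums to proteins.

    fragment_areas: iterable of (protein, peptide, area) tuples for one sample.
    Returns a dict mapping protein -> summed abundance.
    """
    peptide_totals = defaultdict(float)
    for protein, peptide, area in fragment_areas:
        peptide_totals[(protein, peptide)] += area

    protein_totals = defaultdict(float)
    for (protein, _peptide), total in peptide_totals.items():
        protein_totals[protein] += total
    return dict(protein_totals)


def median_normalize(samples):
    """Scale each sample so all samples share the same median abundance.

    samples: dict mapping sample -> {protein: abundance}.
    Assumption: the common target is the mean of the per-sample medians.
    """
    medians = {name: median(values.values()) for name, values in samples.items()}
    target = sum(medians.values()) / len(medians)
    return {
        name: {p: a * target / medians[name] for p, a in values.items()}
        for name, values in samples.items()
    }


# Toy data (hypothetical labels and areas).
fragments = [
    ("P1", "pepA", 100.0), ("P1", "pepA", 50.0),
    ("P1", "pepB", 25.0), ("P2", "pepC", 10.0),
]
print(rollup(fragments))  # {'P1': 175.0, 'P2': 10.0}
```

Summing is sensitive to the number of peptides observed per protein, which is one reason a length- or peptide-count-aware measure such as iBAQ is used for within-sample comparisons.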

Bioinformatics analysis

Transcriptome data for the four studied tissues were downloaded from Doreen Ware’s laboratory (http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3826/samples/) 36 and qTeller (http://www.qteller.com/). Functional annotations of differentially expressed (DE) proteins were performed using Gene Ontology (http://bioinfo.cau.edu.cn/agriGO/analysis.php). Threshold criteria (P-value ≤ 0.01 and FDR ≤ 0.05) were used to determine significant GO enrichment. Functional classification of proteins was conducted using the Clusters of Orthologous Groups of proteins (COG) (http://www.ncbi.nlm.nih.gov/COG/) database.
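As a small illustration of the enrichment thresholds above, the following Python sketch filters a list of GO enrichment results using both criteria (P-value ≤ 0.01 and FDR ≤ 0.05). The record layout, field names and example values are hypothetical; they do not reflect the actual agriGO output format or any result from this study.

```python
def significant_terms(results, p_max=0.01, fdr_max=0.05):
    """Keep enrichment records that pass both the P-value and FDR cutoffs."""
    return [r for r in results if r["p_value"] <= p_max and r["fdr"] <= fdr_max]


# Hypothetical records, for illustration only.
example_results = [
    {"term": "term_A", "p_value": 0.001, "fdr": 0.02},  # passes both cutoffs
    {"term": "term_B", "p_value": 0.008, "fdr": 0.09},  # fails the FDR cutoff
    {"term": "term_C", "p_value": 0.030, "fdr": 0.04},  # fails the P-value cutoff
]

for record in significant_terms(example_results):
    print(record["term"])  # term_A
```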

Polyclonal antibody preparation

The bacterial strain Escherichia coli Rosetta (DE3), prokaryotic expression vector pET-B2M and polyclonal antibodies were provided and prepared by Wuhan GeneCreate Biological Engineering Co., Ltd. The recombinant plasmid pET-B2M-rec was transformed into E. coli Rosetta using standard procedures. Isopropyl-β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mmol/L to induce protein expression at 30 °C for 3 h. The cells were harvested by centrifugation at 6,000 × g for 10 min, and the pellet was resuspended in 40 mL of NTA0 buffer with 20 mmol/L Tris-HCl and 0.5 mol/L NaCl (pH 8.0). The resuspended cells were lysed by sonication. The obtained pellets were suspended in NTA0 with 1 mmol/L DTT, washed three times by sonication with the same buffer, and then dissolved in 6 mol/L guanidine hydrochloride solution by sonication. To check the purity of the recombinant proteins, 10 µL of each protein was separated by 12% SDS-PAGE using a Mini-PROTEAN electrophoresis instrument (Bio-Rad Laboratories, Philadelphia, PA) and stained with Coomassie Blue R-250.

The purified protein was used to immunize female Japanese White rabbits 5 times at two-week intervals. For the first immunization, 500 µg of protein and an equal volume of Freund’s complete adjuvant (Sigma-Aldrich, Germany) were mixed and injected subcutaneously. For the subsequent immunizations, 250 µg of protein was emulsified in Freund’s incomplete adjuvant (Sigma) and injected. Before each immunization, blood was drawn by venous puncture from the ear of the rabbit and allowed to clot for 2 to 3 h at room temperature before serum preparation. Titration of specific polyclonal antibodies was then performed using ELISA.

Western blotting

Crude protein was extracted from 0.5 g of collected tissue sample by lysing in 0.5 mL of 1 × sodium dodecyl sulfate-polyacrylamide (SDS) buffer (pH 7.2). The protein sample was subjected to electrophoresis on a 12% (w/v) polyacrylamide gel and transferred to a Hybond P membrane (Amersham, Buckinghamshire, UK). The membranes were blocked overnight at 4 °C with 5% non-fat milk in PBS containing 0.05% Tween 20 (PBS-T). All antibody incubations were performed in PBS-T containing 3% non-fat milk. GRMZM2G354013 and GRMZM2G054916 proteins were detected with anti-GRMZM2G354013 and anti-GRMZM2G054916 serum, respectively, at a dilution of 1:1,000 (v/v) for 1 h. Anti-actin (dilution of 1:1,000 (v/v)) was used as an endogenous control. The membrane was incubated for 1 h with an anti-rabbit immunoglobulin horseradish peroxidase-conjugated secondary antibody (Amersham) at a dilution of 1:2,000 (v/v) to visualize the signal.

Quantitative reverse transcription-PCR

Approximately 1.0 g of the collected tissue was used for total RNA extraction with Ambion Pure Link Plant RNA Reagent (Life Technologies, Invitrogen, Carlsbad, CA, USA). Approximately 1.0 µg of RNA was reverse-transcribed with M-MLV reverse transcriptase (Life Technologies, Invitrogen) according to the manufacturer's instructions. Quantitative reverse transcription-PCR (qRT-PCR) was performed using a SYBR Green qRT-PCR kit (Bio-Rad, Hercules, CA, USA) according to the manufacturer’s instructions with three biological replicates. The maize GAPDH gene (GRMZM2G046804) was used as the internal control. All reactions were conducted on a CFX96 Real-time system (Bio-Rad). The expression levels of 18 representative genes were measured in the four studied tissues using the gene-specific primers listed in Table S5.

Results

Data analysis and features of the detected proteins

After merging the data from the four tested tissues, a total of 646,658 spectra were generated from the SWATH experiments (ProteomeXchange identifier PXD008464). We evaluated the detection power of SWATH-MS with different confidence thresholds and false discovery rates (FDRs). A relatively low threshold led, to some extent, to the identification of more unique peptides and proteins (Figure 1A, 1B). Under the criteria of 0.85 confidence and 0.05 FDR, the number of detected peptides and proteins increased slightly (Figure 1A, 1B) compared with that detected using the stricter threshold criteria of 0.90 confidence and 0.02 FDR, but the number of DE proteins did not change much (Figure 1D). Therefore, a cutoff of 0.85 confidence and 0.05 FDR was selected to detect the peptides and proteins in this study, because this cutoff allowed the detection of more proteins to expand the proteome data without noticeably increasing the number of DE proteins.

A total of 117,184 unique spectra were matched with 10,606 unique peptides (ProteomeXchange identifier PXD008464) representing 4,551 proteins (Figure 1E). On average, one protein was covered by 25.7 unique spectra representing 2.33 unique peptides. The measured protein abundance was highly reproducible among replicates, with correlation coefficients ranging from 0.84 to 0.90 (Figure S1). In total, 3,916, 3,707, 3,702 and 2,871 proteins were identified in ear, tassel, embryo and root, respectively, with 2,269 proteins shared across the four tissues (Figure 1E). Most of the proteins (70%) were covered by no less than 2 unique spectra consisting of mostly 10 to 25 amino acids (Figure S2A, S2B). Approximately 64% of proteins showed >5% sequence coverage, and 84% of proteins had a molecular mass >20 kDa (Figure S1C, D).
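The per-protein averages quoted above follow directly from the reported totals. The short Python check below recomputes them; the variable names are chosen here for illustration, and the share of proteins common to all four tissues is an extra derived figure not stated in the text.

```python
# Totals reported in the text.
unique_spectra = 117_184
unique_peptides = 10_606
proteins = 4_551
shared_across_tissues = 2_269

spectra_per_protein = unique_spectra / proteins
peptides_per_protein = unique_peptides / proteins
shared_fraction = shared_across_tissues / proteins

print(f"{spectra_per_protein:.1f} unique spectra per protein")    # 25.7
print(f"{peptides_per_protein:.2f} unique peptides per protein")  # 2.33
print(f"{shared_fraction:.1%} of proteins shared by all four tissues")
```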



Figure 1. A summary of peptides and proteins detected by SWATH experiments. (A and B) Number of peptides (A) and proteins (B) detected by SWATH experiments with different confidence and FDR cutoffs. Identified proteins were grouped according to Occam's razor. (C and D) Number of shared proteins (C) and number of differentially expressed (DE) proteins (D) in two compared tissues with different confidence and FDR cutoffs. (E) Venn diagram of protein comparisons among the four tissues using 0.85 confidence and 0.05 FDR.

The proteins were clustered into 24 COG (Clusters of Orthologous Groups) categories (Figure S3). Among the COG categories, "Intracellular trafficking, secretion, and vesicular transport" and "Translation, ribosomal structure and biogenesis" represented two of the largest groups, accounting for 20.68-22.88% of the proteins detected, followed by "Carbohydrate transport and metabolism" (8.03-9.66%), "Inorganic transport and metabolism" (6.98-8.36%), "Amino acid transport and metabolism" (6.98-8.36%), "Energy production and conversion" (7.99-8.98%) and "Cell motility" (7.98-8.98%), indicative of active biological processes, including metabolism, transport and protein biosynthesis, in these tissues. By contrast, proteins related to "Replication, recombination and repair," "RNA processing and modification," "Chromatin structure and dynamics" and "Posttranslational modification, protein turnover, chaperones" accounted for a low proportion (1.25-1.57%) of the detected proteins, and the low abundance of these subgroups was consistent with their regulatory functions in cells.

Combined analysis of the proteome and transcriptome

Based on the transcriptome data from Doreen Ware’s laboratory representing a set of highly confident full-length transcripts 36 and the RNA-seq data downloaded from qTeller representing a set of short transcripts, a combined analysis of the proteome and transcriptome was performed in sequential steps. First, the protein abundance in each tissue was compared with the corresponding mRNA level. Proteins with low mRNA levels (reads per kilobase per million mapped reads, RPKM < 1) were excluded, and a total of 4,314 of 4,551 proteins (94.8%) were retained, including 3,554 in ear, 3,404 in tassel, 3,417 in embryo and 2,370 in root (Figure 2A). Among them, 2,045 proteins were shared across the four tissues, showing that a large portion of the detected proteins were present ubiquitously across the tissues and suggesting that these proteins are widely required in different tissues. Moreover, a subset of tissue-specific highly expressed proteins, including 253 in embryo, 181 in root, 123 in ear and 43 in tassel, was also identified (Figure 2A). Second, although protein abundance was positively correlated with mRNA level in a given tissue, the Pearson's correlation coefficients were low, varying from 0.35 (root) to 0.43 (embryo) (Figure 2B), which indicated that the transcription level was not always indicative of the protein abundance. This weak correlation between protein and mRNA abundance has been widely reported in plant species 20,23,24,29 and also in humans 37,38. After removing the tissue-specific highly expressed proteins, a set of 3,714 proteins that were identified in at least two tissues was analyzed to identify the DE proteins among the tissues using a fold-change >2 and P-value
