Omics Technologies Applied to Agriculture and Food
Domestic fowl breed variation in egg white protein expression: application of proteomics and transcriptomics
Barbora Bílková, Zuzana Świderská, Lukáš Zita, Denis Laloë, Mathieu Charles, Vladimír Beneš, Pavel Stopka, and Michal Vinkler
J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.8b03099 • Publication Date (Web): 08 Oct 2018
Journal of Agricultural and Food Chemistry
Domestic fowl breed variation in egg white protein expression: application of proteomics and transcriptomics
Barbora Bílková (a), Zuzana Świderská (a,b), Lukáš Zita (c), Denis Laloë (d), Mathieu Charles (d), Vladimír Beneš (e), Pavel Stopka (a), Michal Vinkler (a)
a) Charles University, Faculty of Science, Department of Zoology, Prague, Czech Republic, EU
b) Charles University, Faculty of Science, Department of Cell Biology, Prague, Czech Republic, EU
c) Czech University of Life Sciences, Faculty of Agrobiology, Food and Natural Resources, Department of Animal Husbandry, Prague, Czech Republic, EU
d) GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France, EU
e) European Molecular Biology Laboratory, Heidelberg, Germany, EU
List of abbreviations
BGA - between-group analysis
CIA - co-inertia analysis
PLGEM - power law global error model
RPKM - reads per kilobase per million mapped reads
Keywords
antimicrobial peptides, avian oviduct transcriptome, bird albumen, chicken breed, egg white proteome
Abstract
Avian egg white is essential for protecting and nourishing bird embryos during their development. Because egg white is produced in the magnum of the hen oviduct, variability in oviduct gene expression may affect egg white composition in domestic chickens. Since traditional poultry breeds may represent a source of variation, in the present study we describe the egg white proteome (mass spectrometry) and corresponding magnum transcriptome (high-throughput sequencing) for twenty hens from five domestic fowl breeds (large breeds: Araucana, Czech golden pencilled and Minorca; small breeds: Booted bantam and Rosecomb bantam). In total, we identified 189 egg white proteins and 16391 magnum-expressed genes. Proteins with an antimicrobial function made up the majority of egg white protein content. Despite general similarity, Between-class Principal Component Analysis revealed significant breed-specific variability in protein abundances, differentiating small and large breeds in particular. Though we found a strong association between magnum mRNA expression and egg white protein abundance across genes, co-inertia analysis revealed no transcriptome/proteome co-structure at the individual level. Our study is the first to show variation in protein abundances in egg white across chicken breeds, with potential effects on egg quality, biosafety and chick development. The observed inter-individual variation probably results from post-transcriptional regulation creating a discrepancy between proteomic and transcriptomic data.
Introduction
The structure of the avian egg is a characteristic apomorphy of present-day birds. The egg develops gradually as it passes through specialised sections of the female oviduct, where distinct egg structures are progressively formed (1). Given that the domestic chicken (Gallus gallus f. domestica) is both a model research species in avian biology and an economically important agricultural species, most of our present knowledge about the egg formation process comes from this species. In chickens, it has previously been reported that distinct parts of the hen oviduct differ in the amount of proteins expressed, corresponding to their respective function in egg formation (2). By volume, the egg white makes up the largest part of the chicken egg (ca. two thirds). Correspondingly, the magnum, where the egg white is formed, is the longest part of the hen oviduct. The magnum is covered with a mucosal tissue formed by tubular gland cells folded in a spiral pattern that enlarges the secreting surface (1). During the three-hour period in which the egg yolk passes through the magnum, the complex egg white substance is secreted; this substance is involved in both protection and nourishment of the developing embryo.
To date, only four studies have attempted to describe the complete composition of chicken egg white using modern proteomic methods (4-6). Each of these studies used modern commercially produced eggs only and found between 78 and 202 distinct egg white proteins. Earlier results obtained by two-dimensional electrophoresis suggest that egg white protein abundances could differ substantially between both chicken breeds and populations (7). Since the egg white contains large amounts of proteins involved in antimicrobial defence (8), such variation could potentially affect egg quality, biosafety and chicken embryo development.
Furthermore, the whole magnum transcriptome has not yet been described for chickens. While previous research has targeted experimentally induced changes in mRNA expression of genes within the oviduct (9), associations between such changes and egg proteome composition, in terms of both protein expression and protein abundance, remain largely unknown. As demonstrated by Kim and Choi (10), experimental corticosterone treatment affects both magnum mRNA expression and egg white protein abundance of certain genes, though not necessarily in the same direction. In various systems, integrative studies of whole proteome and transcriptome data have shown that, despite their general consistency (11,12), large discrepancies between mRNA expression and the respective protein expression can be found (13,14), suggesting an important role of post-transcriptional regulation of gene expression (15).
In this study, we attempt to improve our understanding of breed-specific variability in egg white protein composition, in terms of both protein identification and quantification of relative protein abundances, and to fill gaps in our knowledge of the relationship between the hen magnum transcriptome and the egg proteome. We focus on traditional chicken breeds as these exhibit high phenotypic variability in morphological, physiological and immunological traits (16,17) and egg production (18). We selected five distinct traditional chicken breeds (Araucana, Booted bantam, Czech golden pencilled, Rosecomb bantam and Minorca) that differ markedly in phenotypic traits such as body size, weight and shape, and also in area of origin. We describe both the egg proteome and the hen magnum transcriptome in order to highlight differences between breeds in both protein and transcript amounts and to test for associations between proteomic and transcriptomic data between individuals.
Methods
Animals
A total of 535 fertilised eggs from hens of five different chicken breeds (Araucana, Booted bantam, Czech golden pencilled, Minorca and Rosecomb bantam) were kindly provided by small non-commercial breeders located in the Czech Republic, EU. All of the chosen breeds can be considered layer breeds, with a relatively high laying capacity of 90-190 eggs per year. Booted bantam and Rosecomb bantam also belong to dwarf fancy breeds, referred to here as the small breeds, with an adult body weight of 0.4-0.8 kg; the body weight of the Araucana, Czech golden pencilled and Minorca breeds ranges from 1.2 to 1.7 kg, and these are therefore referred to as the large breeds. The eggs were incubated to hatching in an OvaEasy 380 Advance EX automatic egg incubator (Brinsea, Weston-super-Mare, UK) at a temperature of 37.5 °C and 50% humidity. The twenty hens used in this experiment hatched between 13th May 2015 and 22nd June 2015 (Table S1) and were immediately marked with two numbered wing-marks. All hens were housed under standardised conditions at the animal facility of the Czech University of Life Sciences in 0.50 × 0.88 × 0.45 m cages with access to food and water ad libitum (initially in breed-specific flocks of 5-10 animals, later individually). After the hens reached reproductive maturity (estimated by the onset of egg production; for all hens between 224 and 244 days of age), three eggs (excluding the first three) were collected from each hen. The hens were then euthanised by rapid cervical dislocation. Samples of the magnum tissue (the segment of the oviduct where egg white is formed) were taken within 15 min after euthanasia, placed into RNAlater reagent (Qiagen, Hilden, Germany), kept overnight at +4 °C and afterwards stored at -80 °C. This research was approved by the Ethical Committee of the Faculty of Science, Charles University (reference number 1373/2016-4).
Egg white sample preparation
All eggs were collected from hens within 12 hours after oviposition and immediately stored at 5 °C. To minimise changes in the abundance of some egg white proteins during storage (19), we processed all eggs within two days of oviposition. First, the eggshells were opened with a sterilised knife and the egg whites separated from the yolks. The whites were then homogenised with a glass mechanical hand homogeniser for 60 seconds and divided into six 250 µL aliquots that were kept at -20 °C until analysis. Prior to analysis, the egg white samples from the three eggs of the same individual were pooled by mixing 4 µL of each sample (12 µL of egg white in total). The pooled samples were then diluted in 12 µL of PBS and precipitated with 112 µL of 98% ethanol. All precipitated samples were centrifuged at 2800 g and 4 °C for 15 min. After centrifugation, the supernatant was discarded and the samples dried for 30 min at 37 °C. The dried samples were then re-suspended in 48 µL of digestion buffer (1% SDS, 100 mM triethylammonium bicarbonate (TEAB), pH 8.5) and cleaved with trypsin (1:50 trypsin:protein) at 37 °C overnight.
nLC-MS2 analysis
Nano reversed-phase columns were used to elute peptide cations using the same method as Cerna (20). The eluting peptide cations were converted to gas-phase ions by electrospray ionisation and analysed on a Thermo Orbitrap Fusion mass spectrometer (Q-OT-qIT, Thermo). Survey scans of peptide precursors from 400 to 1600 m/z were performed at 120K resolution (at 200 m/z) with a 5 × 10^5 ion count target. Tandem MS was performed by isolation at 1.5 Th with the quadrupole, higher-energy collisional dissociation (HCD) fragmentation with a normalised collision energy of 30, and rapid scan MS analysis in the ion trap. The MS/MS ion count target was set to 10^4 and the maximum injection time was 35 ms. Only those precursors with a charge state of 2-6 were sampled for MS/MS. The dynamic exclusion duration was set to 45 s with a 10 ppm tolerance around the selected precursor and its isotopes. Monoisotopic precursor selection was turned on and the instrument was run in top speed mode with 2 s cycles.
Proteome data analysis
All data were collected and quantified using MaxQuant software version 1.5.3.8 (21). The false discovery rate (FDR) was set to 1% for identification of all peptides and proteins. We set a minimum peptide length of seven amino acids. The Andromeda search engine was used for the MS/MS spectra search against the Uniprot database, with all duplicates removed. Enzyme specificity was set as C-terminal to Arg and Lys, also allowing cleavage at proline bonds and a maximum of two missed cleavages. Dithiomethylation of cysteine was selected as a fixed modification and N-terminal protein acetylation and methionine oxidation as variable modifications. Quantifications were performed with the label-free algorithms (21).
We detected 266 proteins, of which 67 were classified as contaminants. To standardise the total abundance of non-contaminating egg white proteins across samples, we normalised the amount of each protein according to the formula C_net = C_raw / (1 - P_cont), where C_net indicates the net amount of protein counts after normalisation, C_raw the raw abundance of protein counts and P_cont the proportion of contamination in a given sample. In addition to chicken proteins, we also detected 10 proteins of avian-specific viruses. We excluded all proteins occurring in three or fewer samples from further analysis, leaving a total dataset of 115 egg white proteins.
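The contaminant normalisation can be illustrated with a minimal Python sketch. It assumes that P_cont is the share of a sample's total signal assigned to contaminant proteins, which the text does not state explicitly; the function name and the toy values are hypothetical and are not part of the original analysis.

```python
import numpy as np

def normalise_for_contaminants(raw_counts, is_contaminant):
    """Apply C_net = C_raw / (1 - P_cont) to the non-contaminant proteins of one sample.

    Assumption: P_cont is the contaminant share of the sample's total signal.
    """
    raw = np.asarray(raw_counts, dtype=float)
    mask = np.asarray(is_contaminant, dtype=bool)
    p_cont = raw[mask].sum() / raw.sum()
    c_net = raw[~mask] / (1.0 - p_cont)
    return c_net, p_cont

# Toy sample: two egg white proteins and one contaminant (hypothetical values)
raw = [600.0, 300.0, 100.0]
flags = [False, False, True]
c_net, p_cont = normalise_for_contaminants(raw, flags)
```

Under this assumption, the rescaled non-contaminant abundances sum to the sample's original total signal, so samples with different contamination levels become comparable.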
For protein classification, we used the online PANTHER library (22). To find overrepresented Gene Ontologies (GOs) among the egg white proteins, we ran the PANTHER Overrepresentation Test with Bonferroni correction for multiple testing against the G. gallus (all genes in the database) reference list and annotation data sets, with the GO molecular function and GO biological process complete (GO Ontology database release date: 14 August 2017).
Transcriptomic data analysis
Total RNA was isolated from magnum samples using the High Pure RNA Tissue Kit (Roche, Basel, Switzerland), according to the manufacturer’s instructions. The total amount of extracted RNA was quantified using an Agilent 2100 Bioanalyzer with the Agilent RNA 6000 Nano Kit (Agilent Technologies, California, USA). Library preparation and stranded paired-end mRNA sequencing were performed at the European Molecular Biology Laboratory (EMBL), Heidelberg. Barcoded stranded mRNA-seq libraries were prepared using the Illumina TruSeq RNA Sample Preparation v2 Kit (Illumina, San Diego, CA, USA), implemented on the liquid handling robot Beckman FXP2. The libraries obtained were pooled in equimolar amounts, and a 1.8 pM solution of this pool was loaded on the NextSeq 500 Illumina sequencer and sequenced bi-directionally, each read being 85 bases long, thereby generating ~25 million sequence pairs for each library. The sequencing results were submitted to the NCBI Sequence Read Archive (SRA Acc. No. SRP126816).
Read sequences were trimmed of sequencing adaptors using Trim Galore! software (Babraham Bioinformatics, Babraham Institute, Cambridge, UK) and low-quality bases (Phred quality score < 30) were removed from both (3’ and 5’) ends using SICKLE software (23). Reads with a resulting trimmed sequence of less than 30 bp were discarded. The remaining high-quality trimmed reads were then aligned to the G. gallus reference genome assembly Gallus_gallus-5.0 (GCA_000002315.3) using the STAR aligner (24). The percentage of uniquely mapped reads ranged from 91.82% to 94.34% (Table S2). Sequence reads were assigned and quantified at the gene level using the featureCounts program from the Subread package (25). In order to compare mean expression between genes, the read counts obtained from the featureCounts program were converted to reads per kilobase per million mapped reads (RPKM).
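As an illustration of the count-to-RPKM conversion, the following Python sketch applies the standard RPKM definition (reads per kilobase of gene length per million mapped reads). It assumes that the library size is the sum of reads assigned to genes in the count table; the text does not specify which total was used, and the gene lengths and counts below are hypothetical.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """RPKM = counts * 1e9 / (gene length in bp * total assigned reads).

    Assumption: the library size is the sum of the counts passed in.
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(gene_lengths_bp, dtype=float)
    total_reads = counts.sum()
    return counts * 1e9 / (lengths * total_reads)

# Hypothetical counts for three genes in one library
counts = [500, 1500, 8000]
lengths = [1000, 3000, 2000]
values = rpkm(counts, lengths)
```

In this toy example the first two genes receive the same RPKM because the longer gene has proportionally more reads, which is the length correction that makes expression comparable between genes.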
Statistical analysis
Prior to analysis, we transformed both the proteomic and transcriptomic data using a natural logarithmic transformation (y = ln(x + 1)). All data were then centred and scaled. First, we performed a principal component analysis (PCA) on the proteome and transcriptome datasets and analysed the effects of breed on proteome and transcriptome inter-individual variation. Second, we performed a Between Groups Component Analysis (BGA) on the proteomic dataset, with breed and breed size class (small breeds [Booted bantam and Rosecomb bantam] vs. large breeds [Araucana, Czech golden pencilled, Minorca]) used as grouping factors. BGA focuses on between-group variability by performing a PCA on group means. The importance of the difference between groups is assessed by the ratio of between-group inertia to total inertia (26). The statistical significance of differences between groups was checked using a Monte Carlo permutation test.
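The steps of a between-group analysis can be sketched in Python with NumPy. This is an illustrative re-implementation under simplifying assumptions (uniform sample weights, 999 permutations), not the ade4 routine used in the study; the function names and the simulated data are hypothetical.

```python
import numpy as np

def between_group_ratio(X, labels):
    """Ratio of between-group inertia to total inertia (uniform sample weights)."""
    Xc = X - X.mean(axis=0)
    total = (Xc ** 2).sum()
    between = 0.0
    for g in np.unique(labels):
        rows = Xc[labels == g]
        between += len(rows) * (rows.mean(axis=0) ** 2).sum()
    return between / total

def bga(X, labels, n_perm=999, seed=0):
    """Log-transform, centre and scale X, run a PCA on group means, and
    test the between-group inertia ratio by permuting the group labels."""
    rng = np.random.default_rng(seed)
    X = np.log1p(X)
    sd = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    groups = np.unique(labels)
    # PCA on group means, each group weighted by its relative frequency
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    weights = np.array([(labels == g).mean() for g in groups])
    _, _, Vt = np.linalg.svd(means * np.sqrt(weights)[:, None], full_matrices=False)
    scores = X @ Vt.T  # individuals projected onto the between-group axes
    observed = between_group_ratio(X, labels)
    permuted = np.array([between_group_ratio(X, rng.permutation(labels))
                         for _ in range(n_perm)])
    p_value = (np.sum(permuted >= observed) + 1) / (n_perm + 1)
    return observed, p_value, scores

# Simulated abundances: 20 hens x 30 proteins, 10 proteins shifted in one group
rng = np.random.default_rng(1)
X = rng.gamma(5.0, 100.0, size=(20, 30))
labels = np.repeat(np.array(["small", "large"]), 10)
X[labels == "large", :10] *= 3.0
observed, p_value, scores = bga(X, labels)
```

The permutation p-value counts how often a random reassignment of hens to groups yields an inertia ratio at least as large as the observed one, which mirrors the logic of the Monte Carlo test described in the text.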
We used the Power Law Global Error Model (PLGEM) (27) to identify egg white proteins differing in abundance between small and large breeds. The signal-to-noise ratio was calculated as it explicitly takes unequal variances into account and penalises those proteins that have higher variance in each group more than those proteins that have a high variance in one group and a low variance in another (27). As PLGEM can only be fitted on a set of replicates under the same experimental conditions, we first applied the test to the Czech golden pencilled data. Correlation between mean values and standard deviations was high (r² = 0.96, Pearson = 0.883); hence, we continued with the resampled signal-to-noise ratio and calculated differences with the corresponding p-values between the small and large breed groups.
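The idea behind a power-law error model and a signal-to-noise statistic can be sketched as follows. This is a deliberately simplified Python illustration: it fits ln(SD) against ln(mean) by ordinary least squares and uses the fitted spread in the denominator of the signal-to-noise ratio. The published PLGEM procedure uses a more elaborate fitting scheme and a resampling step for p-values, both omitted here; the function names and simulated data are hypothetical.

```python
import numpy as np

def fit_power_law(replicates):
    """Fit ln(sd) = k * ln(mean) + c across proteins (rows) of one condition.

    Simplification: ordinary least squares on all proteins with positive mean and sd.
    """
    mean = replicates.mean(axis=1)
    sd = replicates.std(axis=1, ddof=1)
    keep = (mean > 0) & (sd > 0)
    log_mean, log_sd = np.log(mean[keep]), np.log(sd[keep])
    k, c = np.polyfit(log_mean, log_sd, 1)
    r = np.corrcoef(log_mean, log_sd)[0, 1]
    return k, c, r

def modelled_sd(mean, k, c):
    """Spread predicted by the fitted power law for a given mean abundance."""
    return np.exp(c) * np.abs(mean) ** k

def signal_to_noise(group_a, group_b, k, c):
    """Difference of group means divided by the sum of modelled spreads."""
    mean_a, mean_b = group_a.mean(axis=1), group_b.mean(axis=1)
    return (mean_a - mean_b) / (modelled_sd(mean_a, k, c) + modelled_sd(mean_b, k, c))

# Simulated data: 200 proteins x 4 replicates, sd = 0.5 * mean^0.8
rng = np.random.default_rng(0)
true_mean = 10 ** rng.uniform(3, 6, size=200)
true_sd = 0.5 * true_mean ** 0.8
large = true_mean[:, None] + true_sd[:, None] * rng.standard_normal((200, 4))
small = true_mean[:, None] + true_sd[:, None] * rng.standard_normal((200, 4))
small[0] *= 3.0  # one protein made more abundant in the "small" group
k, c, r = fit_power_law(large)
stn_values = signal_to_noise(small, large, k, c)
```

A high correlation between ln(mean) and ln(SD) in the reference condition, as reported for the Czech golden pencilled data, is what justifies using the fitted spread in place of noisy per-protein standard deviations.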
For the purpose of consistency, we selected 97 gene-protein pairs that were common to both datasets (Table S3) when comparing the proteomic and transcriptomic data. We calculated Spearman’s correlation between the mean mRNA expression values and mean protein abundance. To further describe relationships between the proteome and transcriptome datasets, we used Co-Inertia multivariate Analysis (CIA) (28,29). CIA is useful for analysing the relationships between two tables (here representing proteomic and transcriptomic data) having the same samples in rows. This method finds the maximum shared structure between two datasets describing the same individuals. CIA finds the ordinations (dimension reduction diagrams) of the two datasets that are most similar. This is done by finding successive axes from the two datasets with maximum covariance. Co-structure between the proteome and transcriptome datasets is measured by the RV-coefficient, ranging from 0 to 1, where 1 indicates the highest and 0 the lowest degree of co-structure. The statistical significance of co-inertia was evaluated with a Monte Carlo permutation test. CIA can be applied to datasets where the number of variables far exceeds the number of samples (which is the case when applied to x-omics data; see (29,30)). All statistical analyses were undertaken in R software version 3.4.0 (31) with the ade4 package (32).
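The RV-coefficient and its permutation test can be sketched in Python with NumPy. This is an illustrative re-implementation that assumes uniform row weights and column-centred tables; it is not the ade4 code used in the study, and the function names and simulated tables are hypothetical.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV-coefficient between two tables sharing the same rows (samples)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc @ Xc.T
    Sy = Yc @ Yc.T
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))

def coinertia_axes(X, Y):
    """Co-inertia axes: singular vectors of the cross-covariance matrix."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc / X.shape[0], full_matrices=False)
    return U, s, Vt.T

def rv_permutation_test(X, Y, n_perm=999, seed=0):
    """Monte Carlo test: permute the rows of Y to break the sample pairing."""
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(X, Y)
    n = X.shape[0]
    permuted = np.array([rv_coefficient(X, Y[rng.permutation(n)])
                         for _ in range(n_perm)])
    p_value = (np.sum(permuted >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Simulated tables for 20 individuals with a shared structure
rng = np.random.default_rng(0)
transcriptome = rng.standard_normal((20, 10))
proteome = 2.0 * transcriptome[:, :5] + 0.1 * rng.standard_normal((20, 5))
rv, p_value = rv_permutation_test(transcriptome, proteome)
x_axes, covariances, y_axes = coinertia_axes(transcriptome, proteome)
```

Permuting the rows of one table keeps each table's internal structure intact while destroying the pairing between individuals, so a non-significant result indicates that the two ordinations do not arrange the same individuals in a similar way.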
Results and discussion
General description of the hen egg white proteome
At the proteomic level, we detected 115 egg white proteins at 0.01 FDR, ranging from 109 to 112 in each breed-specific proteome (Table S4). The successful identification of these proteins was due to the relatively high number of peptides per identification (9.45 ± 13.19, mean ± SD), high sequence coverage (27.02 ± 19.86%) and high unique sequence coverage (26.08 ± 19.97%). Furthermore, we obtained a high and significant Spearman’s rank correlation (r = 0.720, P