Biotechnology and Biological Transformations
Association genetics identifies single nucleotide polymorphisms related to kernel oil content and quality in Camellia oleifera Ping Lin, Hengfu Yin, Chao Yan, Xiaohua Yao, and Kailiang Wang J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.8b03399 • Publication Date (Web): 13 Feb 2019 Downloaded from http://pubs.acs.org on February 13, 2019
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Agricultural and Food Chemistry is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Journal of Agricultural and Food Chemistry
Association genetics identifies single nucleotide polymorphisms related to kernel oil content and quality in Camellia oleifera
Ping Lin1,2*, Hengfu Yin1,2, Chao Yan1,2,3, Xiaohua Yao1,2 and Kailiang Wang1,2*
1. State Key Laboratory of Tree Genetics and Breeding, Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou 311400, China
2. Key Laboratory of Forest Genetics and Breeding, Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou 311400, China
3. Experimental Center for Subtropical Forestry, Chinese Academy of Forestry, Fenyi, 336600, China
* Corresponding Authors:
Ping Lin, [email protected]; + (86) 571-63320229
Kailiang Wang, [email protected]; + (86) 571-63379095
ACS Paragon Plus Environment
Abstract
Camellia oleifera is an important non-wood oil-bearing tree species in China and has received considerable attention because its seed oil is rich in unsaturated fatty acids that benefit human health. It is therefore necessary to examine allelic diversity in key genes associated with oil production in C. oleifera cultivars, which vary widely in fatty acid composition. In this study, we performed association analyses between four key genes encoding fatty acid desaturases (two CoSAD and two Cofad2) and traits including oil content and fatty acid composition. By sequencing an association population of 216 accessions, we identified two single-nucleotide insertion-deletions (InDels) and 362 single-nucleotide polymorphisms (SNPs) within the four candidate genes. Single-marker and haplotype association tests based on linkage disequilibrium (LD) were conducted to detect significant marker-trait associations, and a validation population (279 hybrid individuals from six full-sib families) was used to validate the significantly associated allelic variants. In all, 90 single marker-trait associations and one haplotype-trait association were significant in the association population, and these loci explained 1.87% to 17.93% of the corresponding phenotypic variance. Furthermore, six SNP marker-trait associations (Q < 0.10) in Cofad2-A, CoSAD1 and CoSAD2 were successfully validated in the validation population. The SNP markers identified in this study could potentially be applied in marker-assisted selection to improve oil content and quality in C. oleifera.
Keywords: candidate-gene-based association mapping; Camellia oleifera; single nucleotide polymorphisms (SNPs); oil content (OC) and fatty acid composition; stearoyl-ACP desaturase gene (CoSAD) and Δ12 (ω6)-desaturase gene (Cofad2)
Introduction
China is the world's largest consumer of edible oil. With economic development and rising living standards, per capita edible oil consumption and imports of raw materials for edible oil have increased every year in China 1. Camellia oleifera is widely planted as an important edible oil-bearing tree species in southern China. Dry seed production was reported to be approximately 2.2 million tons from a cultivated area of four million ha in 2015 2. At present, Camellia oil is the fourth most important plant edible oil produced in China. Camellia oil, which accounts for nearly 50% of dry kernel weight, is the major product of C. oleifera seeds and is considered one of the highest-quality edible oils 2-4. In Camellia oil, approximately 90% of all fatty acids (FAs) are unsaturated FAs (UFAs), mainly oleic acid (C18:1, a monounsaturated fatty acid, MUFA) and linoleic acid (C18:2, a polyunsaturated fatty acid, PUFA). Because its high UFA content, and especially its high MUFA content, is beneficial to human health, C. oleifera has received increasing attention, and studying the factors that influence UFA biosynthesis is important for C. oleifera breeding. Identifying genes and allelic variation associated with oil and FA biosynthesis in C. oleifera would provide useful information for breeding programs aimed at improving oil content and quality.
Oil and FA biosynthesis has been extensively studied in C. oleifera 2, 5-7. The seed oil of C. oleifera is dominated by C18:1, similar to that of hickory and olive 8, 9. Previous studies indicated that high expression of the stearoyl-ACP desaturase gene (CoSAD) combined with low expression of the Δ12 (ω6)-desaturase gene (Cofad2) was associated with C18:1 synthesis in C. oleifera 2. C18:1 biosynthesis is catalyzed by stearoyl-ACP desaturases (SADs), which are encoded by SAD genes, a multigene family with several members 10-15. SAD activity significantly affects the ratio of saturated to unsaturated FAs and is a major determinant of FA composition 13-21. The Δ12 (ω6)-desaturase (FAD2) desaturates C18:1 to form C18:2 in the endoplasmic reticulum, which is the key rate-limiting step for PUFA synthesis in some oil crops 22, 23. FAD2s are encoded by fad2 genes, which belong to a multigene family 24. There are at least five fad2 genes in olive (Olea europaea var. sylvestris) 9, three in Camelina sativa 25, two in sesame (Sesamum indicum) 26 and at least seven in soybean (Glycine max) 27-33. Linkage disequilibrium (LD)-based association mapping provides a powerful strategy for genetically dissecting complex quantitative traits 34-38. For species in which LD decays rapidly, candidate-gene-based association mapping is appropriate for detecting associations between sequence variations and specific traits of interest 39. In general, LD decays rapidly within genes in outcrossing species and does not extend over the entire gene region, as reported in tea (Camellia sinensis), sunflower (Helianthus annuus), Chinese white poplar (Populus tomentosa), black cottonwood (Populus trichocarpa) and perennial ryegrass (Lolium perenne) 34, 35, 40-42. C. oleifera is a self-incompatible, and therefore outcrossing, oil-bearing tree species, so a candidate gene association mapping approach may be suitable for establishing associations between polymorphic loci in targeted genes and complex traits.
To further explore the functions of CoSAD and Cofad2 genes in determining Camellia oil FA composition, we previously isolated two CoSAD and two Cofad2 genes from C. oleifera based on earlier publications 2, 6, 43 (Table 1 and File S1), and the expression levels of these four genes were significantly associated with oil accumulation and FA composition during C. oleifera seed development 2. Nevertheless, little is known about whether allelic diversity in these key genes contributes to variation in FA composition among C. oleifera germplasms. In this study, to explore allelic effects, two InDels and 362 SNPs identified in the CoSAD and Cofad2 genes by sequencing a natural population of C. oleifera were used for single-marker and haplotype-based association mapping. Subsequently, we validated the significant associations in a C. oleifera hybrid population to confirm the allelic loci associated with phenotypic traits. This work provides insights into the regulation of seed oil biosynthesis in C. oleifera and supports the genetic improvement of Camellia oil quality and yield.
1. Materials and Methods
1.1 Plant materials
Association population In 2004, the Research Institute of Subtropical Forestry, Chinese Academy of Forestry (Hangzhou, China) maintained a collection of 494 accessions from most of the natural distribution region of C. oleifera, covering nine provinces in subtropical China. These accessions were propagated by grafting and used to establish a clonal plantation with a randomized complete block design and three replications at Dongfanghong Forest Farm, Jinhua City, Zhejiang Province, China. In this study, a set of 216 unrelated accessions of C. oleifera from the collection was sampled for association analysis.
Validation population In this study, 279 hybrid individuals from six full-sib families were selected for validation analysis. The progenies were planted in 2009 at Dongfanghong Forest Farm, Jinhua City, Zhejiang Province, China, using a randomized complete block design with three replications.
Phenotypic evaluation Oil content (OC) and oil quality traits, which depend on FA composition, are important breeding targets for C. oleifera. In this study, OC and the contents of palmitic acid (C16:0), palmitoleic acid (C16:1), stearic acid (C18:0), C18:1, C18:2, linolenic acid (C18:3) and cis-11-eicosenoic acid (C20:1) in Camellia oil were evaluated. The 216 accessions of the association population were scored for these eight quantitative traits, with at least three ramets per genotype, over three years. Mature seeds were collected, and OC was measured by the Soxtec extraction method as described 44. Total lipid was extracted from kernels with petroleum ether, and the seven FA components were quantified by gas chromatography as described in a previous study 45.
The same eight quantitative traits were measured with three replicates in the 279 hybrid individuals in 2014, using the same methods as for the association population. Data Processing System (DPS 14.50, http://www.chinadps.net/dps_eng/) software was used to test the fit of each trait to a normal distribution and to estimate phenotypic correlations among the eight traits 46. Box plots were drawn in R (https://www.r-project.org/).
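The correlation coefficients (R) in Table 3 come from DPS. To make the statistic concrete, the sketch below computes a Pearson product-moment correlation between two trait vectors in plain Python. It assumes that Pearson's R is the coefficient DPS reports (its exact options are not stated here), and the function name `pearson_r` and the toy values are hypothetical, not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length trait vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from each trait mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a trait with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-accession relative contents (%), not data from this study.
c18_1 = [78.0, 80.5, 75.2, 82.1, 79.3]
c18_2 = [10.1, 7.9, 12.8, 6.2, 8.9]
print(round(pearson_r(c18_1, c18_2), 4))
```

Because relative FA contents are percentages of one total, a rise in one major FA tends to come with a fall in another, which is why a strongly negative R between C18:1 and C18:2 is unsurprising in data like these.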
1.2 DNA extraction, amplification, and sequencing
Genomic DNA (gDNA) was isolated from young leaves (one ramet per clone for the association population) using the TaKaRa MiniBEST Plant Genomic DNA Extraction Kit (TaKaRa, Dalian, China) according to the manufacturer's instructions. Based on the cDNA sequences of the two CoSAD and two Cofad2 genes 2, specific primer sets (Table 1) were designed and used to amplify the gene regions with gDNA as template. PCR was performed in a final reaction volume of 50 μL containing 60 ng of gDNA, 25 μL of 2× LAmp™ Master Mix (Vazyme, Nanjing, China), 0.4 mM forward primer and 0.4 mM reverse primer. The PCR program was 94 ℃ for 5 min, followed by 35 cycles of denaturation at 94 ℃ for 30 s, annealing at 55 ℃ for 30 s and extension at 72 ℃ for 2 min, with a final extension at 72 ℃ for 7 min. The amplified DNA fragments were purified from 1.2% agarose gels and sequenced on an ABI 3730XL DNA Analyzer using the sequencing primers (Table 2). In total, 7610 bp of gDNA sequence (excluding InDels) was obtained from four unique genes, Cofad2-A (GenBank ID: JQ739518.1), Cofad2-B (GenBank ID: KJ995981.1), CoSAD1 (GenBank ID: MH836317) and CoSAD2 (GenBank ID: MH836318), with an average of 1903 bp per gene; gene length ranged from 1160 bp (Cofad2-B) to 3548 bp (CoSAD2; Table 1 and File S1).
1.3 SNP discovery, genotyping and linkage disequilibrium (LD)
SNP discovery and genotyping were performed from sequence trace files using NovoSNP version 3.0.1 (http://www.molgen.vib-ua.be/bioinfo/novosnp/), and SNPs with a score > 12 were retained as true variants 47. All unfiltered SNP data from the association and validation populations have been submitted to the European Variation Archive (EVA) under project accession number PRJEB31224 (https://www.ebi.ac.uk/eva). Markers with a minor allele frequency (MAF) < 5% were removed before further analysis. After filtering, LD between pairs of SNPs was estimated as the squared correlation of allele frequencies (r2) using TASSEL version 3.0 48. The decay of LD with physical distance (bp) between SNP sites within the candidate genes was evaluated by nonlinear regression of r2 values 34, 49.
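The MAF filter and pairwise r2 step described above can be sketched as follows. This is not TASSEL's implementation: it assumes unphased diploid genotypes coded as allele dosages (0, 1 or 2 copies of one allele, `None` for a missing call) and uses the squared Pearson correlation of dosages, a common approximation whose values may differ from TASSEL's r2. All function names and the toy genotypes are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Dosage coding (an assumption): 0, 1 or 2 copies of one allele; None = missing.
Genotypes = Sequence[Optional[int]]


def minor_allele_frequency(genos: Genotypes) -> float:
    """MAF of a biallelic diploid marker, ignoring missing calls."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(markers: Dict[str, Genotypes],
                  threshold: float = 0.05) -> Dict[str, Genotypes]:
    """Keep markers whose MAF is at least the threshold (5% in this study)."""
    return {name: g for name, g in markers.items()
            if minor_allele_frequency(g) >= threshold}


def dosage_r2(g1: Genotypes, g2: Genotypes) -> float:
    """Squared Pearson correlation of dosages over individuals called at both loci."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic within the shared sample
    return sxy * sxy / (sxx * syy)


# Hypothetical genotypes for three markers in eight individuals.
markers = {
    "snp_a": [0, 1, 2, 1, 0, 2, 1, 0],
    "snp_b": [0, 1, 2, 1, 0, 2, 1, None],  # tracks snp_a exactly where called
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, removed by the filter
}
kept = filter_by_maf(markers)
print(sorted(kept))
print(dosage_r2(kept["snp_a"], kept["snp_b"]))
```

Filtering before computing r2 matters because rare alleles give unstable correlation estimates; a monomorphic marker has no defined r2 at all, which the sketch signals with `nan`.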
1.4 Population structure and relative kinship
In our previous study, 500 SRAP primer pairs were developed, and 46 of them were used to analyze the genetic diversity of C. oleifera clones (the same clones as in the association population) 50. Based on the SRAP data, the pairwise kinship matrix was calculated using TASSEL version 3.0 48 and is shown in Table S1.
The population structure of the association population was evaluated with STRUCTURE version 2.3.4 (http://web.stanford.edu/group/pritchardlab/structure.html) 51 under the admixture ancestry model, based on the same SRAP data. The program was run five times for each K value from K = 2 to K = 9, with a burn-in period of 10,000 and 10,000 MCMC repetitions. The most likely number of subpopulations was taken as the K value with the highest likelihood, lnP(D), at which the α value had settled to a relatively constant level.
1.5 Single marker-trait association analysis
In the association population, all association tests between SNP markers and traits were performed with the mixed linear model (MLM) method and 5 × 10⁴ permutations in TASSEL version 3.0 48. The MLM can be described as y = SNP + Q + kinship + e, where y is the phenotypic value. In the MLM, the Q matrix from STRUCTURE was used to account for population structure, and the kinship matrix was used to estimate
the coancestry coefficients, and e is the residual. P-values for all associations were corrected using the Benjamini and Hochberg method to control the false discovery rate (FDR) 52. The ratio of dominance (d) to additive (a) effects was calculated for each significant marker to quantify the mode of gene action, following previous studies 40, 53. Inheritance tests of significant SNP loci identified in the association population were performed in the validation population using a χ2 test in Data Processing System (DPS 14.50) software 46, and P-values were corrected using the Benjamini and Hochberg method to control the FDR 52.
1.6 Haplotype-trait association analysis
Haplotype (a block of linked, ordered markers) frequencies of locus genotypes were estimated with Haploview version 4.2 54, and tests of haplotype association with the traits were performed using TASSEL version 3.0 48. Correction for multiple testing was performed using the Benjamini and Hochberg method to control the FDR 52.
1.7 Quantitative RT-PCR (qRT-PCR) testing
qRT-PCR was performed on single-strand cDNA samples synthesized from total RNA of developing seeds of C. oleifera accessions (ten accessions per genotype) using PrimeScript™ RT Master Mix (TaKaRa, Dalian, China).
The qRT-PCR program and the analysis of relative gene expression levels were performed as described by Lin et al. 2. Primer pairs (Table S2) for the CoSAD genes were designed individually, according to the SNP-based association results, using Primer 5.0 software. The glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH) was used as the reference gene 55.
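Sections 1.5 and 1.6 both control the FDR with the Benjamini and Hochberg procedure. As a minimal sketch of that step (not the DPS or TASSEL code), the function below converts raw P-values into BH-adjusted values, the Q-values used as thresholds in the Results (Q < 0.05 or Q < 0.10). The function name and the example P-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in input order."""
    m = len(pvalues)
    # Walk from the largest P-value down so each Q-value can be capped
    # by the Q-value of the next-larger P-value (keeps them monotone).
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical raw P-values from five marker-trait tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(raw)
print([round(v, 5) for v in q])              # [0.005, 0.02, 0.05125, 0.05125, 0.2]
print([p for p, v in zip(raw, q) if v < 0.05])  # [0.001, 0.008]
```

Note how 0.039 and 0.041, both nominally significant at P < 0.05, no longer pass Q < 0.05 after adjustment; the same effect is why 132 nominal associations shrank to 90 in this study.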
2. Results
2.1 Phenotypic analyses of trait distributions and correlations
We measured total OC and the contents of the seven main FAs, which together accounted for over 99% of all FAs in Camellia oil. In the association population, OC ranged from 128.17 mg/g to 510.26 mg/g, with a mean of 362.42 mg/g (Tables S3 and S4). The relative contents of C16:0 and C18:1 ranged from 6.70% to 12.30% (mean 8.66%) and from 69.10% to 84.70% (mean 78.94%), respectively (Tables S3 and S4). Descriptive statistics of the trait distributions are presented in Table S4. In the validation population, OC ranged from 121.61 mg/g to 447.48 mg/g, with a mean of 317.93 mg/g (Tables S5 and S6). The relative contents of C16:0 and C18:1 ranged from 6.70% to 11.00% (mean 8.65%) and from 68.70% to 85.80% (mean 78.19%), respectively (Tables S5 and S6); descriptive statistics of the trait distributions in the validation population are presented in Table S6. As expected, most traits in both populations followed an approximately normal distribution (Fig. 1 and Fig. S1).
OC and FA relative contents showed significant correlations in the association population (Table 3, above diagonal). OC was positively correlated with C18:0 (R = 0.4366, P < 0.01) and negatively correlated with C18:2 (R = -0.3406, P < 0.01) and C18:3 (R = -0.6806, P < 0.01), a pattern also seen in the validation population (Table 3, below diagonal). In addition, OC was significantly positively correlated with C18:1 and negatively correlated with C16:0 in the association population (P < 0.01). Based on these correlations, the FA content traits were divided into two groups: group 1 included C16:0, C18:2 and C18:3, and group 2 included C18:0 and C18:1. In the association population, traits within each group showed significant positive pairwise correlations, whereas traits in group 1 showed significant negative pairwise correlations with traits in group 2 (P < 0.01; Table 3, above diagonal). Similar FA correlation results were observed in the validation population (Table 3, below diagonal). Notably, C18:1 and C18:2 contents were strongly negatively correlated, with absolute correlation coefficients close to 1 (|R| = 0.9618 and 0.9763 in the association and validation populations, respectively; Table 3).
2.2 SNP detection and LD test
After removing markers with a minor allele frequency (MAF) below 5%, a total of 364 polymorphic sites, including two single-nucleotide InDels and 362 SNPs (Table S7), were detected in the four candidate genes (two CoSAD and two Cofad2 genes), with an average density of one SNP every 21 bp (π ranged from 0.0005 to 0.00739 and θw from 0.0013 to 0.0116). Of these polymorphic sites, 67.03% were located in introns and 31.87% in exons, and four SNPs were detected in the 3′ UTR. The average density was one polymorphic site every 38.73 bp in exons and every 9.90 bp in introns. Of the 364 polymorphic sites, 195 (53.57%) were nucleotide transitions (70 G/A and 125 C/T) and 167 (45.88%) were transversions (35 G/T, 51 A/T, 48 A/C and 33 G/C).
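The substitution classes above follow a simple rule: a transition exchanges a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and every other base change is a transversion. The sketch below applies that rule to the counts reported in this section; the function names are hypothetical, and the denominator of 364 polymorphic sites (362 SNPs plus two InDels) is the one that reproduces the reported percentages.

```python
from typing import Dict, Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}/{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally(pairs: Iterable[Tuple[Tuple[str, str], int]]) -> Dict[str, int]:
    """Sum counts per substitution class from ((ref, alt), count) entries."""
    totals = {"transition": 0, "transversion": 0}
    for (ref, alt), count in pairs:
        totals[substitution_class(ref, alt)] += count
    return totals


# Counts per base pair as reported in Section 2.2.
reported = [(("G", "A"), 70), (("C", "T"), 125), (("G", "T"), 35),
            (("A", "T"), 51), (("A", "C"), 48), (("G", "C"), 33)]
totals = tally(reported)
polymorphic_sites = 364  # 362 SNPs + 2 single-nucleotide InDels
for cls, n in totals.items():
    print(cls, n, f"{100 * n / polymorphic_sites:.2f}%")
# transition 195 53.57%
# transversion 167 45.88%
```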
Pairwise LD between SNP markers within the candidate genes was estimated as r2 (Fig. 2). SNPs in LD were located in the same gene, and LD within the candidate genes did not extend over the entire gene region (Fig. 2A-D). We further estimated the average LD decay distance within the candidate genes: LD decayed rapidly with increasing physical distance, and most SNP pairs were in linkage equilibrium at distances over 1.8 kb (r2 < 0.3; P < 0.05; Fig. 2E; Table S8). Several SNP pairs more than 1.8 kb apart were nevertheless in significant LD in CoSAD2, such as SNP6U.2655 - SNP6U.331, SNP6U.3443 - SNP6U.891, SNP6U.2744 - SNP6U.331, SNP6U.3443 - SNP6U.958 and SNP6U.3410 - SNP6U.958 (r2 > 0.4; P < 0.001; Fig. 2E; Table S8). Therefore, LD decayed within the candidate genes in the C. oleifera genome, and SNP-based LD association analysis within the genes was feasible.
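Section 1.3 states only that LD decay was evaluated by nonlinear regression of r2 on distance. One widely used model for this, assumed here rather than confirmed by the source, is the Hill and Weir (1988) expectation of r2 as a function of the scaled distance C·d (C: a recombination parameter per bp, d: distance in bp, n: sample size, taken here as the number of sampled individuals). The sketch fits C by least squares with a grid search followed by a golden-section search, then finds the distance at which the fitted curve drops to a chosen threshold. The function names, the synthetic data and the 0.1 threshold are hypothetical.

```python
import math
from typing import Optional, Sequence


def hill_weir_r2(c_d: float, n: int) -> float:
    """Expected r^2 (Hill and Weir 1988) at scaled distance C*d for sample size n."""
    a = 2.0 + c_d
    b = 11.0 + c_d
    first = (10.0 + c_d) / (a * b)
    second = 1.0 + ((3.0 + c_d) * (12.0 + 12.0 * c_d + c_d ** 2)) / (n * a * b)
    return first * second


def sse(c: float, dist: Sequence[float], r2: Sequence[float], n: int) -> float:
    """Residual sum of squares of observed r^2 against the model for a given C."""
    return sum((y - hill_weir_r2(c * d, n)) ** 2 for d, y in zip(dist, r2))


def fit_c(dist: Sequence[float], r2: Sequence[float], n: int,
          log_lo: float = -6.0, log_hi: float = 1.0, steps: int = 200) -> float:
    """Least-squares C (per bp): coarse grid, then golden-section search on log10(C)."""
    grid = [log_lo + (log_hi - log_lo) * k / steps for k in range(steps + 1)]
    best = min(range(steps + 1), key=lambda k: sse(10 ** grid[k], dist, r2, n))
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, steps)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if sse(10 ** x1, dist, r2, n) < sse(10 ** x2, dist, r2, n):
            hi = x2
        else:
            lo = x1
    return 10 ** ((lo + hi) / 2.0)


def decay_distance(c: float, n: int, threshold: float,
                   max_bp: float = 1e7) -> Optional[float]:
    """Distance (bp) at which the fitted curve falls to `threshold`, by bisection."""
    if hill_weir_r2(0.0, n) <= threshold:
        return 0.0
    if hill_weir_r2(c * max_bp, n) > threshold:
        return None
    lo, hi = 0.0, max_bp
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if hill_weir_r2(c * mid, n) > threshold:
            lo = mid
        else:
            hi = mid
    return hi


# Synthetic, noise-free r^2 values generated with C = 0.002 per bp
# (illustrative only, not Table S8 data); n = 216 accessions.
n = 216
distances = [50, 200, 500, 900, 1400, 1800, 2500, 3200]
observed = [hill_weir_r2(0.002 * d, n) for d in distances]
c_hat = fit_c(distances, observed, n)
print(f"fitted C = {c_hat:.5f} per bp")
print(f"distance where E(r2) = 0.1: {decay_distance(c_hat, n, 0.1):.0f} bp")
```

Searching on log10(C) rather than C keeps the coarse grid useful across several orders of magnitude; with real, noisy r2 values the fitted C would carry sampling error that this sketch does not estimate.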
2.3 Population structure
Deviations from Hardy-Weinberg proportions caused by population structure can lead to spurious associations and confound association studies. Using STRUCTURE version 2.3.4, K = 4 had the highest likelihood, with lnP(D) and the α value settling to relatively constant levels. The association population could therefore be subdivided into four subpopulations, each in Hardy-Weinberg equilibrium (HWE). The proportions of membership of the samples in the four subpopulations were 13.5%, 9.0%, 70.1% and 7.4%, respectively (Fig. 3). The expected heterozygosity between individuals within subpopulation 2 was the highest (0.1002), that within subpopulation 3 was the lowest (0.0506), and those within subpopulations 1 and 4 were similar (0.0793 and 0.0714, respectively). In agreement with the HWE tests, all Fst values exceeded 0.50 (mean Fst = 0.6093), suggesting extensive genetic divergence among the four subpopulations. Details of each sample's estimated membership in every subpopulation are given in Table S9.
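The choice of K above rests on comparing lnP(D) across replicate STRUCTURE runs. A minimal sketch of that comparison, assuming the lnP(D) values have already been collected from the STRUCTURE output files, summarizes each K by the mean and standard deviation over runs and picks the K with the highest mean. It does not check whether α has stabilized, which the authors also considered. The function names and the lnP(D) values are hypothetical, not this study's output.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_runs(lnp_by_k: Dict[int, List[float]]) -> Dict[int, Tuple[float, float]]:
    """Mean and sample SD of lnP(D) across replicate STRUCTURE runs for each K."""
    summary = {}
    for k in sorted(lnp_by_k):
        runs = lnp_by_k[k]
        sd = statistics.stdev(runs) if len(runs) > 1 else 0.0
        summary[k] = (statistics.mean(runs), sd)
    return summary


def best_k(lnp_by_k: Dict[int, List[float]]) -> int:
    """K whose replicate runs have the highest mean lnP(D)."""
    summary = summarize_runs(lnp_by_k)
    return max(summary, key=lambda k: summary[k][0])


# Hypothetical lnP(D) values from five runs per K (not this study's output).
lnp = {
    2: [-5120.4, -5118.9, -5121.7, -5119.5, -5120.1],
    3: [-4987.2, -4990.6, -4985.9, -4988.4, -4986.7],
    4: [-4901.3, -4899.8, -4903.5, -4900.6, -4902.2],
    5: [-4915.8, -4930.2, -4908.7, -4922.4, -4919.9],
}
for k, (mean_lnp, sd) in summarize_runs(lnp).items():
    print(f"K={k}: mean lnP(D) = {mean_lnp:.1f}, SD = {sd:.1f}")
print("most likely K:", best_k(lnp))
```

Reporting the SD alongside the mean helps flag a K whose runs disagree (as K = 5 does in this toy example), which often signals that the likelihood has stopped improving.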
2.4 Summary of single-SNP and haplotype-based associations
Single SNP marker-trait associations In total, 2912 single-marker association tests (364 SNPs or InDels × 8 traits) were performed with 5 × 10⁴ permutations using the MLM (Table S10). Of these, 132 associations were significant at P < 0.05 (Table S10). After multiple testing correction, 90 associations remained significant at Q < 0.05 (Table 4). These loci explained 1.87% to 17.93% of the phenotypic variance (Table 4). Of these associations, 15 involved C16:0 content and 13 involved C16:1 content; C18:1 and C18:2 had seven significant associations each; and ten C18:0 associations, 14 C18:3 associations and twelve associations each with C20:1 and OC were observed in the association population (Q < 0.05; Table 4). The 90 associations represented 63 SNP loci from the four genes. Several SNP markers were significantly associated with more than one trait, suggesting that these loci have pleiotropic effects (Table 4). Dominance and additive effects were calculated for twelve of the 90 marker-trait pairs. Of these twelve pairs, eight showed a mode of gene action consistent with under- or overdominance (|d/a| > 1.25), and the remaining four showed partial to full dominance (0.50 < |d/a| < 1.25). No markers with a codominant (additive) mode of gene action (|d/a| < 0.50) were detected (Table 5).
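The |d/a| classes above can be illustrated with the usual single-locus definitions, assumed here because the cited studies are not reproduced: the additive effect a is half the difference between the two homozygote means, and the dominance effect d is the deviation of the heterozygote mean from their midpoint. The function name and the genotype-class means below are hypothetical.

```python
from typing import Tuple


def gene_action(hom1: float, het: float, hom2: float) -> Tuple[float, float, float, str]:
    """Additive (a) and dominance (d) effects, |d/a|, and the mode of gene action.

    hom1, het and hom2 are trait means of the two homozygous classes and the
    heterozygous class at one marker.
    """
    a = (hom1 - hom2) / 2.0        # half the difference between homozygotes
    d = het - (hom1 + hom2) / 2.0  # deviation of heterozygote from their midpoint
    if a == 0:
        raise ValueError("homozygote means are equal, so |d/a| is undefined")
    ratio = abs(d / a)
    # Thresholds from Section 2.4; ratios exactly at 0.50 or 1.25 are
    # assigned to the middle class here (an assumption).
    if ratio < 0.50:
        mode = "additive (codominant)"
    elif ratio <= 1.25:
        mode = "partial to full dominance"
    else:
        mode = "under- or overdominance"
    return a, d, ratio, mode


# Hypothetical genotype-class means for one marker-trait pair.
print(gene_action(20.0, 23.0, 16.0))
# (2.0, 5.0, 2.5, 'under- or overdominance')
print(gene_action(20.0, 18.5, 16.0))
# (2.0, 0.5, 0.25, 'additive (codominant)')
```

The sign of a depends on which homozygote is labeled first, but |d/a| does not, so the classification is unaffected by allele labeling.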
Haplotype-trait associations In all, 22 haplotypes were identified in the four candidate genes using Haploview with a significance threshold of P < 0.01 (Table S11). In haplotype-based association tests, no significant associations between haplotypes and phenotypic traits were found at P < 0.05 or Q < 0.05 (Table 6). At a threshold of P < 0.10, six haplotypes from one candidate gene (CoSAD2) were significantly associated with five phenotypic traits, all except C16:0, C18:1 and C18:2 contents (Table 6). Multiple testing correction reduced this number to only one (Q < 0.10; Table 6).
The association of SNP6U.1577 (in this haplotype block) and the C18:0 content (Q
Fig. 1 Frequency distributions of the traits measured in the association population: (a) oil content in the kernel, (b) C16:0 content, (c) C16:1 content, (d) C18:0 content, (e) C18:1 content, (f) C18:2 content, (g) C18:3 content and (h) C20:1 content (x-axis). The number of individuals is shown on the right y-axis.
Fig. 2 LD levels among pairwise SNPs in CoSAD2 (A), CoSAD1 (B), Cofad2-A (C) and Cofad2-B (D), and decay of LD with distance (bp) between sites in the four candidate genes (E). LD decayed within the candidate genes, and several SNP pairs more than 1.8 kb apart within the same gene were in significant LD, such as (a) SNP6U.2655 - SNP6U.331, (b) SNP6U.3443 - SNP6U.891, (c) SNP6U.2744 - SNP6U.331, (d) SNP6U.3443 - SNP6U.958 and (e) SNP6U.3410 - SNP6U.958 (r2 > 0.4; P < 0.001).
Fig. 3 Population structure of the 216 individuals of the association population, analyzed with the STRUCTURE program. Numbers on the y-axis indicate the membership coefficient. Each bar on the x-axis represents an individual; colored segments within one bar reflect the proportional contribution of each subpopulation to that individual. Colors indicate the four groups identified by STRUCTURE (P1 = red, P2 = green, P3 = blue and P4 = yellow).
Fig. 4 Genotypic effects of significant SNPs on traits in the association and validation populations: (a) SNP 2.A.705 and (b) SNP 6U.1577 for C18:0 content, (c) SNP 6U.1031 for C18:1 content, (d) SNP 6U.2366 for C20:1 content, and (e) SNP 40U.548 and (f) SNP 6U.505 for OC, in the association (left) and validation (right) populations, respectively.
Fig. 5 Relative expression levels of candidate genes and the associated phenotype levels in groups representing different genotypes at significant SNPs (error bars represent +SD). Relative expression levels of CoSAD2 and the associated phenotype levels in groups based on SNP 6U.1577 genotypes (a), SNP 6U.1031 genotypes (b) and SNP 6U.505 genotypes (c). Relative expression levels of CoSAD1 and OC in groups based on SNP 40U.548 genotypes (d). Each group included ten C. oleifera individuals. Relative expression levels from qRT-PCR, calculated using GAPDH as the reference gene, are shown on the left y-axis; the associated phenotype levels are shown on the right y-axis.
LD decayed within genes, and six significant SNP-oil trait associations were detected in the candidate genes.