Association Genetics Identifies Single Nucleotide Polymorphisms


Biotechnology and Biological Transformations

Association genetics identifies single nucleotide polymorphisms related to kernel oil content and quality in Camellia oleifera Ping Lin, Hengfu Yin, Chao Yan, Xiaohua Yao, and Kailiang Wang J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.8b03399 • Publication Date (Web): 13 Feb 2019 Downloaded from http://pubs.acs.org on February 13, 2019


Journal of Agricultural and Food Chemistry

Association genetics identifies single nucleotide polymorphisms related to kernel oil content and quality in Camellia oleifera

Ping Lin1,2*, Hengfu Yin1,2, Chao Yan1,2,3, Xiaohua Yao1,2 and Kailiang Wang1,2*

1. State Key Laboratory of Tree Genetics and Breeding, Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou 311400, China
2. Key Laboratory of Forest Genetics and Breeding, Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou 311400, China
3. Experimental Center for Subtropical Forestry, Chinese Academy of Forestry, Fenyi, 336600, China

* Corresponding Authors:
Ping Lin, [email protected]; + (86) 571-63320229
Kailiang Wang, [email protected]; + (86) 571-63379095

Abstract

Camellia oleifera, an important non-wood tree species grown for seed oil in China, has received considerable attention because its high unsaturated fatty acid content benefits human health. It is necessary to examine the allelic diversity of key genes associated with oil production in C. oleifera cultivars that vary widely in fatty acid composition. In this study, we performed an association analysis between four key genes encoding fatty acid desaturases (two CoSAD and two Cofad2) and traits including oil content and fatty acid composition. By sequencing an association population (216 accessions), we identified two single-nucleotide insertion-deletions (InDels) and 362 single-nucleotide polymorphisms (SNPs) within the four candidate genes. Single-marker and haplotype association tests with the traits were conducted using linkage disequilibrium (LD) approaches to detect significant marker-trait associations. A validation population (279 hybrid individuals from six full-sib families) was then used to validate the significantly associated allelic variations. In all, 90 single marker-trait associations and one haplotype-trait association were significant in the association population, and these loci explained 1.87% to 17.93% of the corresponding phenotypic variance. Further, six SNP marker-trait associations (Q < 0.10) from Cofad2-A, CoSAD1 and CoSAD2 were successfully validated in the validation population. The SNP markers identified in this study can potentially be applied in future marker-assisted selection to improve oil content and quality in C. oleifera.

Keywords: candidate-gene-based association mapping; Camellia oleifera; single nucleotide polymorphisms (SNPs); oil content (OC) and fatty acid composition; stearoyl-ACP desaturase gene (CoSAD) and Δ12 (ω6)-desaturase gene (Cofad2)

Introduction

China is the world's largest consumer of edible oil. With economic development and rising living standards, the per capita supply of edible oil and the imports of raw materials for edible oil have increased every year in China [1]. Camellia oleifera is widely planted as an important edible oil-bearing tree species in south China. Dry seed production was reported to be about 2.2 million tons over a cultivated area of four million ha in 2015 [2]. At present, Camellia oil is the fourth most important edible plant oil produced in China. Camellia oil, which makes up nearly 50% of dry kernel weight, is the major product of C. oleifera seeds and is considered one of the highest quality oils [2-4]. In Camellia oil, approximately 90% of all fatty acids (FAs) are unsaturated FAs (UFAs), mainly oleic acid (C18:1, a monounsaturated fatty acid, MUFA) and linoleic acid (C18:2, a polyunsaturated fatty acid, PUFA). Because its high UFA content, and especially its high MUFA content, is good for human health, C. oleifera has received increasing attention, and the study of factors that influence UFA biosynthesis is very important in C. oleifera breeding. Identifying the genes and allelic variation associated with oil and FA biosynthesis in C. oleifera would provide useful information for breeding programs that aim to improve oil content and quality.

Oil and FA biosynthesis has been extensively studied in C. oleifera [2, 5-7]. The seed oil of C. oleifera is dominated by C18:1, similar to that of hickory and olive [8, 9]. Previous studies indicated that high expression of the stearoyl-ACP desaturase gene (CoSAD) together with low expression of the Δ12 (ω6)-desaturase gene (Cofad2) was associated with C18:1 synthesis in C. oleifera [2]. C18:1 biosynthesis is catalyzed by stearoyl-ACP desaturases (SAD), which are encoded by SAD genes, a multigene family with several members [10-15]. The activity of SAD significantly affects the ratio of saturated to unsaturated FAs and is a major determinant of FA composition [13-21]. The Δ12 (ω6)-desaturase (FAD2) desaturates C18:1 to synthesize C18:2 in the endoplasmic reticulum, which is the key rate-determining step for PUFA synthesis in some oil crops [22, 23]. The FAD2s are encoded by fad2 genes, which belong to a multigene family [24]. There are at least five fad2 genes in olive (Olea europaea var. sylvestris) [9], three in Camelina sativa [25], two in sesame (Sesamum indicum) [26] and at least seven in soybean (Glycine max) [27-33].

Linkage disequilibrium (LD)-based association mapping provides a powerful strategy to genetically dissect complex quantitative traits [34-38]. For species in which LD decays rapidly, candidate-gene-based association mapping is appropriate for detecting associations between sequence variations and specific traits of interest [39]. In general, LD decays rapidly within genes in outcrossing species and does not extend over the entire gene region, as in tea (Camellia sinensis) [40], Chinese white poplar (Populus tomentosa) [35], black cottonwood (Populus trichocarpa) [34], sunflower (Helianthus annuus) [41] and perennial ryegrass (Lolium perenne) [42]. C. oleifera is a self-incompatible oil-bearing tree species, and a candidate gene association mapping approach may therefore be suitable for establishing associations between polymorphic loci in targeted genes and complex traits.

To further explore the functions of CoSAD and Cofad2 genes in determining the FA composition of Camellia oil, we isolated two CoSAD and two Cofad2 genes from C. oleifera based on previous publications [2, 6, 43] (Table 1 and File S1). The expression levels of these four genes were significantly associated with oil accumulation and FA composition during seed development in C. oleifera [2]. Nevertheless, little is known about whether the allelic diversity of these key genes contributes to FA composition variation in C. oleifera germplasm. In this study, to explore allelic effects, two InDels and 362 SNPs identified in the CoSAD and Cofad2 genes by sequencing a natural population of C. oleifera were used for single-marker and haplotype-based association mapping. Subsequently, we tested the significant associations in a hybrid population to confirm the allelic loci associated with the phenotypic traits. This work provides insights into the regulation of seed oil biosynthesis in C. oleifera and benefits the genetic improvement of Camellia oil quality and yield.

1. Material and Methods

1.1 Plant materials

Association population. In 2004, the Research Institute of Subtropical Forestry, Chinese Academy of Forestry (Hangzhou, China) maintained a collection of 494 accessions from most of the natural distribution region of C. oleifera, covering nine provinces in the subtropical areas of China. These accessions were propagated by grafting and established as a clonal plantation using a randomized complete block design with three replications in Dongfanghong Forest Farm of Zhejiang Province, Jinhua city, Zhejiang province, China. In this study, a set of 216 unrelated accessions of C. oleifera from the collection was sampled for association analysis.

Validation population. In this study, 279 hybrid individuals from six full-sib families were selected for validation analysis. The progenies were grown in 2009 in Dongfanghong Forest Farm of Zhejiang Province, Jinhua city, Zhejiang province, China, using a randomized complete block design with three replications.

Phenotypic evaluation. OC and oil quality traits, which depend on FA composition, are important breeding targets in C. oleifera. In this study, OC and the contents of palmitic acid (C16:0), palmitoleic acid (C16:1), stearic acid (C18:0), C18:1, C18:2, linolenic acid (C18:3) and cis-11-eicosenoic acid (C20:1) in Camellia oil were evaluated. The 216 accessions of the association population were scored for these eight quantitative traits, with at least three ramets per genotype, for three years. Mature seeds were collected, and OC was measured by the Soxtec extraction method as described [44]. Total lipid was extracted from the kernels with petroleum ether, and the seven FA components were quantified by gas chromatography according to a previous study [45].

The same eight quantitative traits were measured with three replicates in the 279 hybrid individuals in 2014, using the same methods as for the association population. Data Processing System (DPS 14.50, http://www.chinadps.net/dps_eng/) software was used for normal fitting and phenotypic correlation analysis of the eight traits [46]. Box plots were drawn in R (https://www.r-project.org/).

1.2 DNA extraction, amplification, and sequencing

Genomic DNA (gDNA) was isolated from young leaves (one ramet per clone for the association population) using the TaKaRa MiniBEST Plant Genomic DNA Extraction Kit (TaKaRa, Dalian, China) according to the user manual. Based on the cDNA sequences of the two CoSAD and two Cofad2 genes [2], specific primer sets (Table 1) were designed and used to amplify the gene regions with gDNA as template. PCR was performed in a final reaction volume of 50 μL containing 60 ng of gDNA, 25 μL of 2 × LAmp™ Master Mix (Vazyme, Nanjing, China), 0.4 mM of forward primer, and 0.4 mM of reverse primer. The PCR program was 94 ℃ for 5 min, followed by 35 cycles of denaturation at 94 ℃ for 30 s, annealing at 55 ℃ for 30 s, and extension at 72 ℃ for 2 min, with a final extension at 72 ℃ for 7 min. The amplified DNA fragments were purified from 1.2% agarose gels and sequenced on an ABI 3730XL DNA Analyzer with the sequencing primers (Table 2). In total, 7610 bp of gDNA sequence was obtained from four unique genes, Cofad2-A (GenBank ID: JQ739518.1), Cofad2-B (GenBank ID: KJ995981.1), CoSAD1 (GenBank ID: MH836317) and CoSAD2 (GenBank ID: MH836318), with an average of 1903 bp per gene, without considering InDels. Gene length ranged from 1160 bp (Cofad2-B) to 3548 bp (CoSAD2; Table 1 and File S1).

1.3 SNP discovery, genotyping and linkage disequilibrium (LD)

SNP discovery and genotyping were performed from sequence trace files using NovoSNP version 3.0.1 (http://www.molgen.vib-ua.be/bioinfo/novosnp/), and SNPs with a score > 12 were accepted as true variants [47]. All unfiltered SNP data for the association and validation populations have been submitted to the European Variation Archive (EVA) under project accession number PRJEB31224 (https://www.ebi.ac.uk/eva). SNPs with a minor allele frequency < 5% were removed before further analysis. After filtering, the squared correlation of allele frequencies (r²) was calculated to estimate the LD between pairs of SNPs using the software package TASSEL version 3.0 [48]. The decay of LD with physical distance (base pairs) between SNP sites within the candidate genes was evaluated by nonlinear regression analysis of the r² values [34, 49].
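The MAF filter and the pairwise r² calculation described above can be illustrated with a minimal Python sketch. It is not the TASSEL implementation: it assumes biallelic SNPs coded as alternate-allele dosages (0, 1 or 2), computes r² as the squared Pearson correlation of those dosages, and uses hypothetical marker names, positions and genotypes.

```python
from itertools import combinations

def minor_allele_frequency(dosages):
    """MAF of one biallelic SNP; dosages are 0/1/2 alternate-allele counts, None = missing."""
    called = [d for d in dosages if d is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors (pairwise-complete samples)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a monomorphic marker
    return sxy * sxy / (sxx * syy)

def filter_and_pairwise_ld(genotypes, positions, maf_threshold=0.05):
    """Keep SNPs with MAF >= threshold, then list (snp_a, snp_b, distance_bp, r2) per pair."""
    kept = [name for name, dos in genotypes.items()
            if minor_allele_frequency(dos) >= maf_threshold]
    pairs = []
    for a, b in combinations(kept, 2):
        distance = abs(positions[a] - positions[b])
        pairs.append((a, b, distance, r_squared(genotypes[a], genotypes[b])))
    return kept, pairs

# Hypothetical genotypes for ten individuals at four sites of one gene.
genotypes = {
    "snp_100":  [0, 1, 2, 1, 0, 2, 1, 0, 1, 2],
    "snp_250":  [0, 1, 2, 1, 0, 2, 1, 0, 1, 2],  # identical to snp_100, so r2 = 1
    "snp_900":  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, removed by the MAF filter
    "snp_1500": [2, 1, 0, 1, 2, 0, 1, 2, 1, 1],
}
positions = {"snp_100": 100, "snp_250": 250, "snp_900": 900, "snp_1500": 1500}

kept, ld = filter_and_pairwise_ld(genotypes, positions)
for snp_a, snp_b, distance, r2 in ld:
    print(f"{snp_a} - {snp_b}: {distance} bp, r2 = {r2:.3f}")
```

The resulting (distance, r²) pairs are the kind of input a nonlinear LD-decay regression would take; the regression model itself is not specified here and is left out of the sketch.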

1.4 Population structure and relative kinship

In our previous study, 500 SRAP primer pairs were developed and 46 primer pairs were used to analyze the genetic diversity of C. oleifera clones (the same clones as in the association population) [50]. Based on the SRAP analysis data, the pairwise kinship matrix was calculated using TASSEL version 3.0 [48] and is shown in Table S1. The population structure of the association population was evaluated using STRUCTURE version 2.3.4 software (http://web.stanford.edu/group/pritchardlab/structure.html) [51] with the admixture ancestry model, based on the same SRAP analysis data. The program was run five times across a range of K values from K = 2 to K = 9, with a burn-in period of 10000 and 10000 repeats. The highest likelihood, with lnP(D) and the α value settling to relatively constant values, was used to estimate the most likely number of subpopulations.

1.5 Single marker-trait association analysis

In the association population, all association tests between SNP markers and traits were performed using the mixed linear model (MLM) method with 5 × 10⁴ permutations in TASSEL version 3.0 [48]. The MLM can be described as y = SNP + Q + kinship + e, where Q is the matrix from the STRUCTURE software that defines the population structure, the kinship matrix evaluates the coancestry coefficients, and e is the residual. P-values for all associations were corrected using Benjamini and Hochberg's method to control the false discovery rate (FDR) [52]. The ratio of dominance (d) to additive (a) effects was calculated for each significant marker to quantify the mode of gene action according to previous studies [40, 53]. Inheritance tests of the significant SNP loci identified in the association population were performed in the validation population using a χ² test in Data Processing System (DPS 14.50) software [46], and P-values were corrected using Benjamini and Hochberg's method to control the FDR [52].

1.6 Haplotype-trait association analysis

The frequencies of haplotypes (blocks of linked, ordered markers) were estimated with Haploview version 4.2 [54], and tests of haplotype association with the traits were performed using TASSEL version 3.0 [48]. Correction for multiple testing was performed using Benjamini and Hochberg's method to control the FDR [52].

1.7 Quantitative RT-PCR (qRT-PCR) testing

qRT-PCR was performed using single-strand cDNA samples synthesized from total RNA of developing seeds of C. oleifera accessions (ten accessions per genotype) using PrimeScript™ RT Master Mix (TaKaRa, Dalian, China). The qRT-PCR program and the analysis of relative gene expression levels were performed as described by Lin et al. [2]. The primer pairs (Table S2) were designed individually for the CoSAD genes according to the SNP-based association results using Primer 5.0 software. The glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH) was used as the reference gene [55].
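The Benjamini and Hochberg correction used in Sections 1.5 and 1.6 to turn P-values into FDR-adjusted Q-values can be sketched as follows. This is a minimal stand-alone version of the standard step-up procedure, not the code used by TASSEL or DPS, and the five P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in the input order."""
    m = len(p_values)
    # Indices of the tests sorted from the smallest to the largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each Q-value is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical P-values from five marker-trait tests.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q)            # adjusted values, same order as p
print(significant)  # indices of tests passing Q < 0.05
```

In this toy example four tests pass P < 0.05 but only the first two pass Q < 0.05, which mirrors how the number of significant associations shrinks after multiple-test correction.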

2. Results

2.1 Phenotypic analyses of traits distribution and correlations

We extracted and measured total OC and the seven main FA components, which accounted for over 99% of all FAs in Camellia oil. In the association population, OC ranged from 128.17 mg/g to 510.26 mg/g, with a mean of 362.42 mg/g (Tables S3 and S4). The relative contents of C16:0 and C18:1 ranged from 6.70% to 12.30% (mean 8.66%) and from 69.10% to 84.70% (mean 78.94%), respectively (Tables S3 and S4). Descriptive statistics of the trait distributions are presented in Table S4. In the validation population, OC ranged from 121.61 mg/g to 447.48 mg/g, with a mean of 317.93 mg/g (Tables S5 and S6). The relative contents of C16:0 and C18:1 ranged from 6.70% to 11.00% (mean 8.65%) and from 68.70% to 85.80% (mean 78.19%), respectively (Tables S5 and S6). The descriptive trait-distribution statistics of the validation population are shown in Table S6. As expected, most traits measured in both populations were approximately normally distributed (Fig. 1 and Fig. S1).

OC and the relative FA contents were significantly correlated in the association population (Table 3, above diagonal). OC was positively correlated with C18:0 (R = 0.4366, P < 0.01) and negatively correlated with C18:2 (R = -0.3406, P < 0.01) and C18:3 (R = -0.6806, P < 0.01), which resembled the pattern in the validation population (Table 3, below diagonal). In addition, OC was significantly positively correlated with C18:1 and negatively correlated with C16:0 in the association population (P < 0.01). According to these correlations, the FA content traits were divided into two groups: the first comprised C16:0, C18:2 and C18:3, and the second comprised C18:0 and C18:1. Within each group, the pairwise correlations were significantly positive, whereas traits in the first group were significantly negatively correlated with traits in the second group in the association population (P < 0.01; Table 3, above diagonal). Similar FA content correlations were observed in the validation population (Table 3, below diagonal). It is worth noting that C18:1 and C18:2 contents were significantly negatively correlated, with the absolute value of the correlation coefficient (|R|) close to 1: 0.9618 and 0.9763 in the association and validation populations, respectively (Table 3).
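As a small illustration of the phenotypic correlations reported above, the following Python sketch computes a Pearson correlation coefficient from paired trait values. The C18:1 and C18:2 percentages are hypothetical and were chosen only to show a strong negative relationship; they are not taken from Tables S3 to S6, and DPS, not this code, was used in the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical relative contents (%) for five accessions.
c18_1 = [72.0, 76.5, 79.0, 81.5, 84.0]
c18_2 = [14.0, 10.2, 8.1, 6.0, 4.1]

r = pearson_r(c18_1, c18_2)
print(f"R = {r:.4f}, |R| = {abs(r):.4f}")
```

A negative relationship gives a negative R, so a statement that the coefficient is close to 1 refers to its absolute value.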

2.2 SNP detection and LD test

After removing SNPs with a minor allele frequency (MAF) below 5%, a total of 364 polymorphic sites, comprising two single-nucleotide InDels and 362 SNPs (Table S7), were detected in the four candidate genes (two CoSAD and two Cofad2), giving an average density of one polymorphic site every 21 bp (π ranged from 0.0005 to 0.00739 and θw from 0.0013 to 0.0116). Of these polymorphic sites, 67.03% were located in introns and 31.87% in exons, and four SNP sites were detected in the 3' UTR. The average density was one polymorphic site every 38.73 bp in exons and every 9.90 bp in introns. Among the 362 SNPs, 195 were transitions (70 G/A and 125 C/T; 53.57% of the 364 polymorphic sites) and 167 were transversions (35 G/T, 51 A/T, 48 A/C and 33 G/C; 45.88%).

Pairwise LD between SNP markers within the candidate genes was estimated by r² (Fig. 2). SNPs in LD were located in the same gene, and the limited LD within the candidate genes did not extend over the entire gene region (Fig. 2A, B, C, D). We further calculated the average LD decay distance within the candidate genes, and the results showed that LD decayed quickly with increasing physical distance. Most SNP pairs were in linkage equilibrium when they were more than 1.8 kb apart (r² < 0.3; P < 0.05; Fig. 2E; Table S8). Several loci more than 1.8 kb apart in CoSAD2 were nonetheless in significant LD, such as the marker pairs SNP6U.2655 - SNP6U.331, SNP6U.3443 - SNP6U.891, SNP6U.2744 - SNP6U.331, SNP6U.3443 - SNP6U.958 and SNP6U.3410 - SNP6U.958 (r² > 0.4; P < 0.001; Fig. 2E; Table S8). Therefore, LD between SNPs decayed within the candidate genes in the C. oleifera genome, and SNP-based LD association analysis within the genes was feasible.
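The counts reported in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (substitution counts, two InDels, 7610 bp of sequence); the variable names are illustrative. It shows that the quoted transition and transversion percentages are fractions of the 364 polymorphic sites rather than of the 362 SNPs.

```python
# Substitution counts as reported for the four candidate genes.
transitions = {"G/A": 70, "C/T": 125}
transversions = {"G/T": 35, "A/T": 51, "A/C": 48, "G/C": 33}
n_indels = 2
total_bp = 7610  # total sequenced length of the four genes

n_ts = sum(transitions.values())
n_tv = sum(transversions.values())
n_snps = n_ts + n_tv          # SNPs only
n_sites = n_snps + n_indels   # SNPs plus InDels

pct_ts = round(100 * n_ts / n_sites, 2)
pct_tv = round(100 * n_tv / n_sites, 2)
bp_per_site = round(total_bp / n_sites, 1)
ts_tv_ratio = round(n_ts / n_tv, 2)

print(n_snps, n_sites)      # 362 364
print(pct_ts, pct_tv)       # 53.57 45.88
print(bp_per_site)          # 20.9
print(ts_tv_ratio)          # 1.17
```

The overall density of about 20.9 bp per polymorphic site agrees with the rounded figure of one site every 21 bp.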

2.3 Population structure

Population deviations from Hardy-Weinberg proportions can lead to spurious associations and confound association studies. Using STRUCTURE version 2.3.4, we found that K = 4 had the highest likelihood, with lnP(D) and the α value settling to relatively constant values. The association population could therefore be subdivided into four subpopulations, each of which was in Hardy-Weinberg equilibrium (HWE). The proportions of membership of the samples in the four subpopulations were 13.5%, 9.0%, 70.1% and 7.4%, respectively (Fig. 3). The expected heterozygosity between individuals was highest in subpopulation 2 (0.1002), lowest in subpopulation 3 (0.0506), and similar in subpopulations 1 and 4 (0.0793 and 0.0714, respectively). In agreement with the tests for HWE, all Fst values were over 0.50 (mean Fst = 0.6093), suggesting extensive genetic divergence among the four subpopulations. The estimated membership of each sample in every subpopulation is given in Table S9.

2.4 Summary of single-SNP and haplotype based associations

Single SNP marker-trait associations. In total, 2912 single-marker association tests (364 SNPs or InDels × 8 traits) were performed with 5 × 10⁴ permutations using the MLM (Table S10). Of these, 132 associations were significant at a threshold of P < 0.05 (Table S10). After multiple test correction, 90 associations remained significant at a threshold of Q < 0.05 (Table 4). These loci explained from 1.87% to 17.93% of the phenotypic variance (Table 4). Of the 90 associations, 15 were with C16:0 content, 13 with C16:1 content, seven each with C18:1 and C18:2 content, ten with C18:0 content, 14 with C18:3 content, and twelve each with C20:1 content and OC (Q < 0.05; Table 4). The 90 associations represented 63 SNP loci from the four genes. Multiple SNP markers were significantly associated with more than one trait, suggesting that these loci had pleiotropic effects on certain traits (Table 4). The dominance and additive effects were calculated for twelve of the 90 marker-trait pairs. Of these twelve pairs, eight showed a mode of gene action consistent with under- or overdominance (|d/a| > 1.25), and the remaining four showed partial to full dominance (0.50 < |d/a| < 1.25). No marker with a codominant (additive) mode of gene action (|d/a| < 0.50) was detected (Table 5).
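The |d/a| classification above can be sketched in Python. The thresholds (0.50 and 1.25) come from the text; the formulas for a and d are the standard quantitative-genetics definitions based on genotype means and are an assumption here, since the exact calculation follows the cited studies. The function name and the genotype means are hypothetical, and assigning the boundary values 0.50 and 1.25 to the middle class is also an assumption.

```python
def gene_action(mean_hom_ref, mean_het, mean_hom_alt):
    """Classify the mode of gene action of one marker from three genotype means.

    Assumed definitions: a is half the difference between the two homozygote
    means, d is the deviation of the heterozygote mean from their midpoint.
    """
    a = (mean_hom_alt - mean_hom_ref) / 2
    d = mean_het - (mean_hom_ref + mean_hom_alt) / 2
    if a == 0:
        raise ValueError("additive effect is zero, so d/a is undefined")
    ratio = abs(d / a)
    if ratio < 0.50:
        mode = "additive (codominant)"
    elif ratio <= 1.25:
        mode = "partially to fully dominant"
    else:
        mode = "under- or overdominant"
    return a, d, ratio, mode

# Hypothetical trait means (e.g. C18:0 content, %) for three genotype classes.
examples = {
    "marker_1": (8.0, 9.5, 9.0),  # heterozygote outside the homozygote range
    "marker_2": (8.0, 8.9, 9.0),  # heterozygote close to one homozygote
    "marker_3": (8.0, 8.6, 9.0),  # heterozygote near the midpoint
}
for name, means in examples.items():
    a, d, ratio, mode = gene_action(*means)
    print(f"{name}: a = {a:.2f}, d = {d:.2f}, |d/a| = {ratio:.2f}, {mode}")
```

Under these definitions a heterozygote mean that lies outside the range of the two homozygote means always gives |d/a| > 1, which is why such markers fall in or near the under- or overdominant class.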

Haplotype-trait associations. In all, 22 haplotypes were identified in the four candidate genes by Haploview with a significance threshold of P < 0.01 (Table S11). Haplotype-based association tests were performed, and no significant association was found between haplotypes and phenotypic traits at a threshold of P < 0.05 and Q < 0.05 (Table 6). At a significance threshold of P < 0.10, six haplotypes from one candidate gene (CoSAD2) were significantly associated with five phenotypic traits, that is, all traits except C16:0, C18:1 and C18:2 contents (Table 6). Multiple test correction reduced this number to only one (Q < 0.10; Table 6).

The association of SNP6U.1577 (in this haplotype block) and the C18 :0 content (Q
0.4; P < 0.001).


Fig. 1 Frequency distributions of the traits measured in the association population. The number of individuals is shown on the right y-axis, and the x-axes show (a) oil content in the kernel, (b) C16:0 content, (c) C16:1 content, (d) C18:0 content, (e) C18:1 content, (f) C18:2 content, (g) C18:3 content and (h) C20:1 content.

Fig. 2 LD levels among pairwise SNPs in CoSAD2 (A), CoSAD1 (B), Cofad2-A (C) and Cofad2-B (D), and decay of LD with distance in base pairs between sites in the four candidate genes (E). LD decayed within the candidate genes, and several pairs of loci in the same gene that were more than 1.8 kb apart were in significant LD, such as (a) SNP6U.2655 - SNP6U.331, (b) SNP6U.3443 - SNP6U.891, (c) SNP6U.2744 - SNP6U.331, (d) SNP6U.3443 - SNP6U.958 and (e) SNP6U.3410 - SNP6U.958 (r² > 0.4; P < 0.001).

Fig. 3 Population structure of the 216 individuals of the association population analyzed by the STRUCTURE program. Numbers on the y-axis indicate the membership coefficient. Each bar on the x-axis represents an individual; colored segments within one bar reflect the proportional contributions of each subpopulation to this individual. The color of the bar indicates the four groups identified by the STRUCTURE software (P1 = red, P2 = green, P3 = blue, and P4 = yellow).

Fig. 4 Genotypic effects of significant SNPs on traits in both the association and validation populations. Genotypic effects of significant (a) SNP 2.A.705 and (b) SNP 6U.1577 for C18:0 content, (c) SNP 6U.1031 for C18:1 content, (d) SNP 6U.2366 for C20:1 content, and (e) SNP 40U.548 and (f) SNP 6U.505 for OC in the association (left) and validation (right) populations, respectively.

Fig. 5 Relative expression levels of candidate genes and the associated phenotype levels in groups representing different significant genotypes (the error bars represent +SD). The relative expression levels of CoSAD2 and the associated phenotype levels in groups based on SNP 6U.1577 genotypes (a), SNP 6U.1031 genotypes (b) and SNP 6U.505 genotypes (c), respectively. The relative expression levels of CoSAD1 and OC in groups based on SNP 40U.548 genotypes (d). Every group comprised ten C. oleifera individuals. Relative expression levels from qRT-PCR, calculated using GAPDH as the reference gene, are shown on the left y-axis. The associated phenotype levels are shown on the right y-axis.

LD decayed within genes, and six significant SNP-oil trait associations were detected in the candidate genes.