
Food Safety and Toxicology

Variability of crops’ compositional characteristics: What do experimental data show? Claudia Paoletti, Stefania Favilla, Alessandro Leo, Franco Neri, Hermann Broll, and Antonio Fernandez J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.8b01871 • Publication Date (Web): 21 Jul 2018 Downloaded from http://pubs.acs.org on July 23, 2018



Journal of Agricultural and Food Chemistry

Variability of crops’ compositional characteristics: What do experimental data show?

Claudia Paoletti1*, Stefania Favilla2, Alessandro Leo2, Franco M. Neri1, Hermann Broll1, Antonio Fernandez1

1 European Food Safety Authority – EFSA – Via Carlo Magno 1A, 43126 Parma, Italy

2 Independent Researcher, Parma, Italy

*Corresponding author: [email protected] – phone: +39 0521 036648

Keywords

Natural variability, food safety, compositional analysis, equivalence testing, empirical distribution, maize, soybean

Abstract

A common pillar of the risk assessment strategies implemented worldwide for genetically modified plants is the comparison of their compositional profile to that of conventional counterparts deemed safe. If differences are observed, those that cannot be attributed to natural variation are further evaluated for their safety relevance. This principle is clear, but its implementation is challenging. Here we first discuss the difficulties of estimating the natural variation of crop-specific compositional endpoints and the various attempts made, together with their advantages and limitations. Second, we present the empirical distribution curves of compositional endpoints for two crops of large commercial interest worldwide: maize and soybean. These curves provide novel information on endpoint-specific variability that is relevant for further progress in the risk assessment process.


Introduction

The strategies implemented internationally for the risk assessment of genetically modified (GM) plants are grounded in the investigation of their compositional profile to allow the identification of potential effects of the genetic modification in derived products. The strategy to identify such effects, recommended by Codex Alimentarius1,2 and endorsed worldwide3 (http://www.fao.org/food/food-safety-quality/gm-foods-platform/en), is the comparative approach, according to which the composition of the GM plant is compared to that of a non-GM comparator deemed safe: if differences are observed, these are considered indicators of potential effects, which are evaluated, as appropriate, in the subsequent steps of the risk assessment process. This approach is rooted in the concept of substantial equivalence, initially developed by OECD in 1993, which establishes that a GM-derived food can be considered as safe as a conventional food if it can be demonstrated that its characteristics and composition are the same as those of the conventional food, apart from the change that is the objective of the genetic modification (i.e. the intended effect)4. Within this frame, Codex Alimentarius guidelines2 recommend analyzing key compositional endpoints, i.e. components with a substantial impact on the overall diet, which, when considered in their totality, provide assurance that the food is unlikely to have adverse effects on human/animal health. The OECD Working Group for the safety of novel foods and feeds derived through modern biotechnology has defined with greater precision the key nutrients, anti-nutrients, toxins and endogenous allergens of several crop varieties, developing crop-specific consensus documents5.


OECD is continuously developing new ones and updating those already existing, providing valuable support to risk assessment bodies worldwide. Despite the international consensus surrounding the well-developed Codex Alimentarius GMO risk assessment frame2, a difficulty still exists: Codex Alimentarius explicitly recommends that the risk assessment should take into account the degree of natural variability of each compositional endpoint, to maximize the probability of distinguishing differences reflecting potential effects from those that are a direct expression of natural variation. Although the relevance and appropriateness of this recommendation have been internationally recognised6,7,8,9, neither Codex Alimentarius nor other international organizations have proposed a concrete strategy to obtain reliable estimates of the natural variability of crop-specific compositional endpoints. Indeed, this is not a trivial task. Each of the endpoints measured in any given crop is naturally characterized by a certain degree of variability, which is unknown because of the very large number of varieties present on the market at any given time and the indefinite number of conditions under which such varieties can be grown. OECD consensus documents attempt to partially mitigate this knowledge gap and report, whenever possible, ranges of endpoint-specific values collected from the literature, databases, or other publicly available sources5.

Even though ranges provide an indication of the spread of the data, they have biological and statistical limitations. Ranges collected using information from different sources are likely to be inflated by hidden confounding effects linked to differences in experimental conditions and/or analytical methodologies. From a broader point of view, the range is a summary statistic computed taking into account only two data points: the largest and the smallest of a dataset. Hence, it is particularly sensitive to changes in sample size, even to the addition of a single data point. In general, the range is affected by extreme values and can only provide a “one-dimensional” measure of dispersion, directly defined by the interval between the largest and smallest values. In contrast, measures of variability such as the standard deviation and/or variance provide a “two-dimensional” measure of dispersion, as they also take into account the probability of occurrence of each data value, enabling the use of confidence intervals and thus of statistical hypothesis testing approaches.

The European Food Safety Authority – EFSA – has taken a proactive role on these issues and in 2010 proposed a new approach for GM plants risk assessment10, where the natural variability of the endpoints measured for the comparative assessment is estimated as the variance between non-GM commercial plant varieties grown under controlled field conditions. The underlying assumption is that, since consumers are regularly exposed to commercially available varieties, these can provide biologically relevant benchmarks for the risk assessment process. Certainly, the use of a limited number of commercial varieties to determine the variability is conservative, as these can capture only portions of the natural variation, but these are portions to which consumers are exposed and, as such, already accepted as safe. Overall, this approach offers several advantages, all directed at reducing consumers’ risk: confounding effects, potentially affecting variability estimates, are controlled by growing commercial varieties under well-defined experimental field conditions10,11; and variability is estimated using a statistical method robust to extreme values10, which allows endpoint-specific distributions to be established. This approach, also incorporated in the European legislation12, stands out as the first regulatory attempt to quantify the degree of natural variability of compositional characteristics while safeguarding consumer protection.
Since its implementation, it has proven itself an effective framework for the risk assessment of GM crops.
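The contrast drawn above between the range and the standard deviation can be illustrated with a small numerical sketch. The Python code below is only an illustration under stated assumptions: the values are simulated (not data from any consensus document or field trial), and the single added extreme value of 20.0 is hypothetical, standing in for a data point obtained under different experimental or analytical conditions.

```python
import random
import statistics


def data_range(values):
    """Range: distance between the largest and the smallest value."""
    return max(values) - min(values)


random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical endpoint values in arbitrary units (simulated, not study data).
sample = [random.gauss(10.0, 1.0) for _ in range(200)]

range_before = data_range(sample)
sd_before = statistics.stdev(sample)

# Add a single extreme value to the 200 existing ones.
extended = sample + [20.0]
range_after = data_range(extended)
sd_after = statistics.stdev(extended)

print(f"range: {range_before:.2f} -> {range_after:.2f}")
print(f"sd:    {sd_before:.2f} -> {sd_after:.2f}")
```

In this simulated case one additional value changes the range proportionally much more than the standard deviation, because the range depends only on the two most extreme data points whereas the standard deviation weighs all of them.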


Here we offer an additional step forward in the challenging process of estimating the natural variability of crops’ compositional profile. First, we present the empirical distribution curves of the compositional endpoints recommended by OECD for two of the crops of large commercial interest on the European market for import and processing, and worldwide for levels of production: maize and soybean13,14. These curves, estimated from experimental data collected from non-GM commercial varieties grown in standardised field conditions10,11 over the course of seven years, allow the identification and characterization of the portion of variability to which consumers have been exposed over that time period. Second, we investigate whether the endpoint-specific ranges recommended by OECD in the respective consensus documents capture the one-dimensional spread of the observed variability. Together these two objectives provide a valuable contribution to the process that, by progressive approximation, will allow establishing endpoint-specific variability distribution trends in support of a more robust risk assessment process. Last but not least, a solid characterization of compositional variability is also an essential prerequisite to revisit and prioritise, on a crop-by-crop basis, the number and nature of the recommended OECD endpoints: a wish under discussion by the scientific community15, difficult to achieve without a complete understanding of the endpoint-specific variability distribution profiles. In this respect, this study contributes to developing the knowledge necessary to revisit the criteria used to select representative and biologically relevant compositional endpoints for the comparative assessment.

Materials and Methods


The dataset used for the present aggregated analysis includes data from comparative field trial studies conducted for GM maize and soybean applications submitted in the European Union (EU) for market authorisation, following the requirements of Regulation (EU) 503/201311. These studies adhere to a standardised experimental design, established to minimise confounding effects and harmonise data production, requiring the inclusion of at least 8 locations, 4 replicates per location and 6 non-GM commercial varieties. The different sites are selected to reflect the different meteorological and agronomic conditions under which the GM crop is to be grown, whereas the non-GM commercial varieties are selected to be representative of the site-specific agronomic conditions11,12. The aggregation of these data across field trial studies conducted according to the same regulatory experimental design provides a unique opportunity to improve our understanding of the variability of crop-specific compositional endpoints. Overall, the aggregate analysis presented here includes 103 non-GM commercial maize and 106 non-GM commercial soybean varieties grown in 33 field trial studies (15 for maize and 18 for soybean) covering more than 160 locations (82 for maize and 79 for soybean) spread across North and South America, during seven growing seasons, from 2008 until 2014.

According to Regulation (EU) 503/201311, the crop-specific compositional endpoints to be measured in any given regulatory field trial comparative study must be selected according to the available OECD consensus documents (http://www.oecd.org/chemicalsafety/biotrack/consensus-document-for-work-onsafety-novel-and-foods-feeds-plants.htm). All compositional endpoints recommended by OECD for maize13 and soybean14 were, in principle, considered suitable for the purpose of our investigation. However, we excluded some endpoints to ensure reliability of results in terms of both dataset dimension and comparability of analytical methodologies. Suitability of dataset dimension was assessed following the minimum requirements defined by Regulation (EU) 503/201312 and EFSA guidance documents10,11, which establish the inclusion of at least three non-GM commercial varieties at each location in four replicates. On this basis, it was possible to estimate the minimum number of data values that the analysis of 82 (maize) and 79 (soybean) locations should include for each endpoint: 984 (82 × 3 × 4) and 948 (79 × 3 × 4), respectively. Any endpoint with a smaller number of data values was excluded from the analysis.

Reliability and comparability of analytical results are normally maximised by minimizing or, whenever possible, eliminating the occurrence of confounding effects that may artificially inflate variability estimates and/or introduce bias. For this purpose, we excluded from the analysis all those endpoints that, over the course of the seven years, either had been measured applying non-comparable analytical methods or were expressed in non-equivalent units of measurement. Some endpoints had data values below the limit of quantification (LOQ); as the number of such values was small, they were also excluded from the dataset. These included furfural, sodium, selenium, and vitamin C for maize, and crude fibre for soybean. In addition, soybean lectins were omitted because the analytical method used in most cases (hemagglutination assay) is known to be sensitive to specific lab conditions16, compromising data comparability across field trial studies. The complete list of the compositional endpoints included in our investigation is provided in Table 1.

The frequency distribution of each endpoint, for each crop, was visualised by producing a series of histograms to explore, compare and characterise natural variability patterns across a range of environments, under comparable conditions. For the visual assessment of the distributional properties of each endpoint, we used a non-parametric kernel density estimate (KDE) of the probability density function (PDF)17,18, free of underlying distributional assumptions. On the PDF of each endpoint we superimposed, if the units of measurement were compatible, the endpoint-specific ranges reported by OECD for both maize13 and soybean14 to allow a visual comparison of the spread captured by the ranges and the width of the observed distributions.

Because most statistical tests commonly used in comparative assessment rely on the assumption of normality19, we also calculated a Gaussian-fitting (GF) curve, i.e. a normal distribution with the same mean and standard deviation as the original data, in order to quantify the variability of the endpoints measured. A 95% reference interval (R.I.), defined as the 2.5-97.5 percentile interval of the GF distribution, was calculated and compared with the spread captured by each range. Contrary to ranges, R.I.s are a two-dimensional measure of dispersion, constructed taking into account the probability distribution along both axes and controlling for individual data values that can be extreme and may inflate data variability interpretation.

To check for possible departures of the empirical PDF from normality, we evaluated its degree of similarity with the GF curve using two metrics, to increase confidence in the evaluation: the relative error ($\rho_2 = \lVert \mathrm{PDF} - \mathrm{GF} \rVert_2 / \lVert \mathrm{GF} \rVert_2$), which is defined as the square root of the sum of the squares of the differences between the PDF and the theoretical GF, divided by the square root of the sum of the squares of the GF; and the well-known Pearson coefficient (r) for linear correlation19, which, in this case, measures the degree of linear correlation between PDF and GF. The highest degree of similarity corresponds to ρ2 = 0 and r = 1, respectively. Departures from normality are often addressed by transforming the original data to a new scale20. For illustrative purposes, we applied a logarithmic transformation to a few endpoints showing deviations from normality, and calculated the GF on the logarithmic scale. Results (GF curves and 95% R.I.) were then back-transformed to the original scale to allow meaningful comparison with the ranges reported by OECD. The analysis was performed implementing our own codes in R20,21.

Results

The estimated PDF of all selected endpoints and their GF were analyzed for each crop. To facilitate discussion, we focus on specific examples illustrative of typical distribution patterns and best/worst GF for maize (Fig. 1.a,b,c,d) and soybean (Fig. 2.a,b,c,d). Endpoint-specific ranges reported by OECD are also shown. The complete set of results for all the endpoints included in our analysis is provided as supplementary information (Appendix A and B). OECD ranges are shown for all endpoints having a comparable unit of measurement.

Maize

Endpoint-specific PDFs vary from being symmetric around the mean, as in the case of cysteine (Fig. 1.a) and potassium (Fig. 1.b), to being skewed, as in the case of vitamin A and p-coumaric acid (Fig. 1.c and 1.d). The majority of PDFs show a good GF, as indicated by the high r values and the small ρ2 values (Table 2): for more than 33 endpoints r ≥ 0.98 and for 36 endpoints ρ2 ≤ 0.2. In particular, cysteine and potassium show the best GF (Fig. 1.a and 1.b), whereas vitamin A and p-coumaric acid can be considered worst-case (Fig. 1.c and 1.d).
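The two similarity metrics reported above, together with the KDE, the GF curve and the 95% R.I. described in Materials and Methods, can be sketched as follows. This is a minimal Python illustration and not the authors' R code: the data are simulated, and the bandwidth rule (Silverman's rule of thumb), the grid of 200 points and all function names are assumptions introduced here, since the paper does not specify them.

```python
import math
import random
from statistics import NormalDist, mean, quantiles, stdev


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth (an assumption; the paper does not state one)."""
    q1, _, q3 = quantiles(data, n=4)
    spread = min(stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** -0.2


def gaussian_kde(data, grid, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated on `grid`."""
    kernel = NormalDist(0.0, bandwidth)
    n = len(data)
    return [sum(kernel.pdf(x - xi) for xi in data) / n for x in grid]


def relative_error(pdf, gf):
    """Relative error: ||PDF - GF|| / ||GF|| with Euclidean norms."""
    num = math.sqrt(sum((p - g) ** 2 for p, g in zip(pdf, gf)))
    den = math.sqrt(sum(g ** 2 for g in gf))
    return num / den


def pearson_r(xs, ys):
    """Pearson coefficient of linear correlation between two curves."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


random.seed(1)
# Hypothetical, roughly symmetric endpoint values (simulated, not study data).
data = [random.gauss(50.0, 5.0) for _ in range(1000)]

# GF curve: normal distribution with the same mean and SD as the data.
fit = NormalDist(mean(data), stdev(data))

lo, hi = min(data), max(data)
grid = [lo + i * (hi - lo) / 199 for i in range(200)]
pdf = gaussian_kde(data, grid, silverman_bandwidth(data))
gf = [fit.pdf(x) for x in grid]

rho = relative_error(pdf, gf)
r = pearson_r(pdf, gf)

# 95% reference interval: 2.5th to 97.5th percentile of the GF distribution.
ri = (fit.inv_cdf(0.025), fit.inv_cdf(0.975))

print(f"relative error = {rho:.3f}, Pearson r = {r:.3f}")
print(f"95% R.I. = ({ri[0]:.2f}, {ri[1]:.2f})")
```

Both metrics are computed on the two curves evaluated over a common grid, so their values depend to some extent on the grid and on the KDE bandwidth; these choices should be kept fixed when comparing endpoints.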


Given the right-skewed distributions of vitamin A and p-coumaric acid, the GF was performed on the log scale; the resulting log-normal distribution curves provided a better approximation of the empirical PDFs (Fig. 1.c and 1.d). Ranges reported by OECD13 for cysteine, potassium, vitamin A and p-coumaric acid are superimposed on the x-axis (Fig. 1.a,b,c,d). Interestingly, these ranges capture different portions of the observed spread. For example, in the case of cysteine, range NRC captures only a portion of the spread; range Sou00 spans the entire spread, going beyond the minimum and maximum empirical values; range Wat82 captures only a limited portion of the spread and misses the majority of the data values; whereas Comm. Range and range Whi95 embrace a larger proportion of the data values. Similar situations are present for the other three endpoints selected as illustrative examples, as well as for the remaining ones, included as supplementary information (Appendix A).

The 95% R.I. was calculated and superimposed on the PDFs and GFs of the endpoints. As shown in Fig. 1.a,b,c,d, whereas ranges provide estimates of a spread without accounting for the likelihood of occurrence of any given data value, the two-dimensional distribution profile defines the probability of occurrence of each data point and enables the definition of the region (R.I.) where data values occur with high probability (here, 95%). In the case of vitamin A (Fig. 1.c) and p-coumaric acid (Fig. 1.d), the R.I. was estimated on the log scale and back-transformed to the original scale. Also in these cases, the range reported by OECD failed to capture the observed dispersion, identifying only a very restricted portion of the entire R.I.
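The log-scale fitting and back-transformation used for the skewed endpoints can be sketched in the same way. The Python code below is an illustration under stated assumptions, not the authors' R implementation: the right-skewed values are simulated from a log-normal distribution, and the variable and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(2)
# Hypothetical right-skewed endpoint values (simulated, not study data).
data = [random.lognormvariate(0.0, 0.6) for _ in range(1000)]

# Gaussian fit on the logarithmic scale.
logs = [math.log(x) for x in data]
fit_log = NormalDist(mean(logs), stdev(logs))

# 95% R.I. on the log scale, then back-transformed to the original scale.
ri_log = (fit_log.inv_cdf(0.025), fit_log.inv_cdf(0.975))
ri = (math.exp(ri_log[0]), math.exp(ri_log[1]))


def lognormal_gf(x):
    """Back-transformed GF curve: density of x when log(x) follows fit_log."""
    return fit_log.pdf(math.log(x)) / x


# For comparison: 95% R.I. from a Gaussian fit on the original scale.
fit_raw = NormalDist(mean(data), stdev(data))
ri_raw = (fit_raw.inv_cdf(0.025), fit_raw.inv_cdf(0.975))

# Fraction of the simulated values that falls inside the back-transformed R.I.
coverage = sum(ri[0] <= x <= ri[1] for x in data) / len(data)

print(f"R.I. (log fit, back-transformed): ({ri[0]:.2f}, {ri[1]:.2f})")
print(f"R.I. (fit on original scale):     ({ri_raw[0]:.2f}, {ri_raw[1]:.2f})")
print(f"coverage of back-transformed R.I.: {coverage:.3f}")
```

The back-transformed interval is asymmetric around the median and always positive, whereas for this simulated sample the interval fitted on the original scale extends below zero, which is not a possible value for a concentration.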


Soybean

As for maize, four illustrative examples of soybean endpoint-specific distribution patterns were selected and are presented below. PDFs vary from being symmetric around the mean, as for alanine (Fig. 2.a) and phytic acid (Fig. 2.b), to being skewed, as in the case of total glycitein (Fig. 2.c) and vitamin E (Fig. 2.d). Moderate tails to the left or right are also present for some of the remaining endpoints, included as supplementary information (Appendix B). In general, soybean endpoints show high r coefficients and low ρ2 values (Table 2), indicative of a good GF of the data. Skewness and multiple peaks are sometimes present, but variability patterns reveal a widespread bell-shaped distribution (23 endpoints had r ≥ 0.98) and absolute data values are well captured (24 endpoints had ρ2 ≤ 0.2), showing a high degree of similarity between PDFs and the respective GF curves (Table 2 and Appendix B).

In some instances, ranges reported by OECD14 capture almost completely the spread of data values, as in the case of alanine (Fig. 2.a), whereas in other instances they capture the data dispersion only partially, as in the case of phytic acid (Fig. 2.b). As observed for maize, the ranges reported by OECD for soybean lack consistency among them. For example, in the case of phytic acid (Fig. 2.b) the NFRI-NARO 2011 and ILSI 2010 ranges cover a large part of the empirical data spread, whereas Liener 1994 captures only a fraction around the mean.

Total glycitein and vitamin E are among the endpoints with the worst GF in terms of both r and ρ2 values. Even though Pearson coefficients are not extremely low, the high relative errors (Table 2) reflect the poor GF of the underlying distributions, which can be visualized in Fig. 2.c and Fig. 2.d, where the peaks of the GF curves do not align with the peaks of the respective PDFs. For both endpoints, the PDF is right-skewed, and a more suitable description of the data is provided by a log-normal GF curve (Fig. 2.c,d). Also for these two endpoints, ranges reported by OECD fail to capture the empirical dispersion of data values. Interestingly, range NFRI-NARO 2011 for vitamin E is shifted with respect to the peak of the PDF, implying that the data values with the largest probability of occurrence are not captured by this range.
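How much of an empirical distribution a reported range captures, and whether the range contains the peak of the PDF, can be quantified with a few lines of code. The Python sketch below is purely illustrative: the data are simulated, the range (30.0, 60.0) and the bandwidth of 2.0 are hypothetical values chosen to mimic a range shifted with respect to the peak, and the paper itself reports this comparison visually rather than numerically.

```python
import random
from statistics import NormalDist


def fraction_within(data, low, high):
    """Fraction of data values that falls inside the range [low, high]."""
    return sum(low <= x <= high for x in data) / len(data)


def kde_peak(data, bandwidth, n_grid=200):
    """Grid location of the maximum of a Gaussian kernel density estimate."""
    kernel = NormalDist(0.0, bandwidth)
    lo, hi = min(data), max(data)
    grid = [lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)]
    # Unnormalised density: sufficient for locating the peak.
    density = [sum(kernel.pdf(x - xi) for xi in data) for x in grid]
    return grid[density.index(max(density))]


random.seed(3)
# Hypothetical right-skewed endpoint values (simulated, not study data).
data = [random.lognormvariate(3.0, 0.4) for _ in range(500)]

# Hypothetical literature range, shifted towards the right tail.
reported_range = (30.0, 60.0)

coverage = fraction_within(data, *reported_range)
peak = kde_peak(data, bandwidth=2.0)
peak_inside = reported_range[0] <= peak <= reported_range[1]

print(f"fraction of values inside the range: {coverage:.2f}")
print(f"PDF peak at {peak:.1f}; inside the range: {peak_inside}")
```

A range can therefore look wide on the x-axis and still miss the values with the largest probability of occurrence, which is the situation described above for vitamin E.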

Discussion

The characterization of endpoint-specific compositional variability presented in this study for maize and soybean allows exploring the natural variability of the compositional endpoints recommended by OECD. The majority of the empirical distribution curves show a good Gaussian fit on their natural scale, for both maize and soybean. Nevertheless, some depart from normality, showing a skewed or bimodal distribution and documenting the existence of endpoint-specific variability patterns, even though the possibility that such departures could be artefacts due to sample size limitations cannot be excluded, especially in cases of multi-modal distributions. In line with standard and well-established practices for data analysis19, we wish to reiterate the importance of the a priori verification of distributional assumptions, as recommended10,11,12 in comparative assessment when analyses of variance or other statistical tests relying on distributional assumptions are applied.
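One simple way to screen for departures from symmetry before applying tests that assume normality is to compute the sample skewness on the original and on the log scale. The Python sketch below is an assumption-laden illustration: the paper does not prescribe this particular diagnostic, the data are simulated, and the function name is hypothetical.

```python
import math
import random
from statistics import mean


def sample_skewness(values):
    """Moment-based sample skewness (close to 0 for symmetric data)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(4)
# Hypothetical endpoints (simulated, not study data).
symmetric = [random.gauss(10.0, 1.0) for _ in range(1000)]
skewed = [random.lognormvariate(0.0, 0.6) for _ in range(1000)]

skew_symmetric = sample_skewness(symmetric)
skew_raw = sample_skewness(skewed)
skew_log = sample_skewness([math.log(x) for x in skewed])

print(f"symmetric endpoint:          skewness = {skew_symmetric:.2f}")
print(f"skewed endpoint, raw scale:  skewness = {skew_raw:.2f}")
print(f"skewed endpoint, log scale:  skewness = {skew_log:.2f}")
```

A skewness close to zero after the logarithmic transformation suggests that the log scale is the more appropriate one for Gaussian-based analyses of that endpoint; it does not by itself establish normality.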

Certainly, the endpoint-specific variability captured in this study reflects only a portion of the natural variability, which is and remains unknown because of the very large number of varieties present on the market and the indefinite number of conditions under which they are grown. Yet, this study captures a biologically relevant fraction of such variability, i.e. the fraction to which consumers have been exposed and, as such, accepted as safe, providing a relevant benchmark for the risk assessment process that maximises consumer protection.

Another key aspect in risk assessment is the comparison between the information provided by the analysis of variability-distribution curves and that provided by the spread of a dataset as estimated by its range. Such a comparison has a broader relevance, encompassing the risk assessment frame, and has been extensively addressed in the pertinent literature contrasting summary statistics and statistical hypothesis testing approaches19. As already mentioned, the magnitude of the range is directly determined by the most extreme values, regardless of their likelihood of occurrence. In contrast, two-dimensional distribution profiles assign a probability of occurrence to each data point, and allow the use of standard deviations and reference intervals to define the region within which most data values occur. In a risk assessment frame, where the decision-making process is mainly driven by probabilistic considerations, understanding the shape of PDF curves is a priority that cannot be overlooked, as it provides insights into the likelihood of occurrence of any given value.

As shown in this study, ranges reported by OECD for maize and soybean compositional endpoints only occasionally overlapped with the margins of the observed PDF: in many cases ranges are either too narrow or too broad. In both cases they fail to capture the one-dimensional spread of the empirical datasets. The intrinsic limitations of ranges become even clearer when they are compared with the 95% reference intervals of GF curves, which always results in a mismatch. In addition, it is interesting to observe that the ranges reported for an endpoint often lack consistency among themselves. This inconsistency is not unexpected, as data sources are highly heterogeneous and, more generally, the range is by nature greatly affected by extreme values. But it raises concerns regarding the risk of inconsistent interpretation of results: potentially, depending upon the range selected, conclusions for the same endpoint could be quite different.

Based on all these considerations, the use of ranges in risk assessment should be discouraged, whereas approaches taking into account distributional properties should be encouraged, as they provide the necessary probabilistic frame to evaluate the likelihood of occurrence of specific outcomes. In line with this, the EFSA approach is based on the establishment of equivalence limits, which are defined as an appropriately chosen interval (2.5-97.5 percentile) of the distribution associated with reference non-GM commercial varieties10. This approach provides a frame suitable to calibrate risk assessment evaluation criteria on observed frequency patterns and to take into account natural variability. Nevertheless, more research is needed in this direction: a deeper and broader understanding of the PDF curves underlying the natural variability of compositional endpoints, also in other crops, is necessary to ensure a reliable framework to incorporate natural variability estimates in the risk assessment process. EFSA is at the forefront of this issue, but further international debate and consensus are needed to improve harmonization and reliability of risk assessment approaches around Codex Alimentarius principles.

Occasionally, the EFSA approach has been criticised22, as natural variability estimates obtained from equivalence testing might underestimate natural variability. As explained and documented by the experience gained in recent years, this is indeed not only possible but even likely.
However, in a risk assessment frame consumer protection is the priority, and the risk that must be minimised is that of overestimating natural variability, in order to reduce the probability of wrongly interpreting potentially relevant findings as the expression of natural variability. This is indeed possible with equivalence testing. The aggregate analysis presented here explores new possibilities to take into account natural variability within the frame of Codex risk assessment principles, and it constitutes an open invitation to a fruitful and informed scientific dialogue on how the application of equivalence testing to safety assessment can be further improved and refined.

A solid characterization of compositional variability is also an essential prerequisite to revisit and prioritise the number and nature of OECD endpoints recommended on a crop-by-crop basis, taking into account the experience gained so far: a wish already highlighted in the past by the scientific community15,23. In principle, the optimal combination of compositional endpoints should provide an accurate indication of the crop-specific metabolism, so that the probability of detecting any unintended effect is maximised. At the same time, it should minimise redundancy within datasets, so that problems of multiplicity can be diminished or, even better, eliminated. Achieving this goal is difficult. Redundancy can be reduced either by selecting representative endpoints, for example on the basis of the degree of correlation present in the dataset, or by exploring the applicability of multivariate approaches. EFSA started to investigate this issue10, but more experience with multivariate tests of equivalence24,25,26 is needed. Understanding the variability patterns of compositional endpoints is a necessity for decision making during the process of risk assessment. This study is a concrete achievement in this respect, but we must go further.
The creation of a database recording the compositional profile of commercial crop varieties on a crop-by-crop basis, with clear entry criteria to ensure comparability of information and minimisation of confounding effects15, could be an effective step to properly frame natural variability estimates in the risk assessment process, as required by Codex Alimentarius, OECD and the EU legal frame. Such a database would need regular maintenance, as crop variability estimates would require updates matching the speed of development of new varieties with unique degrees and patterns of variability.

We hope that this study fosters a fruitful dialogue with the scientific community involved in the risk assessment of food and feed and, more generally, with all those interested in the empirical application of the comparative approach, which is already posing new scientific challenges. For example, the risk assessment of crops that have undergone intentional changes in their compositional profile, i.e. in one or more endogenous compounds normally used as benchmarks to assess overall substantial equivalence to a comparator1,2, raises conceptual difficulties. Indeed, it challenges the paradigm of the Codex Alimentarius risk assessment principle, rooted in a clear distinction between intended and unintended changes. In these cases, the comparative approach in its traditional delineation is confronted with the difficulty of identifying suitable comparators. Also under these circumstances, understanding the variability patterns of compositional endpoints could prove itself a necessary asset to explore alternatives.

Appendix A and appendix B. Supplementary data

Supplementary data related to this article can be found at http//xxxxx.

Abbreviations Used

EFSA: European Food Safety Authority; GF: Gaussian fitting; GM: genetically modified; KDE: kernel density estimate; LOQ: limit of quantification; OECD: Organisation for Economic Co-operation and Development; PDF: probability density function; RI: reference interval.

Acknowledgements

The authors are grateful to Niccolò Franceschi for technical assistance in compiling initial data and to Elisabeth Waigmann for inspiring comments and discussions.

Disclaimer

The authors are employed by the European Food Safety Authority (EFSA). The positions and opinions presented in this article are those of the authors alone and do not necessarily represent the views or scientific works of EFSA. This manuscript does not disclose any confidential information.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. Codex Alimentarius. Codex principles and guidelines on foods derived from biotechnology. Codex Alimentarius Commission, Joint FAO/WHO Food Standards Programme, Food and Agriculture Organization: Rome, 2003.
2. Codex Alimentarius. Foods derived from modern biotechnology, 2nd ed. Codex Alimentarius Commission: Rome, 2009.
3. ADAS UK Ltd. & Rothamsted Research. Review of the strategies for the comprehensive food and feed safety and nutritional assessment of GM plants per se. EFSA Supporting Publ. 2013, 10, EN-480.
4. OECD. Safety Evaluation of Foods Produced by Modern Biotechnology: Concepts and Principles; Organisation for Economic Co-operation and Development: Paris, 1993.
5. OECD. Consensus documents: work on the safety of novel foods and feeds: Plants (http://www.oecd.org/chemicalsafety/biotrack/consensus-document-for-work-on-safety-novel-and-foods-feeds-plants.htm) (accessed October 26, 2017).
6. Institute of Medicine and National Research Council. Safety of Genetically Engineered Foods: Approaches to Assessing Unintended Health Effects. The National Academies Press: Washington, DC, 2004.
7. Food Standards Australia New Zealand. Food Standards Australia New Zealand Application Handbook. 1 March, 2016.
8. Indian Council of Medical Research. Guidelines for the Safety Assessment of Foods Derived from Genetically Engineered Plants. New Delhi, 2008.
9. The Food Safety Commission (Japan). Standards for the Safety Assessment of Genetically Modified Foods (Seed Plants). Final decision 29 January, 2004.
10. EFSA Panel on Genetically Modified Organisms (GMO). Statistical considerations for the safety evaluation of GMOs. EFSA J. 2010, 8, 1250.
11. EFSA Panel on Genetically Modified Organisms (GMO). Guidance for risk assessment of food and feed from genetically modified plants. EFSA J. 2011, 9, 2150.
12. Commission Regulation (EU) No 503/2013 of 3 April 2013 on applications for authorisation of genetically modified food and feed in accordance with Regulation (EC) No 1829/2003 of the European Parliament and of the Council and amending Commission Regulations (EC) No 641/2004 and (EC) No 1981/2006 (OJ L 157, 08.06.2013, p. 1).
13. OECD. Consensus document on compositional considerations for new varieties of maize (Zea mays): Key food and feed nutrients, anti-nutrients and secondary plant metabolites, ENV/JM/MONO(2002)25 (No. 6). Organisation for Economic Co-operation and Development: Paris, 2002.
14. OECD. Revised consensus document on compositional considerations for new varieties of soybean [Glycine max (L.) Merr.]: Key food and feed nutrients, antinutrients, toxicants and allergens, ENV/JM/MONO(2012)24 (No. 25). Organisation for Economic Co-operation and Development: Paris, 2012.
15. Fernandez, A.; Paoletti, C. Unintended effects in GM food/feed safety: a way forward. Trends Biotechnol. 2018, 36, 5-8.
16. Breeze, M.L.; Leyva-Guerrero, E.; Yeaman, G.R.; Dudin, Y.; Akel, R.; Brune, P.; Claussen, F.; Dharmasri, C.; Goldbacj, J.; Guo, R.; Maxwell, C.; Privalle, L.; Rogers, H.; Liu, K.; Shan, G. Validation of a method for quantitation of soybean lectin in commercial varieties. J. Am. Oil Chem. Soc. 2015, 92, 1085.
17. Scott, D.W. Multivariate Density Estimation: Theory, Practice and Visualization. Wiley: New York, 1992.
18. Deng, H.; Wickham, H. Density estimation in R. Electronic publication. 2011.
19. Sokal, R.; Rohlf, F. Biometry: the principles and practice of statistics in biological research, 2nd ed. 2012.
20. Ricci, V. Fitting Distributions with R. Contributed documentation available on CRAN. URL (http://CRAN.R-project.org/doc/contrib/Ricci-distributions-en.pdf) 2005.
21. R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL (http://www.R-project.org).
22. Hong, B.; Fisher, T.L.; Sult, T.S.; Maxwell, C.A.; Mickelson, J.A.; Kishino, H.; Locke, M.E.H. Model-Based Tolerance Intervals Derived from Cumulative Historical Composition Data: Application for Substantial Equivalence Assessment of a Genetically Modified Crop. J. Agric. Food Chem. 2014, 62, 9916-9926.
23. Ladics, G.S.; Bartholomaeus, A.; Bregitzer, P.; Doerrer, N.G.; Gray, A.; Holzhauser, T.; Jordan, M.; Keese, P.; Kok, E.; Macdonald, P.; Parrott, W.; Privalle, L.; Raybould, A.; Rhee, S.Y.; Rice, E.; Romeis, J.; Vaughn, J.; Wal, J.M.; Glenn, K. Genetic basis and detection of unintended effects in genetically modified crop plants. Transgenic Res. 2015, 24, 587-603.
24. Brown, L.D.; Casella, G.; Hwang, J.T.G. Optimal confidence sets, bioequivalence, and the limaçon of Pascal. J. Am. Stat. Assoc. 1995, 90 (431), 880-889.
25. Munk, A.; Pfluger, R. 1 − α equivariant confidence rules for convex alternatives are α/2-level tests – with applications to the multivariate assessment of bioequivalence. J. Am. Stat. Assoc. 1999, 94, 1311-1320.
26. Beckmann, M.; Enot, D.P.; Overy, D.P.; Draper, J. Representation, Comparison, and Interpretation of Metabolome Fingerprint Data for Total Composition Analysis and Quality Trait Investigation in Potato Cultivars. J. Agric. Food Chem. 2007, 55 (9), 3444-3451.

Figure captions

Figure 1: Maize. PDF: probability density function (solid black line); GF: Gaussian fitting (dashed grey line); R.I.: reference interval.
1.a- Cysteine: 95% R.I. (solid grey stems) and OECD ranges (coloured stems)
1.b- Potassium: 95% R.I. (solid grey stems) and OECD ranges (coloured stems)
1.c- Vitamin A: 95% R.I. calculated on log scale and back-transformed on original scale (solid blue stems) and OECD ranges (coloured stems)
1.d- p-Coumaric acid: 95% R.I. calculated on log scale and back-transformed on original scale (solid blue stems) and OECD ranges (coloured stems)

Figure 2: Soybean. PDF: probability density function (solid black line); GF: Gaussian fitting (dashed grey line); R.I.: reference interval.
2.a- Alanine: 95% R.I. (solid grey stems) and OECD ranges (coloured stems)
2.b- Phytic Acid: 95% R.I. (solid grey stems) and OECD ranges (coloured stems)
2.c- Total Glycitein: 95% R.I. calculated on log scale and back-transformed on original scale (solid blue stems) and OECD ranges (coloured stems)
2.d- Vitamin E: 95% R.I. calculated on log scale and back-transformed on original scale (solid blue stems) and OECD ranges (coloured stems)
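The phrase "calculated on log scale and back-transformed on original scale" in these captions can be illustrated with a short sketch. It assumes a Gaussian interval of mean ± 1.96 standard deviations fitted to the log-transformed values; the captions themselves do not state the formula, so this is an assumption and the procedure used for the figures may differ. The function name and the toy values are hypothetical.

```python
import math
import statistics

def log_scale_reference_interval(values, z=1.96):
    """Return (lower, upper) of a Gaussian interval fitted to log(values),
    back-transformed to the original scale. Assumed formula, see lead-in."""
    logs = [math.log(v) for v in values]   # requires strictly positive values
    centre = statistics.mean(logs)
    spread = statistics.stdev(logs)        # sample standard deviation
    return math.exp(centre - z * spread), math.exp(centre + z * spread)

# Invented right-skewed toy values, NOT data from the study.
toy_values = [0.8, 1.1, 1.3, 1.7, 2.4, 3.9, 6.5]
lower, upper = log_scale_reference_interval(toy_values)
geometric_mean = math.exp(statistics.mean([math.log(v) for v in toy_values]))
print(f"{lower:.2f} to {upper:.2f}, centred on {geometric_mean:.2f}")
```

After back-transformation the interval is asymmetric around the geometric mean, with a longer upper arm, which suits right-skewed endpoints better than a symmetric interval computed on the original scale.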

Tables

Table 1. Maize and Soybean compositional endpoints, with respective number of data values and unit of measurement (UM): DW: Dry Weight, FA: Fatty Acid, FW: Fresh Weight, Tot: Total.

Maize Endpoints (UM)

N° data values

Amino Acids Essential (%DW) Arginine Cysteine Glycine Histidine Isoleucine Leucine 1997 Lysine Methionine Phenylalanine Threonine Tryptophan Valine Amino Acids Non-essential (%DW) Alanine Aspartic Acid (incl. Asparagine) Glutamic Acid (incl. Glutamine) 1997 Proline Serine Tyrosine Anti-nutrients (%DW) Ferulic acid 1997 p-Coumaric acid Phytic acid 1996 Raffinose 1934 Fatty Acids (% of tot. FA) Linoleic acid Linolenic acid 1237 Oleic acid Palmitic acid

Soybean Endpoints (UM)

N° data values

Amino Acids Essential (%DW) Arginine Cysteine Glycine Histidine Isoleucine Leucine 1655 Lysine Methionine Phenylalanine Threonine Tryptophan Valine Amino Acids Non-essential (%DW) Alanine Aspartic Acid (incl. Asparagine) Glutamic Acid (incl. Glutamine) 1655 Proline Serine Tyrosine Anti-nutrients (%DW) Phytic Acid 2038 Raffinose 1822 Stachyose Fatty Acids (% of tot. FA) Arachidic acid Linoleic acid Linolenic acid 2014 Oleic acid Palmitic acid

Stearic acid Minerals (mg/100g DW) Calcium Copper Iron Magnesium Phosphorus Potassium Zinc Proximates Acid detergent fibre (%DW) Ash (%DW) Carbohydrates (%DW) Moisture (% of Tot. FA) Neut. det. fibre (%DW) Protein (%DW) Tot. dietary fibre (%DW) Tot. fat (%DW) Vitamins (ppm DW) A (β-Carotene) B1 (Thiamin) B2 (Riboflavin) B6 (Pyridoxine) E (Tocopherols) Folate, tot. Niacin (Nicotinic Acid)

1891 1993

1997

1997

Stearic acid Proximates Acid detergent fibre (%DW) Ash (%DW) Carbohydrates (%DW) Crude Fat (%DW) Crude Protein (%DW) Moisture (%FW) Neut. det. fibre (%DW) Vitamins (mg/100g DW) E (α-tocopherol) K1 (Phylloquinone) Isoflavones (ppm DW) Tot. Daidzein Tot. Genistein Tot. Glycitein

2134 2034 2134

2123 1164

1822 1798

1784 1997 1980 1993 1784 1995 1970 1995 1996

Table 2. Maize and Soybean: Pearson correlation coefficient r and relative error of compositional endpoints.

Maize

Endpoint                     r                   ρ2
Amino Acids Essential
Arginine                     0.99                0.12
Cysteine                     1.00                0.03
Glycine                      1.00                0.07
Histidine                    1.00                0.04
Isoleucine                   0.98                0.16
Leucine                      0.98                0.13
Lysine                       0.98                0.15
Methionine                   0.99                0.09
Phenylalanine                0.99                0.11
Threonine                    1.00                0.05
Tryptophan                   1.00                0.07
Valine                       0.99                0.10
Amino Acids Non-essential
Alanine                      0.99                0.12
Aspartic Acid                0.99                0.13
Glutamic Acid                0.99                0.09
Proline                      0.99                0.08
Serine                       1.00                0.07
Tyrosine                     1.00                0.06
Anti-nutrients
Ferulic acid                 0.98                0.18
p-Coumaric acid              0.97 (log: 0.99)    0.25 (log: 0.09)
Phytic acid                  1.00                0.07
Raffinose                    0.98                0.13
Fatty Acids
Linoleic acid                0.85                0.31
Linolenic acid               0.62                0.42
Oleic acid                   0.96                0.19
Palmitic acid                0.79                0.30
Stearic acid                 0.99                0.11
Minerals
Calcium                      0.96                0.28
Copper                       0.82                0.97
Iron                         0.99                0.14
Magnesium                    0.99                0.10
Phosphorus                   0.99                0.12
Potassium                    1.00                0.08
Zinc                         0.99                0.11
Proximates
Acid detergent fibre         0.98                0.13
Ash                          0.99                0.08
Carbohydrates                1.00                0.07
Moisture                     0.72                0.85
Neutral detergent fibre      0.97                0.19
Protein                      0.98                0.14
Total dietary fibre          0.97                0.18
Total fat                    0.99                0.12
Vitamins
A (β-Carotene)               0.49 (log: 0.96)    2.64 (log: 0.38)
B1 (Thiamin)                 0.98                0.11
B2 (Riboflavin)              0.95                0.30
B6 (Pyridoxine)              0.95                0.27
E (Tocopherols)              0.97                0.24
Folate, total                0.75                0.81
Niacin (Nicotinic Acid)      0.98                0.14

Soybean

Endpoint                     r                   ρ2
Amino Acids Essential
Arginine                     0.98                0.17
Cysteine                     1.00                0.08
Glycine                      0.99                0.15
Histidine                    0.95                0.26
Isoleucine                   0.99                0.10
Leucine                      0.99                0.12
Lysine                       0.98                0.18
Methionine                   0.99                0.19
Phenylalanine                1.00                0.08
Threonine                    0.97                0.21
Tryptophan                   0.95                0.22
Valine                       0.99                0.09
Amino Acids Non-essential
Alanine                      0.99                0.10
Aspartic Acid                0.97                0.21
Glutamic Acid                0.99                0.11
Proline                      0.99                0.16
Serine                       0.99                0.08
Tyrosine                     0.91                0.30
Anti-nutrients
Phytic Acid                  0.99                0.08
Raffinose                    0.94                0.31
Stachyose                    0.96                0.27
Fatty Acids
Arachidic acid               0.91                0.34
Linoleic acid                0.98                0.16
Linolenic acid               0.95                0.18
Oleic acid                   0.98                0.15
Palmitic acid                0.99                0.10
Stearic acid                 0.96                0.21
Proximates
Acid detergent fibre         0.98                0.15
Ash                          0.99                0.14
Carbohydrates                0.93                0.37
Crude Fat                    0.93                0.33
Crude Protein                0.99                0.13
Moisture                     0.82                0.59
Neutral detergent fibre      0.99                0.09
Vitamins
E (α-tocopherol)             0.84 (log: 0.99)    0.54 (log: 0.09)
K1 (Phylloquinone)           0.95                0.24
Isoflavones
Total Daidzein               0.99                0.12
Total Genistein              0.98                0.17
Total Glycitein              0.96 (log: 0.99)    0.24 (log: 0.13)

Figure graphics

Panels 1.a to 1.d (Figure 1, maize) and 2.a to 2.d (Figure 2, soybean).