
Article

Using Machine Learning to Predict the Self-Assembled Nanostructures of Monoolein and Phytantriol as a Function of Temperature and Fatty Acid Additives for Effective Lipid-Based Delivery Systems
Tu C. Le and Nhiem Tran
ACS Appl. Nano Mater., Just Accepted Manuscript • DOI: 10.1021/acsanm.9b00075 • Publication Date (Web): 08 Feb 2019


ACS Applied Nano Materials

Using Machine Learning to Predict the Self-Assembled Nanostructures of Monoolein and Phytantriol as a Function of Temperature and Fatty Acid Additives for Effective Lipid-Based Delivery Systems

Tu C. Le1,*, Nhiem Tran2

1 School of Engineering, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia

2 School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia

* Email: [email protected]

Abstract

Lyotropic liquid crystalline lipid nanomaterials have shown promise as delivery vehicles for small therapeutic drugs, proteins, peptides, and in vivo imaging contrast agents. To design effective lipid-based delivery systems, it is important to understand and be able to predict their self-assembly processes. In this study, we utilised a machine learning approach to study the phase behaviour of a nanoparticulate system consisting of a base lipid, monoolein or phytantriol, and varied concentrations of saturated and unsaturated fatty acids. The experimental data sets acquired by high throughput characterisation techniques were used to train the “machine” using two separate models, i.e. multiple linear regression (MLR) and Bayesian regularized artificial neural networks (ANNs). The models were accurate (>70%) in predicting the phase behaviour for the data used to train them. The ANN model appeared to be more accurate than the MLR model in predicting mesophases. We then used the obtained ANN models to interpolate the phase behaviour of various nanoparticles at temperatures not yet tested. Compared with the experimental results, the phase behaviour was interpolated with high accuracy, ranging from 66% to 96% for the different phases. The models were capable of interpolating data for the same fatty acids at temperatures that were not yet tested, as well as extrapolating data for new fatty acid structures. We also studied quantitatively the contributions of various factors to the formation of different mesophases and elucidated rules that are useful for the future design of advanced lipid systems for therapeutic delivery.

Keywords: Therapeutic delivery; self-assembled nanostructure; monoolein; phytantriol; fatty acids; machine learning; artificial neural networks

1. Introduction

Understanding and predicting the phase behaviour of self-assembled lyotropic liquid crystalline materials is key to unlocking their potential as nanocarriers for drugs, proteins, peptides, and in vivo imaging contrast agents.1-3 So far, several lipid self-assembled nanostructures, including a lamellar, several inverse bicontinuous cubic, and an inverse hexagonal phase, have been studied as drug carriers, with other mesophases such as an inverse micellar cubic and a sponge phase starting to receive more attention.4 Previous studies have shown that the self-assembled nanostructures directly influence the drug release rate and cytotoxicity. For example, Lee et al. demonstrated that a bicontinuous cubic phase with three-dimensional porous nanostructures exhibited a burst release of glucose compared to the two-dimensional inverse hexagonal phase.5 An inverse micellar cubic phase, with water compartments confined within inverse micelles, is expected to release water-soluble drugs at an even slower rate than an inverse hexagonal phase.6 Our previous study suggested that the nanostructures also influence the in vitro cytotoxicity of these nanoparticles against mammalian cells.7


The study of phase behaviour of lyotropic liquid crystals has been facilitated by recent advances in high throughput formulation and characterisation methods, particularly with the help of synchrotron scattering techniques.8-14 These approaches are crucial for the development of complex multicomponent lipid systems, which are designed to satisfy increasingly sophisticated demands of specific biomedical applications.1, 15-16 Using these approaches, large multi-component lipid nanoparticle libraries with information on mesophase and lattice parameter were established. For example, we previously studied the phase behaviour of lipid nanoparticles consisting of a base lipid (e.g. monoolein, phytantriol, or monopalmitolein), a fatty acid (FA), and a polymeric stabiliser.12-14 Subsequently, a library of more than 700 lipid nanoparticles with unique compositions was analysed, from which valuable information regarding the influence of fatty acid concentration, chain length, degree of unsaturation, location of the double bonds, and temperature on the lipid nanostructures was revealed.

However, it is still challenging to predict the phase behaviour of lyotropic liquid crystals. Traditionally, the intrinsic shape of lipid molecules has been used to roughly estimate the global geometry of the different nanostructures. The critical packing parameter (cpp),17-19 which is the ratio between the lipid tail volume and the product of the headgroup area and the chain length, is regularly used to provide a qualitative explanation for the self-assembly behaviour. Additionally, more sophisticated methods based on self-consistent field theory simulations have been proposed to describe the self-assembly properties of lyotropic liquid crystals.20 Nevertheless, making accurate predictions of the complex phase behaviour of lyotropic liquid crystals by any computational means remains highly difficult. Recently, machine learning has proven to be a useful tool to predict diverse materials properties, including the phase behaviour of a complex lipid system.21-27 This, however, requires experimental inputs on which the ‘machine’ can be trained before being used to make new predictions. Therefore, combining high-throughput screening experiments with machine learning algorithms is a logical step to analyse and predict the phase behaviour of self-assembled materials. Previously, we have used these techniques to study these materials under the influence of incorporated drugs,22 protein crystallization screens,23-24 and temperature and saturated FA additives.21, 27

In the previous study, we successfully built machine learning models to predict the complex phase behaviour of lipid based systems incorporated with ten different saturated FAs. However, these FAs differ only in the length of the hydrocarbon chains. In this study, we aim to construct a wider platform for robust modelling of the complex phase behaviour of lipid based nanomaterials under the effect of saturated as well as unsaturated FAs at various temperatures. Thirty different FAs were included in the study. They have different chain lengths as well as different levels of chain saturation (up to six double bonds in the hydrocarbon chain), cis versus trans isomerism, and different locations of the double bonds in the hydrocarbon chain. The chemical structures of the two lipids and thirty FAs are illustrated in Figure 1 and the list of FAs is presented in Table 1. We also studied quantitatively the contributions of these various factors to the formation of different mesophases. The work allows the establishment of reliable models with a large applicability domain that take into account diverse FA structures, and elucidates rules that are useful for the future design of advanced lipid systems for therapeutic delivery.


Figure 1. Chemical structure of the two lipids, monoolein and phytantriol, and the twenty unsaturated and ten saturated fatty acids studied. The blue arrows show the location of the double bond in the hydrocarbon chains.


Table 1. List of twenty unsaturated and ten saturated fatty acids that were added to MO or PHYT nanoparticles and their chemical structures.

| Type | Fatty acid | Common name | Structure* |
| --- | --- | --- | --- |
| Unsaturated | trans-9-hexadecenoic acid | Palmitelaidic acid | trans-16:1, n - 7 |
| Unsaturated | trans-9-octadecenoic acid | Elaidic acid | trans-18:1, n - 9 |
| Unsaturated | trans-11-octadecenoic acid | trans vaccenic acid | trans-18:1, n - 7 |
| Unsaturated | trans-11-eicosenoic acid | trans gondoic acid | trans-20:1, n - 9 |
| Unsaturated | trans-13-docosenoic acid | Brassidic acid | trans-22:1, n - 9 |
| Unsaturated | trans-9,12-octadecadienoic acid | Linoelaidic acid | trans-18:2, n - 6 |
| Unsaturated | cis-9-hexadecenoic acid | Palmitoleic acid | cis-16:1, n - 7 |
| Unsaturated | cis-9-trans-11-octadecadienoic acid | Rumenic acid | cis,trans-18:2, n - 7 |
| Unsaturated | cis-11-octadecenoic acid | Vaccenic acid | cis-18:1, n - 7 |
| Unsaturated | cis-9-octadecenoic acid | Oleic acid | cis-18:1, n - 9 |
| Unsaturated | cis-11-eicosenoic acid | Gondoic acid | cis-20:1, n - 9 |
| Unsaturated | cis-13-docosenoic acid | Erucic acid | cis-22:1, n - 9 |
| Unsaturated | cis-15-tetracosenoic acid | Nervonic acid | cis-24:1, n - 9 |
| Unsaturated | cis-9,12-octadecadienoic acid | Linoleic acid | cis-18:2, n - 6 |
| Unsaturated | cis-9,12,15-octadecatrienoic acid | α-linolenic acid | cis-18:3, n - 3 |
| Unsaturated | cis-6,9,12-octadecatrienoic acid | γ-linolenic acid | cis-18:3, n - 6 |
| Unsaturated | cis-11,14,17-eicosatrienoic acid | ETA | cis-20:3, n - 3 |
| Unsaturated | cis-5,8,11,14-eicosatetraenoic acid | Arachidonic acid | cis-20:4, n - 6 |
| Unsaturated | cis-5,8,11,14,17-eicosapentaenoic acid | EPA | cis-20:5, n - 3 |
| Unsaturated | cis-4,7,10,13,16,19-docosahexaenoic acid | DHA | cis-22:6, n - 3 |
| Saturated | n-octanoic acid | Caprylic acid | 8:0 |
| Saturated | n-nonanoic acid | Pelargonic acid | 9:0 |
| Saturated | n-decanoic acid | Capric acid | 10:0 |
| Saturated | n-undecanoic acid | Undecylic acid | 11:0 |
| Saturated | n-dodecanoic acid | Lauric acid | 12:0 |
| Saturated | n-tridecanoic acid | Tridecylic acid | 13:0 |
| Saturated | n-tetradecanoic acid | Myristic acid | 14:0 |
| Saturated | n-pentadecanoic acid | Pentadecylic acid | 15:0 |
| Saturated | n-hexadecanoic acid | Palmitic acid | 16:0 |
| Saturated | n-heptadecanoic acid | Margaric acid | 17:0 |

*Note: Unsaturated FA structure is described as “cis/trans-a:b, n - c”, where a is the number of carbon atoms in the FA molecule, b is the number of double bonds, and c is the location of the first unsaturated carbon atom. Saturated FA structure is described as a:0, where a is the number of carbon atoms in the molecule and 0 indicates that there are no double bonds.
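The shorthand defined in the note to Table 1 can be turned into the structural descriptors a, b and c programmatically. The following is a minimal sketch, not the authors' code: the function name `parse_fa`, the field names, and the choice to encode c as 0 for saturated FAs are assumptions made for illustration, as is the rule that a single cis or trans label applies to every double bond in the chain.

```python
import re
from typing import NamedTuple


class FattyAcidCode(NamedTuple):
    a: int            # number of carbon atoms
    b: int            # number of double bonds
    c: int            # location of the first unsaturated carbon (0 if saturated: assumption)
    total_cis: int    # number of cis double bonds
    total_trans: int  # number of trans double bonds


# Matches e.g. "cis-18:1, n - 9", "cis,trans-18:2, n - 7", "cis-20:4, n–6", "8:0".
_PATTERN = re.compile(
    r"^(?:(?P<geom>[a-z,]+)-)?(?P<a>\d+):(?P<b>\d+)"
    r"(?:,\s*n\s*[-–]\s*(?P<c>\d+))?$"
)


def parse_fa(code: str) -> FattyAcidCode:
    """Parse the 'cis/trans-a:b, n - c' shorthand into integer descriptors."""
    m = _PATTERN.match(code.strip())
    if m is None:
        raise ValueError(f"unrecognised fatty acid code: {code!r}")
    a, b = int(m["a"]), int(m["b"])
    c = int(m["c"]) if m["c"] else 0
    labels = m["geom"].split(",") if m["geom"] else []
    if any(label not in ("cis", "trans") for label in labels):
        raise ValueError(f"unknown geometry label in {code!r}")
    if len(labels) == 1:
        # Assumption: one label describes every double bond in the chain.
        total_cis = b if labels[0] == "cis" else 0
        total_trans = b if labels[0] == "trans" else 0
    else:
        total_cis, total_trans = labels.count("cis"), labels.count("trans")
    return FattyAcidCode(a, b, c, total_cis, total_trans)


oleic = parse_fa("cis-18:1, n - 9")
rumenic = parse_fa("cis,trans-18:2, n - 7")
caprylic = parse_fa("8:0")
```

Under these assumptions, oleic acid yields a = 18, b = 1, c = 9 with one cis bond, and rumenic acid yields one cis and one trans bond.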

2. Methodology

Data

A combination of different data sets resulting from previously reported high throughput experimentation12-13 was used to train the machine learning models. These data sets were developed to examine the influence of fatty acids (FAs), ten saturated and twenty unsaturated, on the mesophases of two lipids, monoolein (MO) and phytantriol (PHYT). The screening for saturated FAs was performed at five temperatures (25, 37, 40, 45, 50 °C) using twelve FA:lipid molar ratios (r). Experiments for unsaturated FAs were carried out over a slightly different temperature range (25, 37, 60 °C) with eleven FA:lipid weight ratios. This combination results in a data set consisting of 2420 observations for 680 unique nanoparticle formulations.

The addition of increasing amounts of fatty acid to MO nanoparticles resulted in a phase transition sequence from an inverse bicontinuous primitive cubic (QIIP) to an inverse bicontinuous diamond cubic (QIID), inverse hexagonal (HII), and microemulsion (L2) phase. Certain unsaturated fatty acids such as oleic acid induced a phase transition from HII to a micellar cubic (I2) phase. Other mesophases such as lamellar crystal (Lc) were also observed when the content of a fatty acid with a high melting point was high. Coexistence of several mesophases, including fluid lamellar (Lalpha), microemulsion (L2), and sponge (L3) phases, was noted in MO nanoparticles containing a highly unsaturated fatty acid such as DHA or EPA.28 These nanoparticles were believed to be phase separated. The phase behaviour of PHYT nanoparticles in the presence of fatty acids appeared to be much simpler. The addition of increasing amounts of a fatty acid to PHYT nanoparticles also resulted in a phase transition sequence from QIID to HII and L2. An Lc phase was also observed when the content of a fatty acid with a high melting point was high. These mesophases were identified using small-angle X-ray scattering at the Australian Synchrotron. Details on the structure of mesophases and the phase transition behaviour of various lyotropic liquid crystalline materials have been reported elsewhere.2

Herein, we use machine learning algorithms to build seven series of models predicting the formation of seven individual mesophases: QIIP, QIID, HII, L2, I2, Lc and the Lalpha+L2+L3 combination. Each model is built using a data set comprising 2420 data points.

Descriptors

A small pool of variables was used as inputs for the machine learning models. These variables captured the differences between samples and provided a unique set of variable values for each data entry. Since there were only two types of lipid molecules, we used an indicator variable, LipidIndicator, to differentiate between MO and PHYT samples. LipidIndicator equals 0 when the lipid type is MO and 1 when it is PHYT. The FA:lipid molar ratio and temperature were also employed as model inputs. Another ten variables were included to describe the various FA structures. The full list of input variables and their descriptions is provided in Table 2.

Table 2. List of input variables used in the models.

| Input variable | Description | Type |
| --- | --- | --- |
| LipidIndicator | 0 for MO and 1 for PHYT | Integer |
| MolarRatio | FA:lipid molar ratio | Continuous |
| Temp | Temperature | Integer |
| a | Total number of carbon atoms in FA hydrocarbon chain | Integer |
| b | Total number of double bonds in FA hydrocarbon chain | Integer |
| c | Location of the first unsaturated carbon atom in FA hydrocarbon chain | Integer |
| totalCis | Total number of cis double bonds in FA hydrocarbon chain | Integer |
| totalTrans | Total number of trans double bonds in FA hydrocarbon chain | Integer |
| Rgyr | Molecular radius of gyration of FA molecule | Continuous |
| Ui | Unsaturation index of FA molecule | Continuous |
| Hy | Hydrophilic factor of FA molecule | Continuous |
| AMR | Ghose-Crippen molar refractivity of FA molecule | Continuous |
| ALOGP | Ghose-Crippen octanol-water partition coefficient (logP) of FA molecule | Continuous |

Machine learning models

Multiple linear regression (MLR) and Bayesian regularized artificial neural networks (ANNs)29-30 were employed to build separate models predicting the formation of the individual mesophases. The output of each model is an indicator that has the value of 1 if the mesophase was formed and 0 if not. The neural networks employed consisted of an input, a hidden and an output layer. The number of nodes in the input layer was equal to the number of input variables listed in Table 2, while the output layer had only one node corresponding to the mesophase existence indicator. Two nodes in the hidden layer were found to be sufficient, and increasing the number of nodes did not improve the model predictability. Previous papers also reported that Bayesian regularization automatically controls the complexity of the models to optimize predictability, so that increasing this number is unnecessary.31-32

The entire data set of 2420 entries compiled from recently published work12-13 was used to train the MLR models or the ANNs, and the obtained models were used to predict new data. The mesophases of nanoparticles incorporating unsaturated FAs at 40, 45 and 50 °C, as well as saturated FAs at 60 °C, have not been studied or reported previously and therefore were not used to train the models. The predictions for these samples (a total of 1540 entries) could produce a complete phase behaviour profile for the entire temperature range considered, which includes 25, 37, 40, 45, 50 and 60 °C. Validation experiments were performed subsequently to assess the accuracy of the models in predicting the phase behaviour for some of the new samples. The experimental methodology is given in the Supporting Information and the obtained data are presented in the Supporting Information spreadsheet.

We also tested the ability of the machine learning models to predict the mesophases of nanoparticles under the influence of new FA structures. For that purpose, we excluded all entries for four unsaturated FAs, trans gondoic, gondoic, -linolenic and EPA acids, as well as two saturated FAs, capric and myristic acids, from the training data. Entries for these FAs were only used to validate the performance of the obtained models. In this instance, the training and test data consisted of 1936 and 484 entries respectively.
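The entry counts quoted above can be checked by enumerating the experimental grid. The sketch below is illustrative only and rests on one loud assumption: every FA contributes eleven ratio levels per lipid and temperature. That value is chosen because it reproduces the stated totals (2420, 1540, 1936 and 484); the Data section mentions twelve molar ratios for saturated FAs, so the real grid may be organised differently. The FA labels are placeholders, and which FAs are held out does not affect the counts.

```python
from itertools import product

LIPIDS = ("MO", "PHYT")
N_RATIOS = 11  # assumption: eleven ratio levels per FA (reproduces the stated totals)
SAT_FAS = [f"sat{i}" for i in range(1, 11)]      # ten saturated FAs (placeholder labels)
UNSAT_FAS = [f"unsat{i}" for i in range(1, 21)]  # twenty unsaturated FAs (placeholder labels)
SAT_TEMPS = (25, 37, 40, 45, 50)                 # °C, measured for saturated FAs
UNSAT_TEMPS = (25, 37, 60)                       # °C, measured for unsaturated FAs
ALL_TEMPS = (25, 37, 40, 45, 50, 60)             # °C, full range to be predicted


def grid(fas, temps):
    """All (lipid, FA, temperature, ratio index) combinations."""
    return list(product(LIPIDS, fas, temps, range(N_RATIOS)))


# Entries with experimental mesophase assignments (the training data).
measured = grid(SAT_FAS, SAT_TEMPS) + grid(UNSAT_FAS, UNSAT_TEMPS)

# Entries at temperatures not yet tested for a given FA class.
measured_set = set(measured)
untested = [row for row in grid(SAT_FAS + UNSAT_FAS, ALL_TEMPS)
            if row not in measured_set]

# Leave-FA-out split: four unsaturated and two saturated FAs are held out.
held_out = set(UNSAT_FAS[:4]) | set(SAT_FAS[:2])
train = [row for row in measured if row[1] not in held_out]
test = [row for row in measured if row[1] in held_out]

print(len(measured), len(untested), len(train), len(test))
```

Splitting by FA identity rather than by random rows is what makes the second experiment a test of extrapolation to new structures: no entry for a held-out FA is seen during training.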


3. Results and Discussion

3.1. Phase Prediction Results

Different multiple linear regression (MLR) models and Bayesian regularized artificial neural networks (ANNs) were constructed to predict each individual phase that the lipid materials adopted. The performance of these models for the training set of 2420 tested samples is summarized in Table 3. The number of effective weights of the MLR models was 14 (intercept and 13 input variables), while the maximum number of effective weights of the ANN models was 26. These values are very small compared to the total number of data points (2420), so the models were clearly not overfitted. The accuracy of the prediction was at least 70%. For most mesophases, the nonlinear ANN models performed better than the linear ones. As can be seen in Table 3, for the three major mesophases, QIID, QIIP, and HII, the use of ANNs improved the prediction accuracy by 15, 13, and 8 percentage points respectively. For the three minor mesophases, I2, Lc and Lalpha+L2+L3, the total prediction errors decreased by 30 to 50%. L2 was the only mesophase where the use of ANNs did not lead to better prediction.

It must be noted that the data sets are imbalanced for many mesophases: the percentage of data points in which a particular phase was present or absent is very high. This means that even a model predicting that the phase was present or absent for all data points can still have high prediction accuracy. As shown in Table 3, the MLR models have high accuracy, but their actual predictions for positive cases (where the mesophase was present), as given in the Supporting Information spreadsheet, were not very accurate. Truth tables are better criteria for assessing the prediction results.
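To put the effective-weight figures in context, the short sketch below counts the adjustable parameters of each model type. It assumes a fully connected network with one bias term per hidden and output node; the bias terms and the function names are assumptions, since the text only specifies the layer sizes. Under Bayesian regularization the number of effective weights can be lower than this total, which is consistent with the reported maximum of 26.

```python
def mlr_weights(n_inputs: int) -> int:
    """Intercept plus one coefficient per input variable."""
    return n_inputs + 1


def ann_weights(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights and biases of a fully connected single-hidden-layer network."""
    hidden = (n_inputs + 1) * n_hidden    # input-to-hidden weights plus hidden biases
    output = (n_hidden + 1) * n_outputs   # hidden-to-output weights plus output biases
    return hidden + output


n_data = 2420
n_mlr = mlr_weights(13)      # 13 descriptors from Table 2
n_ann = ann_weights(13, 2)   # 13 inputs, 2 hidden nodes, 1 output
print(n_mlr, n_ann, n_data / n_ann)
```

With 13 inputs the MLR model has 14 weights, matching the text, and the 13-2-1 network has 31 parameters in total, so there are roughly 78 data points per network parameter.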


Table 3. Summary of the multiple linear regression (MLR) and Bayesian regularized artificial neural network (ANN) models predicting each of the seven phases: an inverse bicontinuous primitive cubic (QIIP), an inverse bicontinuous diamond cubic (QIID), an inverse hexagonal (HII), a micellar cubic (I2), microemulsion (L2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases, for the training data set of 2420 entries.

| Mesophase | Model type | Effective weights | Total prediction errors | Accuracy of model |
| --- | --- | --- | --- | --- |
| QIIP | MLR | 14 | 437 | 82% |
| QIIP | ANN | 25 | 120 | 95% |
| QIID | MLR | 14 | 712 | 71% |
| QIID | ANN | 25 | 340 | 86% |
| HII | MLR | 14 | 719 | 70% |
| HII | ANN | 26 | 534 | 78% |
| I2 | MLR | 14 | 9 | 100% |
| I2 | ANN | 19 | 6 | 100% |
| L2 | MLR | 14 | 282 | 88% |
| L2 | ANN | 14 | 281 | 88% |
| Lc | MLR | 14 | 80 | 97% |
| Lc | ANN | 24 | 42 | 98% |
| Lalpha+L2+L3 | MLR | 14 | 31 | 99% |
| Lalpha+L2+L3 | ANN | 26 | 15 | 99% |
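The accuracy column of Table 3 follows from the error counts as accuracy = 1 - errors / 2420. The sketch below recomputes it as a consistency check; the dictionary layout and function name are illustrative choices, and the numbers are copied from Table 3.

```python
N_ENTRIES = 2420

# (mesophase, model type) -> (total prediction errors, reported accuracy in %)
TABLE_3 = {
    ("QIIP", "MLR"): (437, 82), ("QIIP", "ANN"): (120, 95),
    ("QIID", "MLR"): (712, 71), ("QIID", "ANN"): (340, 86),
    ("HII", "MLR"): (719, 70), ("HII", "ANN"): (534, 78),
    ("I2", "MLR"): (9, 100), ("I2", "ANN"): (6, 100),
    ("L2", "MLR"): (282, 88), ("L2", "ANN"): (281, 88),
    ("Lc", "MLR"): (80, 97), ("Lc", "ANN"): (42, 98),
    ("Lalpha+L2+L3", "MLR"): (31, 99), ("Lalpha+L2+L3", "ANN"): (15, 99),
}


def accuracy_percent(n_errors: int, n_total: int = N_ENTRIES) -> int:
    """Percentage of correctly predicted entries, rounded to the nearest integer."""
    return round(100 * (1 - n_errors / n_total))


mismatches = [key for key, (errors, reported) in TABLE_3.items()
              if accuracy_percent(errors) != reported]
print(mismatches)
```

Note that the 100% entries for I2 are rounded values: 9 and 6 errors out of 2420 correspond to about 99.6% and 99.8%.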

Truth tables of major mesophases prediction

The truth tables of the prediction results for the three major mesophases, inverse bicontinuous primitive cubic (QIIP), inverse bicontinuous diamond cubic (QIID), and inverse hexagonal (HII), are shown in Figure 2a. These truth tables present the number of False Positive, True Positive, False Negative, and True Negative predictions for each of the individual phases. In this work, False Positive means that the experiment indicated the specific mesophase was present but the model predicted that it was not formed. In contrast, False Negative means the experiment showed that the phase was absent but the model predicted its existence. Note that this labelling is the reverse of the more common convention, in which a false positive is a phase that is predicted but not observed. True Positive means the phase formation was observed experimentally and predicted accurately, while True Negative means the phase was absent and the prediction was also correct.

As seen in Figure 2a, for the training data, both MLR and ANN models were highly accurate in picking up cases where the phases were absent. This is reflected in the high numbers of True Negatives. For the cases where the mesophases were present, the ANN models were significantly better than the MLR ones in predicting the formation of the cubic phases (QIID and QIIP). The True Positive values of the ANN models are 653 for QIID and 375 for QIIP, while those of the MLR models are very low (49 and 40 respectively). The MLR models did not estimate well the emergence of the nanoparticle cubic phases. For the inverse hexagonal phase (HII), the performance of the ANN model was only slightly better than that of the MLR one. The True Positive and True Negative values for the ANN model were 1160 and 726, whereas these values for the MLR model were 1021 and 680 respectively.

Although the statistical results for the HII models were reasonable, the models did not pick up the change in the phase behaviour when the FA:lipid weight ratio varies. These models usually predicted the same behaviour (HII absent or HII present) for samples containing the same lipid and FA at the same temperature. To improve the quality of the models for the HII phase, we added extra variables indicating the absence and presence of the other two major phases (QIID and QIIP) to the input layer of the ANNs and the MLR models.
These indicators were based on experimental observations for the training data of 2420 entries and on prediction results for the test data of 1540 entries. We have previously shown that this approach could create models with improved prediction power.22-23 To use this approach to predict the complete phase behaviour of new samples, models for the two major phases, QIIP and QIID, must be constructed first. These models only require the standard input descriptors listed in Table 2; they do not require other phase indicator descriptors. The limitation of this approach is that the prediction accuracy for the QIIP and QIID mesophases is critical and will affect the model predictability for other phases.

The MLR and ANN models were built using only one of the two add-on variables indicating the presence or absence of the QIIP and QIID mesophases, as well as the combination of both. The truth tables of the ANN models when extra variables were added are presented in Figure 2b. The addition of the QIIP or QIID indicator led to a reduction in the prediction error by 228 and 220 respectively. The inclusion of both QIIP and QIID indicators resulted in an improved model with 267 fewer errors than the original HII model without any extra indicator variables. The accuracy of this model is 89%. The prediction results of this model for the HII phase, as well as those for the QIIP and QIID phases, for MO and PHYT at 25 °C, are illustrated in Figure 3. If the phase prediction was inaccurate for a specific sample, the corresponding square in Figure 3 was marked with ‘x’.

As can be seen, the models could correctly predict the majority of the three main mesophase formations. Errors usually occurred at the transition edges where different phases coexist or where there was inconsistency among the phase behaviours at various temperatures. The occurrence of prediction errors at the phase boundaries suggests that the model predictions are robust. In fact, some ‘errors’ may be due to subtle changes in the experimental conditions near the sensitive phase boundary region or to the assignment ambiguity in these regions. Furthermore, there are also multiple coexisting phases for some combinations of experimental parameters, which could complicate assignment. In these cases, model predictions could be useful for identifying the exact mesophase formation with more quantitative evidence.
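The two-stage scheme described above, in which the HII model receives QIIP and QIID indicators as extra inputs, can be sketched as follows. This is an illustration of the data flow only, not the authors' implementation: the function names are hypothetical, each model is represented as a plain callable returning 0 or 1, and the toy rules at the bottom stand in for fitted MLR or ANN models.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]  # maps a descriptor row to a 0/1 indicator


def augment_training_row(descriptors: Sequence[float],
                         observed_qiip: int,
                         observed_qiid: int) -> List[float]:
    """Training rows use the experimentally observed phase indicators."""
    return list(descriptors) + [observed_qiip, observed_qiid]


def predict_hii_for_new_sample(descriptors: Sequence[float],
                               qiip_model: Model,
                               qiid_model: Model,
                               hii_model: Model) -> int:
    """New samples have no observations, so stage-1 predictions are fed to stage 2."""
    qiip_hat = qiip_model(descriptors)  # stage 1: standard descriptors only
    qiid_hat = qiid_model(descriptors)
    return hii_model(list(descriptors) + [qiip_hat, qiid_hat])  # stage 2


# Toy stand-ins for fitted models (hypothetical rules, illustration only).
# Row layout here: [LipidIndicator, MolarRatio, Temp].
toy_qiip: Model = lambda d: int(d[1] < 0.1)
toy_qiid: Model = lambda d: int(0.1 <= d[1] < 0.3)
toy_hii: Model = lambda d: int(d[-2] == 0 and d[-1] == 0)

train_row = augment_training_row([0, 0.05, 25], observed_qiip=1, observed_qiid=0)
high_ratio = predict_hii_for_new_sample([0, 0.5, 25], toy_qiip, toy_qiid, toy_hii)
low_ratio = predict_hii_for_new_sample([0, 0.05, 25], toy_qiip, toy_qiid, toy_hii)
```

The sketch makes the stated limitation concrete: any error made by the stage-1 QIIP or QIID models is passed directly into the HII model's inputs for new samples.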


Figure 2. (a) Truth tables showing the prediction results using MLR and ANN, for the training set, for three major mesophases: inverse bicontinuous primitive cubic (QIIP), inverse bicontinuous diamond cubic (QIID), and inverse hexagonal (HII). (b) Truth tables for ANN models, for the training set, for the inverse hexagonal (HII) phase when adding extra variables indicating the absence and presence of the other two major phases (QIIP and QIID) to the input layer of ANNs. The tables show the number of True Negative, False Negative, True Positive and False Positive predictions (in clockwise order, starting from the top left value).
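A truth table of this kind can be tallied from paired observed and predicted indicators. The sketch below is a minimal illustration; to avoid ambiguity over false positive and false negative labelling, the four cells are named explicitly by what was observed and what was predicted. The function and key names are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def truth_table(observed: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count the four observed/predicted combinations of a 0/1 phase indicator."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    counts = Counter(zip(observed, predicted))
    return {
        "observed_present_predicted_present": counts[(1, 1)],
        "observed_present_predicted_absent": counts[(1, 0)],
        "observed_absent_predicted_present": counts[(0, 1)],
        "observed_absent_predicted_absent": counts[(0, 0)],
    }


def accuracy(table: Dict[str, int]) -> float:
    """Fraction of entries where prediction and observation agree."""
    correct = (table["observed_present_predicted_present"]
               + table["observed_absent_predicted_absent"])
    return correct / sum(table.values())


example = truth_table(observed=[1, 1, 0, 0, 0, 1], predicted=[1, 0, 0, 0, 1, 1])
```

As a check against the values quoted in the text, the HII ANN model's 1160 True Positive and 726 True Negative predictions out of 2420 entries give (1160 + 726) / 2420 ≈ 78%, which agrees with Table 3.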


Figure 3. Predicted phase diagrams for monoolein (top) and phytantriol (bottom) at 25 °C under the effect of different unsaturated FAs. These diagrams show the prediction results from the best ANN models for each of the three major mesophases: inverse bicontinuous primitive cubic (QIIP), inverse bicontinuous diamond cubic (QIID), and inverse hexagonal (HII). Samples that are marked with colour indicate that the mesophase was observed experimentally. Samples marked with ‘x’ indicate that the phases for these samples were predicted incorrectly. No diagram is shown for the inverse bicontinuous primitive cubic (QIIP) mesophase for PHYT as this phase was absent for all PHYT samples.


Truth tables of minor mesophases prediction

Figure 4. Truth tables showing the prediction results, for the training set, for four minor mesophases: microemulsion (L2), micellar cubic (I2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases when standard descriptors as shown in Table 2 were used (a) and when three additional variables are included (b).

Figure 4a presents the truth tables for the four minor mesophases, microemulsion (L2), micellar cubic (I2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases, for the training data. The number of data points in which a particular phase was absent is very high. A naive model that always predicts the phase to be absent would therefore achieve very high accuracy, but the imbalance makes it difficult to construct predictive models that detect the very small number of cases where the phase was present. As shown in Figure 4a, the MLR models could not pick up the existence of these minor phases: the False Negative and True Positive values were 0 for all cases, with one exception where the False Negative was 1 for the microemulsion (L2) phase. The ANN models performed similarly for the L2 phase; for the other three phases, they could predict the presence of the mesophases in about half of the cases.

To improve the quality of the machine learning models in predicting these four minor mesophases, we used an approach similar to that described above, adding three extra variables indicating the absence or presence of the three major phases (QIIP, QIID and HII) to the input layer of the ANNs and to the MLR models. These indicators, again, were based on experimental observations for the training data of 2420 entries and on prediction results for the test data of 1540 entries. The MLR and ANN models were built using each of these three add-on variables individually as well as the combination of all three. Although adding a single add-on variable did not improve model performance, including all three variables resulted in a marked improvement, particularly for the ANN models. As seen in Figure 4b, the True Positive value for the microemulsion L2 phase rose from 0 to 265 for the MLR model and 271 for the ANN model. The MLR model picked up 94% (265 out of 281) of the cases where the L2 phase was present, while the ANN gave correct predictions for 96% of these positive cases.

For the micellar cubic (I2), lamellar crystal (Lc) and the combination of Lalpha+L2+L3 phases, adding the major-phase indicators to the input variables had no effect on the performance of the MLR models but led to a substantial increase in accuracy for the ANNs. For the I2 phase, the previous ANN model predicted only 3 of the 9 positive observations correctly, whereas the new ANN model predicted 8, so the accuracy for positive cases grew from 33% to 89%. For the lamellar crystal (Lc) and the Lalpha+L2+L3 combination, the phase behaviour could be predicted to an accuracy of 53 out of 80 (66%) and 26 out of 31 (84%), respectively.
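The percentages quoted above follow directly from the truth-table counts, and the same counts show why plain accuracy is misleading for such rare phases. The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of the original study, and only the counts are taken from the text.

```python
def positive_recall(true_positive: int, total_positive: int) -> float:
    """Fraction of samples with the phase present that the model detects."""
    if total_positive <= 0:
        raise ValueError("total_positive must be greater than zero")
    return true_positive / total_positive


def always_absent_accuracy(total_positive: int, total_entries: int) -> float:
    """Accuracy of a naive model that predicts 'absent' for every sample."""
    return (total_entries - total_positive) / total_entries


TRAINING_ENTRIES = 2420  # size of the training set reported in the text

# (correctly detected positives, all positives), as quoted in the text
counts = {
    "L2, MLR with indicators": (265, 281),
    "L2, ANN with indicators": (271, 281),
    "I2, ANN without indicators": (3, 9),
    "I2, ANN with indicators": (8, 9),
    "Lc": (53, 80),
    "Lalpha+L2+L3": (26, 31),
}

for label, (hits, positives) in counts.items():
    recall = 100 * positive_recall(hits, positives)
    baseline = 100 * always_absent_accuracy(positives, TRAINING_ENTRIES)
    print(f"{label}: {recall:.0f}% of positives found; "
          f"always-absent accuracy would be {baseline:.1f}%")
```

For the I2 phase, a model that never predicts the phase would still be right for all but 9 of the 2420 training entries, which is why the share of positive cases recovered is the more informative figure here.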


Contributions of descriptors to the formation of mesophases

An analysis of the MLR coefficients can provide insights into the contributions of different descriptors to the formation of the mesophases. The coefficients for individual descriptors allow us to determine whether each descriptor inhibits or promotes a particular phase and indicate how strong the effect is. A positive MLR coefficient for a descriptor indicates that the descriptor promotes the emergence of the mesophase. In contrast, a negative MLR coefficient means that the descriptor inhibits the mesophase formation. The higher the absolute value of the MLR coefficient for a descriptor, the larger its contribution. Note that the descriptors vary over different scales: for example, the temperature ranges from 25 to 60, the total number of carbon atoms in the FA hydrocarbon chain (a) varies from 16 to 22, and the total number of double bonds in the FA hydrocarbon chain (b) ranges from 0 to 6. To compare their contributions, the data were normalized (scaled) so that all descriptor values range from 0 to 1. Consequently, the MLR coefficients presented herein are those for the scaled descriptor values and hence they are labelled as “scaled MLR coefficients”. For the inverse bicontinuous primitive cubic (QIIP), inverse bicontinuous diamond cubic (QIID), micellar cubic (I2), lamellar crystal (Lc) and the combination of Lalpha+L2+L3 mesophases, the relationship between the descriptors and the mesophase formation is non-linear. Hence, the MLR models did not give high prediction accuracy and their coefficients for different descriptors cannot be used to assess the contributions to the formation of phases. In contrast, the performance of the MLR models for the inverse hexagonal (HII) and microemulsion (L2) mesophases is good, as shown in the previous section discussing the truth table results.
As a result, the coefficients for individual descriptors in these models can be used to evaluate the effect of the different factors and guide the design of future materials. For the purposes of reverse-engineering, we excluded the molecular radius of gyration, hydrophilic factor, molar refractivity and octanol-water partition coefficient and constructed MLR models using the remaining interpretable descriptors. These models were slightly less accurate, and their prediction results are given in the Supporting Information spreadsheet.

The scaled MLR coefficients are illustrated in Figure 5 and the actual values of these coefficients are presented in Tables S-1 and S-2 of the Supporting Information. As can be seen in Figure 5, the mesophase indicators (QIIP, QIID, HII) have the strongest effects among the descriptors used. Without these indicator descriptors, the models could not predict the absence or presence of the mesophases well, as shown in the previous section. The coefficients of all the mesophase indicators are negative, indicating that the presence of one phase inhibits the formation of another. Among the remaining descriptors, molar ratio makes the most significant contribution to the HII phase. Its negative coefficient suggests that a high molar ratio inhibits the formation of this phase while a low molar ratio promotes it. Molar ratio has the same effect on the L2 phase but its contribution is less pronounced. It has been shown previously that the incorporation of FAs (e.g. oleic acid) in cubic phases may trigger the formation of an HII phase at room temperature.33 Our study provides further insights into how this phase can be affected by the amount of FAs added to the system. The total number of double bonds in the FA hydrocarbon chain (b) is another major contributor but it has opposite effects on the formation of HII and L2. Unsaturated FAs with more double bonds in the hydrocarbon chain are more likely to have the L2 mesophase and less likely to have the HII mesophase compared to saturated FAs or unsaturated ones with fewer double bonds in the chain.
The total number of carbon atoms in the FA hydrocarbon chain (a) is an important factor with a positive influence on the formation of the HII phase. FAs with longer chains are likely to form this phase. However, this descriptor has only a minimal effect on the L2 phase. The lipid indicator, which has a value of 0 for MO and 1 for PHYT, has a reasonably significant effect. Nanoparticles that are PHYT based are more likely to form HII and L2 mesophases than those based on MO. The totalCis descriptor, on the other hand, is a major contributor to the emergence of the L2 mesophase, with a negative MLR coefficient. FAs with multiple cis double bonds are suggested to have a strong tendency to form the L2 phase. This descriptor has a similar effect on the formation of HII but the effect is much weaker. Other factors, including temperature, the total number of trans double bonds and the location of the first unsaturated carbon atom in the FA hydrocarbon chain (c), do not contribute substantially to the formation of the two mesophases.
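Because the coefficients are compared on scaled descriptors, it may help to see how min-max scaling changes a linear coefficient. For a descriptor x scaled to x' = (x - min)/(max - min), a raw coefficient becomes that coefficient multiplied by (max - min) on the scaled descriptor. The sketch below illustrates this with the descriptor ranges quoted in this section; the raw coefficient values are hypothetical and chosen only for illustration, and the function names are not from the original study.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant descriptor")
    return [(v - lo) / (hi - lo) for v in values]


def scaled_coefficient(raw_coefficient, lo, hi):
    """Coefficient of the min-max scaled descriptor in the same linear model."""
    return raw_coefficient * (hi - lo)


# Descriptor ranges quoted in the text.
ranges = {
    "temperature": (25, 60),
    "carbon atoms (a)": (16, 22),
    "double bonds (b)": (0, 6),
}

# Hypothetical raw coefficients, for illustration only.
raw = {
    "temperature": 0.004,
    "carbon atoms (a)": 0.02,
    "double bonds (b)": -0.03,
}

scaled = {name: scaled_coefficient(raw[name], *ranges[name]) for name in raw}

# Rank descriptors by the magnitude of their scaled coefficient.
for name in sorted(scaled, key=lambda n: abs(scaled[n]), reverse=True):
    effect = "promotes" if scaled[name] > 0 else "inhibits"
    print(f"{name}: scaled coefficient {scaled[name]:+.2f} ({effect})")
```

With these made-up numbers, temperature has the smallest raw coefficient but a larger scaled coefficient than chain length, because it spans a wider numerical range. This is the reason the comparison is made on scaled rather than raw coefficients.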


Figure 5. Contributions of different descriptors to the formation of the inverse hexagonal (HII) and microemulsion (L2) mesophases, illustrated by the coefficients of the MLR models.

3.2. Phase prediction for new data

We used the obtained ANN models to interpolate the phase behaviour of nanoparticles at temperatures not yet tested, specifically the mesophases of nanoparticles incorporating unsaturated FAs at 40, 45 and 50 °C and saturated FAs at 60 °C. These models were constructed using the standard input descriptors listed in Table 2 as well as the add-on major-mesophase indicator variables. To validate the performance of the models, we subsequently performed experiments for unsaturated FA nanoparticles at 45 and 50 °C (880 entries). The prediction results for the new temperature conditions (a total of 1540 entries) are provided in the Supporting Information spreadsheet and a summary of the ANN models is given in Table S-3. The truth tables in Figure 6 give a detailed picture of the prediction results for the 880 validated samples. The True Negative and True Positive values were high while the False Negative and False Positive values were low for all seven mesophases, showing that the models predicted the phase behaviour profile of these 880 samples well. The total number of prediction errors ranged from 0 to 264 and the accuracy varied from 70% to 100%.
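For new conditions, no experimental phase assignment is available, so the add-on major-mesophase indicators have to come from model predictions, as described earlier for the test data. The following is a minimal sketch of that chaining step. The stand-in predictors, the descriptor layout and the function names are hypothetical and only illustrate the data flow; they are not the neural networks used in this work.

```python
from typing import Callable, List, Sequence

# A phase model maps a descriptor vector to 1 (phase present) or 0 (absent).
PhaseModel = Callable[[Sequence[float]], int]


def add_observed_indicators(descriptors: Sequence[float],
                            observed: Sequence[int]) -> List[float]:
    """Training rows: append the experimentally observed major-phase flags."""
    return list(descriptors) + [float(flag) for flag in observed]


def add_predicted_indicators(descriptors: Sequence[float],
                             major_models: Sequence[PhaseModel]) -> List[float]:
    """New rows: append flags predicted by the major-phase models."""
    predicted = [model(descriptors) for model in major_models]
    return list(descriptors) + [float(flag) for flag in predicted]


# Hypothetical stand-ins for trained QIIP, QIID and HII models.
# Assumed descriptor layout: [temperature, molar ratio]; thresholds are arbitrary.
def toy_qiip(x: Sequence[float]) -> int:
    return 1 if x[1] >= 0.8 else 0


def toy_qiid(x: Sequence[float]) -> int:
    return 1 if 0.4 <= x[1] < 0.8 else 0


def toy_hii(x: Sequence[float]) -> int:
    return 1 if x[1] < 0.4 else 0


# Training entry: indicators taken from the observed phases.
training_row = add_observed_indicators([25.0, 0.3], observed=[0, 1, 0])

# New entry at an untested temperature: indicators taken from predictions.
new_row = add_predicted_indicators([45.0, 0.6], [toy_qiip, toy_qiid, toy_hii])

print(training_row)
print(new_row)
```

One consequence of this design is that any error made by a major-phase model is passed on to the minor-phase models as an incorrect input, so the quality of the major-phase predictions limits the quality of the chained predictions.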

Figure 6. Truth tables showing the prediction results, for the test data consisting of 880 entries (unsaturated FAs at 45 and 50 °C), for all seven mesophases: the inverse bicontinuous diamond cubic (QIID), inverse bicontinuous primitive cubic (QIIP), inverse hexagonal (HII), microemulsion (L2), micellar cubic (I2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases.

3.3. Phase prediction for new fatty acids

We explored the ability of the machine learning models to predict the mesophases of nanoparticles under the influence of new FA structures. As a test case, we excluded from the training data all entries for four unsaturated FAs (trans gondoic, gondoic, -linolenic and EPA acids) and two saturated FAs (pelargonic and tridecylic acids). The structures of these FAs fall within the applicability domain of the models built using the other FAs. The number of carbon atoms in the hydrocarbon chain of the unsaturated FAs in the training set varies from 16 to 22, whereas the four unsaturated FAs in the test set have 18 or 20. The saturated FAs in the training set have between 8 and 17 carbon atoms in the hydrocarbon chain, while the two saturated FAs in the test set have 9 and 13. The test FA structures include both cis and trans isomers, FAs with the double bond at different locations and FAs with multiple double bonds. The data entries for these FAs were not used to train the models; they were used only to validate the performance of the obtained models. This resulted in a training set of 1936 entries and a test set of 484 entries.

The same approach as described above was used to create ANN models predicting each of the seven mesophases from the training set data (16 unsaturated and 8 saturated FAs). These ANN models used the descriptors listed in Table 2 as well as the extra phase indicator descriptors. The performance of the models was validated using the test data (4 unsaturated and 2 saturated FAs). Table S-4 provides a summary of the model performance for all seven mesophases. The models predicted the formation of the mesophases very well. The QIID and HII models had 82% and 89% accuracy, respectively. Models for all other phases had an accuracy of over 90%. The total number of errors was as low as 0 (I2 phase, test set) or 1 (I2 and Lalpha+L2+L3, training set).

The ability to predict the presence of a minor phase such as the micellar cubic I2 phase will be beneficial for the development of novel drug delivery systems. So far, among non-lamellar phases, the bicontinuous cubic phase and the hexagonal phase have received the most attention as nanocarriers. An I2 phase with a cubic packing of discrete inverse micelles is believed to provide more sustained release compared to a bicontinuous cubic phase or a hexagonal phase.6 This mesophase, however, has not been widely explored due to its low occurrence. The machine learning tools developed here may help expand the number of lipid systems with an I2 phase and promote the use of this mesophase in drug delivery applications.

Figure 7 presents the truth tables for all seven mesophases, inverse bicontinuous primitive cubic (QIIP), inverse bicontinuous diamond cubic (QIID), inverse hexagonal (HII), microemulsion (L2), micellar cubic (I2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases, for both training and test data. The strong performance of the models is reflected in high True Positive and True Negative values and low False Positive and False Negative values. Figure 8 compares the predicted and experimentally observed phase diagrams for MO and PHYT at 25 °C. Mesophase formation results for other temperatures (37, 40, 45, 50 and 60 °C) are given in the Supporting Information spreadsheet.
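The hold-out scheme used in this section, in which whole fatty acids rather than individual entries are withheld, can be sketched as follows. The record layout, the helper names and the small toy data set are assumptions made for illustration; only the chain-length ranges and the set sizes are taken from the text.

```python
def split_by_fatty_acid(entries, held_out):
    """Put every entry of a held-out fatty acid in the test set."""
    train = [e for e in entries if e["fatty_acid"] not in held_out]
    test = [e for e in entries if e["fatty_acid"] in held_out]
    return train, test


def within_training_range(train_values, test_values):
    """True if all test values lie inside the min-max range of the training values."""
    lo, hi = min(train_values), max(train_values)
    return all(lo <= v <= hi for v in test_values)


# Toy records (hypothetical); real entries would also carry the lipid type,
# temperature, molar ratio and the observed mesophases.
entries = [
    {"fatty_acid": "FA-A", "carbons": 16},
    {"fatty_acid": "FA-B", "carbons": 20},
    {"fatty_acid": "FA-C", "carbons": 22},
    {"fatty_acid": "FA-B", "carbons": 20},
]

train, test = split_by_fatty_acid(entries, held_out={"FA-B"})
print(len(train), len(test))

# Chain-length ranges quoted in the text (training range, test values).
print(within_training_range([16, 22], [18, 20]))  # unsaturated FAs
print(within_training_range([8, 17], [9, 13]))    # saturated FAs

# Share of entries held out in the study: 484 of 1936 + 484.
print(484 / (1936 + 484))
```

Splitting by fatty acid rather than by entry keeps every measurement of a test fatty acid out of training, so the test set probes new structures rather than new conditions for known structures. The range check is only a simple one-descriptor stand-in for an applicability-domain test.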


Figure 7. Truth tables showing the prediction results, for the training data consisting of 1936 entries and test data with 484 entries, for all seven mesophases: the inverse bicontinuous diamond cubic (QIID), inverse bicontinuous primitive cubic (QIIP), inverse hexagonal (HII), microemulsion (L2), micellar cubic (I2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases.


Figure 8. Experimentally observed (left) and predicted (right) phase diagrams for monoolein (top) and phytantriol (bottom) at 25 °C under the effect of different unsaturated FAs for samples in the test set. The matching percentage is 71% for QIIP, 76% for QIID, 89% for HII, 91% for L2, 100% for I2, 99% for Lc and 96% for the combination of Lalpha+L2+L3 phases.

4. Conclusions

We demonstrate that the complex phase behaviour of monoolein- and phytantriol-based nanomaterials under the effect of different temperatures and fatty acid additives can be modelled effectively using Bayesian regularized neural networks. Robust models have been developed for each of the seven mesophases: inverse bicontinuous diamond cubic (QIID), inverse bicontinuous primitive cubic (QIIP), inverse hexagonal (HII), microemulsion (L2), micellar cubic (I2), lamellar crystal (Lc) and a combination of Lalpha+L2+L3 phases. These models incorporated considerable complexity and accounted for the effect of two types of lipids, 20 unsaturated fatty acids, 10 saturated fatty acids, a range of fatty acid/lipid ratios, and temperature. The models were highly accurate in predicting the phase behaviour for the data used to train the neural networks, as well as for new data that were not seen in the training process. They were capable of interpolating data for the same fatty acids at temperatures that were not yet tested, as well as extrapolating to new fatty acid structures. The approach is useful for quickly expanding the phase behaviour landscape of the materials for further studies and for reducing the number of experiments, and therefore the resources, required.

Acknowledgement

TCL and NT thank RMIT University for the Vice Chancellor’s Fellowships.

Associated content

Supporting Information available: More details about the experimental methodology and a summary of the modelling results are provided in Supporting Information 1 (PDF). The full list of prediction results is provided in Supporting Information 2 (Excel spreadsheet).

References

1. Fong, W. K.; Negrini, R.; Vallooran, J. J.; Mezzenga, R.; Boyd, B. J., Responsive Self-Assembled Nanostructured Lipid Systems for Drug Delivery and Diagnostics. J Colloid Interf Sci 2016, 484, 320-339.
2. Fong, C.; Le, T.; Drummond, C. J., Lyotropic Liquid Crystal Engineering-Ordered Nanostructured Small Molecule Amphiphile Self-Assembly Materials by Design. Chem Soc Rev 2012, 41, 1297-1322.
3. Angelova, A.; Angelov, B.; Mutafchieva, R.; Lesieur, S.; Couvreur, P., Self-Assembled Multicompartment Liquid Crystalline Lipid Carriers for Protein, Peptide, and Nucleic Acid Drug Delivery. Accounts Chem Res 2011, 44, 147-156.
4. Mulet, X.; Boyd, B. J.; Drummond, C. J., Advances in Drug Delivery and Medical Imaging Using Colloidal Lyotropic Liquid Crystalline Dispersions. J Colloid Interf Sci 2013, 393, 1-20.
5. Lee, K. W.; Nguyen, T.-H.; Hanley, T.; Boyd, B. J., Nanostructure of Liquid Crystalline Matrix Determines in Vitro Sustained Release and in Vivo Oral Absorption Kinetics for Hydrophilic Model Drugs. Int J Pharmaceut 2009, 365, 190-199.
6. Phan, S.; Fong, W. K.; Kirby, N.; Hanley, T.; Boyd, B. J., Evaluating the Link between Self-Assembled Mesophase Structure and Drug Release. Int J Pharmaceut 2011, 421, 176-182.
7. Tran, N.; Mulet, X.; Hawley, A. M.; Hinton, T. M.; Mudie, S. T.; Muir, B. W.; Giakoumatos, E. C.; Waddington, L. J.; Kirby, N. M.; Drummond, C. J., Nanostructure and Cytotoxicity of Self-Assembled Monoolein-Capric Acid Lyotropic Liquid Crystalline Nanoparticles. RSC Advances 2015, 5, 26785-26795.
8. Chong, J. Y. T.; Mulet, X.; Waddington, L. J.; Boyd, B. J.; Drummond, C. J., Steric Stabilisation of Self-Assembled Cubic Lyotropic Liquid Crystalline Nanoparticles: High Throughput Evaluation of Triblock Polyethylene Oxide-Polypropylene Oxide-Polyethylene Oxide Copolymers. Soft Matter 2011, 7, 4768-4777.
9. Conn, C. E.; Darmanin, C.; Mulet, X.; Le Cann, S.; Kirby, N.; Drummond, C. J., High-Throughput Analysis of the Structural Evolution of the Monoolein Cubic Phase in Situ under Crystallogenesis Conditions. Soft Matter 2012, 8, 2310-2321.
10. Mulet, X.; Kennedy, D. F.; Conn, C. E.; Hawley, A.; Drummond, C. J., High Throughput Preparation and Characterisation of Amphiphilic Nanostructured Nanoparticulate Drug Delivery Vehicles. Int J Pharmaceut 2010, 395, 290-297.
11. Tran, N.; Bye, N.; Moffat, B. A.; Wright, D. K.; Cuddihy, A.; Hinton, T. M.; Hawley, A. M.; Reynolds, N. P.; Waddington, L. J.; Mulet, X.; Turnley, A. M.; Morganti-Kossmann, M. C.; Muir, B. W., Dual-Modality NIRF-MRI Cubosomes and Hexosomes: High Throughput Formulation and in Vivo Biodistribution. Mat Sci Eng C-Mater 2017, 71, 584-593.
12. Tran, N.; Hawley, A. M.; Zhai, J. L.; Muir, B. W.; Fong, C.; Drummond, C. J.; Mulet, X., High-Throughput Screening of Saturated Fatty Acid Influence on Nanostructure of Lyotropic Liquid Crystalline Lipid Nanoparticles. Langmuir 2016, 32, 4509-4520.
13. Tran, N.; Mulet, X.; Hawley, A. M.; Fong, C.; Zhai, J. L.; Le, T. C.; Ratcliffe, J.; Drummond, C. J., Manipulating the Ordered Nanostructure of Self-Assembled Monoolein and Phytantriol Nanoparticles with Unsaturated Fatty Acids. Langmuir 2018, 34, 2764-2773.
14. Zhai, J.; Tran, N.; Sarkar, S.; Fong, C.; Mulet, X.; Drummond, C. J., Self-Assembled Lyotropic Liquid Crystalline Phase Behavior of Monoolein-Capric Acid-Phospholipid Nanoparticulate Systems. Langmuir 2017, 33, 2571-2580.
15. Angelova, A.; Garamus, V. M.; Angelov, B.; Tian, Z. F.; Li, Y. W.; Zou, A. H., Advances in Structural Design of Lipid-Based Nanoparticle Carriers for Delivery of Macromolecular Drugs, Phytochemicals and Anti-Tumor Agents. Adv Colloid Interfac 2017, 249, 331-345.
16. van 't Hag, L.; Gras, S. L.; Conn, C. E.; Drummond, C. J., Lyotropic Liquid Crystal Engineering Moving Beyond Binary Compositional Space - Ordered Nanostructured Amphiphile Self-Assembly Materials by Design. Chem Soc Rev 2017, 46, 2705-2731.
17. Israelachvili, J. N.; Mitchell, D. J.; Ninham, B. W., Theory of Self-Assembly of Hydrocarbon Amphiphiles into Micelles and Bilayers. J. Chem. Soc., Faraday Trans. 2 1976, 72, 1525-1568.
18. Hyde, S. T., Identification of Lyotropic Liquid Crystalline Mesophases. Handbook of Applied Surface and Colloid Chemistry 2001, 2, 299-332.
19. Seddon, J.; Templer, R., Polymorphism of Lipid-Water Systems. Handbook of Biological Physics 1995, 1, 97-160.
20. Lee, W. B.; Mezzenga, R.; Fredrickson, G. H., Self-Consistent Field Theory for Lipid-Based Liquid Crystals: Hydrogen Bonding Effect. The Journal of Chemical Physics 2008, 128, 02B621.
21. Le, B. T. C.; Tran, N.; Mulet, X.; Winkler, D. A., Modeling the Influence of Fatty Acid Incorporation on Mesophase Formation in Amphiphilic Therapeutic Delivery Systems. Mol Pharmaceut 2016, 13, 996-1003.
22. Le, T. C.; Mulet, X.; Burden, F. R.; Winkler, D. A., Predicting the Complex Phase Behavior of Self-Assembling Drug Delivery Nanoparticles. Mol Pharmaceut 2013, 10, 1368-1377.
23. Le, T. C.; Conn, C. E.; Burden, F. R.; Winkler, D. A., Predicting the Effect of Lipid Structure on Mesophase Formation During in Meso Crystallization. Crystal Growth & Design 2013, 13, 3126-3137.
24. Le, T. C.; Conn, C. E.; Burden, F. R.; Winkler, D. A., Computational Modeling and Prediction of the Complex Time-Dependent Phase Behavior of Lyotropic Liquid Crystals under in Meso Crystallization Conditions. Crystal Growth & Design 2013, 13, 1267-1276.
25. Le, T.; Epa, V. C.; Burden, F. R.; Winkler, D. A., Quantitative Structure-Property Relationship Modeling of Diverse Materials Properties. Chem Rev 2012, 112, 2889-2919.
26. Le, T. C.; Winkler, D. A., Discovery and Optimization of Materials Using Evolutionary Approaches. Chem Rev 2016, 116, 6107-6132.
27. Le, T. C.; Tran, N.; Mulet, X.; Winkler, D. A., Modeling the Influence of Fatty Acid Incorporation on Mesophase Formation in Amphiphilic Therapeutic Delivery Systems (Vol 13, Pg 996, 2016). Mol Pharmaceut 2018, 15, 341-341.
28. Angelova, A.; Drechsler, M.; Garamus, V. M.; Angelov, B., Liquid Crystalline Nanostructures as PEGylated Reservoirs of Omega-3 Polyunsaturated Fatty Acids: Structural Insights toward Delivery Formulations against Neurodegenerative Disorders. ACS Omega 2018, 3, 3235-3247.
29. Burden, F. R.; Winkler, D. A., Robust QSAR Models Using Bayesian Regularized Neural Networks. Journal of Medicinal Chemistry 1999, 42, 3183-3187.
30. Winkler, D. A.; Burden, F. R., Robust QSAR Models from Novel Descriptors and Bayesian Regularised Neural Networks. Molecular Simulation 2000, 24, 243-258.
31. Polley, M. J.; Winkler, D. A.; Burden, F. R., Broad-Based Quantitative Structure-Activity Relationship Modeling of Potency and Selectivity of Farnesyltransferase Inhibitors Using a Bayesian Regularized Neural Network. Journal of Medicinal Chemistry 2004, 47, 6230-6238.
32. Tarasova, A.; Burden, F.; Gasteiger, J.; Winkler, D. A., Robust Modelling of Solubility in Supercritical Carbon Dioxide Using Bayesian Methods. Journal of Molecular Graphics & Modelling 2010, 28, 593-597.
33. Angelov, B.; Angelova, A.; Mutafchieva, R.; Lesieur, S.; Vainio, U.; Garamus, V. M.; Jensen, G. V.; Pedersen, J. S., SAXS Investigation of a Cubic to a Sponge (L3) Phase Transition in Self-Assembled Lipid Nanocarriers. Phys Chem Chem Phys 2011, 13, 3073-3081.

Table of Contents/Abstract Graphic


Figure 1. Chemical structures of two lipids, monoolein and phytantriol, together with twenty unsaturated and ten saturated fatty acids. The blue arrows show the location of the double bond in the hydrocarbon chains.


Figure 2. (a) Truth tables showing the prediction results using MLR and ANN, for the training set, for three major mesophases: inverse bicontinuous primitive cubic (QIIP), inverse bicontinuous diamond cubic (QIID), and inverse hexagonal (HII). (b) Truth tables for ANN models, for the training set, for the inverse hexagonal (HII) phase when adding extra variables indicating the absence and presence of the other two major phases (QIIP and QIID) to the input layer of ANNs. The tables show the number of True Negative, False Negative, True Positive and False Positive predictions (in clockwise order, starting from the top left value).












