Meta-Analysis of Nanoparticle Cytotoxicity via Data ... - ACS Publications


Article

Meta-Analysis of Nanoparticle Cytotoxicity via Data-Mining the Literature Hagar Ibrahim Labouta, Nasimeh Asgarian, Kristina Rinker, and David T. Cramb ACS Nano, Just Accepted Manuscript • DOI: 10.1021/acsnano.8b07562 • Publication Date (Web): 28 Jan 2019 Downloaded from http://pubs.acs.org on January 30, 2019



ACS Nano

Meta-Analysis of Nanoparticle Cytotoxicity via Data-Mining the Literature

Hagar I. Labouta (a,b,c,d,*), Nasimeh Asgarian (a), Kristina Rinker (c,e,f), and David T. Cramb (a,f,g,*)

(a) Department of Chemistry, Faculty of Science, University of Calgary, Calgary, T2N 1N4, Canada; (b) College of Pharmacy, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, R3E 0T5, Canada; (c) Biomedical Engineering, University of Calgary, Calgary, T2N 1N4, Canada; (d) Department of Pharmaceutics, Faculty of Pharmacy, Alexandria University, Alexandria, 21521, Egypt; (e) Department of Chemical and Petroleum Engineering, University of Calgary, Calgary, T2N 1N4, Canada; (f) Department of Physiology and Pharmacology, Cumming School of Medicine, University of Calgary, Calgary, T2N 1N4, Canada; (g) Department of Chemistry and Biology, Faculty of Science, Ryerson University, Toronto, M5B 2K3, Canada

ACS Paragon Plus Environment


KEYWORDS. Nanoparticle cytotoxicity; cell viability; meta-analysis; machine learning; classification decision trees.


TOC GRAPHIC


ABSTRACT. Developing predictive modeling frameworks of potential cytotoxicity of engineered nanoparticles is critical for environmental and health risk analysis. The complexity and heterogeneity of available data on potential risks of nanoparticles, together with the interdependency of relevant influential attributes, make it challenging to develop a generalization of nanoparticle toxicity behaviour. The lack of systematic approaches to investigate these risks adds further uncertainty and variability to the body of literature and limits the generalizability of existing studies. Here, we developed a rigorous approach for assembling published evidence on the cytotoxicity of several organic and inorganic nanoparticles and unraveled hidden relationships that were not targeted in the original publications. We used a machine learning approach that employs decision trees together with feature selection algorithms (e.g. Gain ratio) to analyze a set of published nanoparticle cytotoxicity sample data (2896 samples). The specific studies were selected because they specified nanoparticle-, cell- and screening method-related attributes. The resultant decision-tree classifiers are sufficiently simple and accurate, have high predictive power, and should be widely applicable to a spectrum of nanoparticle cytotoxicity settings. Among several influential attributes, we show that the cytotoxicity of nanoparticles is primarily predicted from the nanoparticle material chemistry, followed by nanoparticle concentration and size, cell type and cytotoxicity screening indicator. Overall, our study indicates that following rigorous and transparent experimental methodology, in parallel with continuous addition to the dataset developed using our approach, will offer higher predictive power and accuracy and uncover hidden relationships. The results obtained in this study help focus future studies on developing nanoparticles that are safe-by-design.


Despite years of excellent individual investigations, it has not been possible to develop a globalized cause-and-effect model relating nanoparticle (NP) properties to cytotoxicity.1 The main physicochemical properties of NPs that have been considered include material, surface chemistry, size, surface charge, concentration and colloidal stability. These properties are often strongly interconnected, and it is therefore hard to synthesize NPs in which only one parameter is varied while the others are kept constant. However, if toxicity could be correlated to these basic physicochemical properties, those correlations could allow researchers to predict potential risks and design tailor-made NPs with minimal cytotoxicity. A modeling framework that can anticipate the cytotoxic potential of a specific NP design would be of great benefit to manufacturers and regulators as well as researchers. An additional motivation for developing these predictive frameworks is to help materials developers transition away from the animal toxicity testing paradigm.

To date, a systematic approach to investigate the potential risks of NPs using in vitro cell experiments is lacking. The nanomaterials community is in a paradigm whereby researchers independently estimate the risks of their formulated or purchased NPs using their own methodological approaches. The underlying studies used various cell types and various times of cell exposure to NPs. A variety of test indicators are also used for measuring cytotoxicity and/or cell viability. This testing landscape leads to the uncertainty and variability found within the body of nanotoxicology literature and therefore limits the generalization of individual studies’ findings.

A challenging question is that of the hierarchy of NP features with respect to cytotoxicity. For example, is size more predictive than NP material or surface chemistry? The few existing approaches that attempt to generalize NP toxicity behaviour rely on either qualitative classification-based models 2 or quantitative structure–activity relationships.3 They have usually focused on developing a given model using a limited dataset from an individual study rather than considering the “entire” body of published evidence. Significantly, meta-analysis, or data-mining and knowledge-extraction from literature data, has the potential to reveal hidden relationships that were not evident in the original investigations.4 To this end, strategies have been proposed for ascertaining the properties of NPs that lead to the observed cytotoxic effects.5, 6 To date, attempts at meta-analysis of published data have been limited to two particular types of NPs, namely carbon nanotubes 7, 8 and quantum dots.9

Given the complexity of the global datasets on NP cytotoxicity, linear (single cause-effect) models and other algebraic approaches will have great difficulty discerning trends from the dense body of available information.2 We submit that a machine-learning approach will be more fruitful, as has been shown for more limited datasets.2, 7, 9 Decision trees are machine learning approaches that predict the value of a target variable (cell viability, in this case) by learning simple decision rules inferred from the attributes in the dataset (e.g. surface charge, diameter, material, etc.). Each of these measurable attributes or characteristics is termed a ‘feature’. Choosing a set of informative and discriminating features is a crucial step for effective algorithms in pattern recognition and classification. Each algorithm takes as input a complete dataset (the training dataset), which typically involves many features, and produces a ‘classifier’ that learns to predict a ‘label’ based on patterns found in the training dataset. In our case, the ‘label’ is low cell viability (≤50%) or high cell viability (>50%). The classifier can then take a new ‘instance’ (a vector of available feature information, without the “% cell viability” label) and return a predicted label for this instance. Thus decision trees can reveal critical variables from a complex dataset, perform accurately as classifiers, and be sufficiently simple for future use.10
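The vocabulary above (feature, instance, label) can be made concrete with a minimal Python sketch. It assumes the 50% cut-off that the paper states for its binary classes; the field names and the three example rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Cut-off for the binary class label, in % cell viability (as stated in the paper).
LOW_VIABILITY_CUTOFF = 50.0


def viability_label(percent_viability: float) -> str:
    """Map a measured % cell viability to the class label 'low' or 'high'."""
    return "low" if percent_viability <= LOW_VIABILITY_CUTOFF else "high"


# Hypothetical instances: each dict is one feature vector plus its measured outcome.
instances = [
    {"np_material": "material_A", "diameter_nm": 20.0, "viability_pct": 35.0},
    {"np_material": "material_B", "diameter_nm": 150.0, "viability_pct": 92.0},
    {"np_material": "material_A", "diameter_nm": 80.0, "viability_pct": 50.0},
]

# The label is derived from the outcome; a classifier would see only the features.
labels = [viability_label(row["viability_pct"]) for row in instances]
class_counts = Counter(labels)
print(labels)
print(dict(class_counts))
```

A trained classifier would receive only the feature fields of a new instance and return one of these two labels.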

The study presented herein responds to the need for a predictive modeling framework of potential cytotoxicity of engineered NPs. We have performed a comprehensive and rigorous meta-analysis of published cytotoxicity data for a variety of organic and inorganic NPs using classification-based decision tree models. We hypothesized that NP cytotoxicity could be predicted using NP-related parameters (core and surface coating material, size, surface charge and concentration), cell-related attributes (cell type, source, organism, age and morphology) as well as methodological parameters related to the cytotoxicity/cell viability test indicator and the exposure time. Given that not all available published datasets are equally valuable, we adopted a rigorous framework for data selection and extraction. To overcome the challenge of integrating heterogeneous cytotoxicity data generated in different studies while aiming at high prediction power, we followed strict criteria for inclusion of a study in the meta-analysis pool and limited the data in the pool to those involving commonly used cell viability/cytotoxicity assays. Out of the ~400 peer-reviewed original research articles examined, 93 studies were selected. Ultimately, the dataset used had 2896 NPs with 15 features. In order to work with 2 additional features that were available for only some of the NPs, surface coating material and zeta potential (ZP), we created 3 more datasets: a subset with 1052 NPs from 89 publications with known coating material, a subset with 1261 NPs from 32 publications with known ZP, and a subset with only 540 NPs from 15 publications with both coating material and ZP known.

RESULTS AND DISCUSSION

Literature data search and data mining

The process used produced a final selection of 93 peer-reviewed research articles (published in 2004-2015). Knowledge extracted from the selected literature included features describing NP material type, surface coating, size, concentration and ZP, the cells used for the assay, the cells’ exposure time to NPs, the type of cell viability/cytotoxicity assay used, and the resulting cell viability data. This is in addition to a binary assessment (Yes or No) of the presence of interference tests, a colloidal stability test and the use of a positive control (toxin). Literature data mining yielded a total of 2896 individual % cell viability data points (for the features list and ranges, see Figure 1). Some feature values were harmonized for units (e.g. µM for NP concentration) before the machine learning experiments. The specifics of the literature data search and data mining process are given in the Methods section.


Figure 1. Overview of literature nanoparticle data attributes/features (2896 data points). A total of 17 features were considered: Nanoparticle (NP)-related features (type, core and surface coating material, diameter, surface charge and concentration), cell-related features (cell-type, cell line/primary cells, human/animal, organism (animal) source, organ/tissue source, age and morphology) as well as methodological parameters related to cytotoxicity/cell viability test indicator (test, test indicator, biochemical metric) and the exposure time.


Machine learning of the meta-analyzed dataset: Building nanoparticle cell viability models

A dataset with a total of 2896 individual data points (i.e. NP experiments) spanning 15 features (all features except NP surface coating and ZP, which had missing values for some instances) was first harmonized to have consistent units of measure and pre-processed (refer to the Methods section and Supplementary nanoparticle dataset file) prior to the machine learning experiments. Several “classification-based” algorithms were built in an attempt to predict % cell viability on exposure to NPs. We used 10-fold cross validation to evaluate the classifiers. That is, we partitioned the data into 10 (almost) equal parts. We then used 9/10 of these partitions to build the model and the remaining 1/10 of the data to test it. We repeated this process for each partition and report the average accuracy over all 10 test partitions. Among all the algorithms, the decision tree gave the best accuracy, based on the cross validation results for all of our different datasets. Please refer to Supplementary Table 1 for the cross validation results. The objective of the decision trees was to predict the % viability of cells on exposure to NPs using the binary classification of low % cell viability (≤50%) or high % cell viability (>50%). Based on this definition, a total of 686 particles fell in the low % cell viability class and 2210 particles in the high % cell viability class.
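The 10-fold procedure described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: `k_fold_indices`, `cross_validate` and the placeholder learner `fit_majority` are hypothetical names, the shuffle seed is arbitrary, and the sketch assumes there are at least as many samples as folds. Only the class counts (686 low, 2210 high) come from the text.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and deal them into k (almost) equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features: Sequence, labels: Sequence[str],
                   fit: Callable, k: int = 10, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predict = fit([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def fit_majority(train_x: Sequence, train_y: Sequence[str]) -> Callable:
    """Placeholder learner (not a decision tree): always predicts the majority label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda _x: majority


# Class sizes quoted in the text; the feature vectors are left empty on purpose.
labels = ["low"] * 686 + ["high"] * 2210
features = [None] * len(labels)
mean_accuracy = cross_validate(features, labels, fit_majority)
print(f"mean accuracy of the majority-class placeholder: {mean_accuracy:.3f}")
```

Any learner that returns a prediction function can be passed as `fit`; the majority-class placeholder simply reproduces the no-model reference point against which a real classifier is judged.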

Not all of the studies used in this analysis reported all 15 features plus surface coating and ZP. For example, a total of 1844 and 1635 values were missing for the NP surface coating material and ZP, respectively. To deal with the challenge of missing values, four decision trees (DTs) were built using different parts of the dataset with known values:

Decision Tree 1 (DT1): 2896 data points with exclusion of the NP surface coating material and ZP as features (a total of 15 features)

Decision Tree 2 (DT2): 1052 data points (in 89 publications) that included the NP surface coating material feature with exclusion of ZP (16 total features)

Decision Tree 3 (DT3): 1261 data points (in 32 publications) that included the ZP feature with exclusion of surface coating material (a total of 16 features)


Decision Tree 4 (DT4): 540 data points (in 15 publications) that included all features (a total of 17 features)

Prior to training these decision trees, feature selection was essential to avoid “overfitting” the data. Overfitting refers to a “learner” (learning in machine learning is the process of using the training dataset to find patterns in the data) that models the training data too well; that is, it learns the detail and noise in the training data to the extent that performance on new data suffers, limiting the generalization power of the final model.11 We tried several feature selection techniques and used the Gain ratio algorithm to select the most important features describing the cell viability results. The Gain ratio evaluates the worth of an attribute by measuring its gain ratio with respect to the class label, low vs. high % cell viability.11 In addition, we could show that using only a subset of features gives the same performance as using all the features (Supplementary Figure 1).
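As a rough illustration of the quantity involved, the sketch below computes the standard gain ratio for a categorical feature: information gain with respect to the class label, divided by the entropy of the feature's own value distribution (the split information). It is a minimal sketch, not the authors' implementation; numeric features such as diameter would first need to be binned, and the function names and toy data are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature_values: Sequence[Hashable], labels: Sequence[str]) -> float:
    """Information gain of the feature w.r.t. the label, divided by split information."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): one feature separates the classes, the other does not.
labels = ["low", "low", "high", "high"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, labels))
print(gain_ratio(uninformative, labels))
```

Ranking features by this score and keeping the top-ranked ones is one common way to use it for feature selection.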

The different selected features of the four decision trees (DTs) are listed in Table 1. In all cases, different sets of attributes describing the NPs and the cells were selected.


Test-related features were added in DT1, DT2 and DT4. In Table 1, the baseline accuracy corresponds to the overall accuracy obtained when every instance is predicted as the majority class, without a model. For example, with a majority class of “high” % cell viability (2210 instances) for the DT1 data, if all 2896 instances are predicted, as a best guess, to have “high” % cell viability, the prediction is correct 76.3% of the time, because 76.3% of the DT1 data show high viability (2210/2896 * 100). The value of the baseline accuracy therefore varies with the dataset considered. For example, for the DT2 data, with 1052 instances in total and a majority class of 893 high viability instances, the baseline accuracy was computed as 84.9%. Our goal was to build a decision tree model that could predict better than the baseline. Decision tree accuracy is based on predicting the label using the decision tree model. A decision tree is considered a failure if it cannot predict better than the baseline. The decision tree accuracies and the improvement from baseline are presented in Table 1. In all cases, the decision trees (DT1 to DT4) predicted better than the baseline. It should be noted that the accuracies of the four decision trees (DT1 to DT4) cannot be compared with one another, since each is built on a different set of data with different features. Other attributes measuring the predictivity and sensitivity of the decision trees are included in the Supplementary information (refer to Supplementary Table 2).
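The baseline arithmetic quoted above can be checked with a few lines of Python. The class counts are the ones given in the text for the DT1 and DT2 data; the function name `baseline_accuracy` is an assumption made for this sketch.

```python
from typing import Dict


def baseline_accuracy(class_counts: Dict[str, int]) -> float:
    """Accuracy (in %) of always predicting the majority class."""
    total = sum(class_counts.values())
    return 100.0 * max(class_counts.values()) / total


# Class counts from the text: DT1 uses all 2896 instances, DT2 uses 1052.
dt1_counts = {"high": 2210, "low": 2896 - 2210}
dt2_counts = {"high": 893, "low": 1052 - 893}

print(round(baseline_accuracy(dt1_counts), 1))  # 76.3
print(round(baseline_accuracy(dt2_counts), 1))  # 84.9
```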

Table 1. Summary of the feature selection results using Gain ratio.

Decision tree | Decision Tree 1 | Decision Tree 2 | Decision Tree 3 | Decision Tree 4
Baseline accuracy | 76.3% | 84.9% | 72.6% | 83.1%
Decision tree accuracy | 87.9±2.2% | 90±4.1% | 88.2±2.2% | 91.8±3.7%
Improvement in accuracy from baseline* | 15% | 6% | 21% | 10%

Selected features, Decision Tree 1: NP material; NP diameter; NP concentration; cells; cell age; cell source; cell type (animal/human); exposure time; test indicator.

Selected features, Decision Trees 2 to 4 (per-tree column assignment not recoverable from this copy): NP material; NP diameter; NP surface coating; NP concentration; NP zeta potential; cells; human/animal cells; cell age; cell source; exposure time; test; test indicator; biochemical metric.

NP: Nanoparticle


* % improvement in accuracy = (Decision tree accuracy – Baseline accuracy)/Baseline accuracy*100
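The footnote formula can be checked against the figures reported for Decision Trees 1 and 2 (accuracies of 87.9% and 90%, baselines of 76.3% and 84.9%). The sketch below is only a worked calculation; the function name is an assumption, and the ± spreads on the accuracies are ignored.

```python
def relative_improvement(tree_accuracy: float, baseline_accuracy: float) -> float:
    """% improvement = (tree accuracy - baseline accuracy) / baseline accuracy * 100."""
    return (tree_accuracy - baseline_accuracy) / baseline_accuracy * 100.0


# Mean accuracies and baselines for Decision Trees 1 and 2, in %.
print(round(relative_improvement(87.9, 76.3)))  # 15
print(round(relative_improvement(90.0, 84.9)))  # 6
```

Note that the improvement is relative to the baseline rather than a difference in percentage points: for Decision Tree 1 the absolute gain is 11.6 points, which is about 15% of 76.3.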

Decision Tree 1

Some interesting outcomes resulted from the use of DT1. For example, the most important discriminator of cytotoxicity is NP material type. Following that, NP concentration and size are also shown to be important determinants of cell viability (Figure 2). Thus, general conclusive statements on the safety of specific material types of NPs cannot be made, because the prediction of cytotoxicity depends on multiple parameters. Moreover, it is well understood in the literature that toxicity is all about dose.12-14 Nanotoxicity results should also be paired with the cell type(s) used for screening as well as the exposure time, since viability results differ with respect to those attributes (Figure 2). Below we describe some of the interesting details that emerged from the analysis of the DT1 results. The entire DT1 was visualized in two formats, top-down or hierarchical (Supplementary Figure 2) and spring model (Supplementary Figure 3), using the graph visualization software Graphviz (www.graphviz.org). Leaf nodes (the nodes without any branches out of them) with 10 or fewer instances were removed in order to further clean the tree and minimize the noise for presentation. Additionally, the presented DT1 was simplified by removing leaf nodes directly connected to the root in order to decrease crowding; the excluded data are presented in Supplementary Table 3. Representative parts of DT1 are presented here in Figure 2.
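The clean-up rule for presentation (dropping leaf nodes with 10 or fewer instances) can be sketched on a toy tree. Everything in this example is hypothetical: the nested-dict representation, the feature and branch names, and the instance counts. The sketch also assumes that an internal node left with no surviving branch is dropped, which the text does not specify.

```python
from typing import Optional


def prune_small_leaves(node: dict, min_instances: int = 10) -> Optional[dict]:
    """Return a copy of the tree without leaves holding <= min_instances samples."""
    if "children" not in node:  # leaf node: {"label": ..., "n": ...}
        return dict(node) if node["n"] > min_instances else None
    kept = {}
    for branch, child in node["children"].items():
        pruned = prune_small_leaves(child, min_instances)
        if pruned is not None:
            kept[branch] = pruned
    if not kept:
        return None  # assumption: a node with no surviving branch is dropped too
    return {"feature": node["feature"], "children": kept}


# Hypothetical tree: internal nodes test a feature, leaves carry a label and a count.
tree = {
    "feature": "np_material",
    "children": {
        "material_A": {"label": "high", "n": 40},
        "material_B": {
            "feature": "exposure_time",
            "children": {
                "short": {"label": "high", "n": 4},
                "long": {"label": "low", "n": 25},
            },
        },
        "material_C": {"label": "low", "n": 10},
    },
}

pruned = prune_small_leaves(tree)
print(sorted(pruned["children"]))
print(sorted(pruned["children"]["material_B"]["children"]))
```

The function returns a new tree and leaves the input untouched, so the full tree remains available for accuracy calculations while the pruned copy is used only for display.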


Figure 2. Representative branches of decision tree 1 (DT1), learnt from 9 features (nanoparticle (NP) type, NP diameter, NP concentration, cells, cell age, cell source, cell type (animal/human), cell exposure time to NP and test indicator used to determine cell viability). The label “Low” in red in the decision tree stands for low cell viability (≤50%), i.e. high cytotoxicity; the label “High” stands for high cell viability (>50%), i.e. low cytotoxicity.

According to DT1, which includes all considered data points (2896), NP material type is the first-rank parameter and the most important determinant of cell viability. The cytotoxicity of nanomaterials has been reported in the literature to correlate with their composition, aside from physical properties such as size and shape.15, 16 Since each NP material type triggers distinctive cell responses, it is important that cytotoxicity studies are conducted for each new NP material. It should be noted here that the effects of surface chemistry (NP surface coating material) and surface charge (approximated as ZP) were not included as testable features in DT1, because the data were either unavailable or inconsistent. Among the tested NPs, polylactic-co-glycolic acid (PLGA) NPs, with a diameter of 105 to 230 nm and a concentration of 6.02×10⁻⁶ to 0.01 μM, resulted in high cell viability for all instances (69). Interestingly, different cell viability results were observed for carbon nanoparticles (5.9-900 nm diameter, 6.37×10⁻⁷-0.7 μM concentration range, extracted from four publications) versus carbon nanotubes (1-225.8 nm diameter, 1.4×10⁻¹⁰-1127.6 μM concentration range, extracted from seven publications) (Figure 2A). While high viability is observed for carbon nanoparticles, the viability of cells (high or low) on exposure to carbon nanotubes depends on the nanotube diameter and exposure time. It is possible that the difference in toxicity between carbon nanoparticles and carbon nanotubes is related to the aspect ratio as well as to concentration. To examine this, the dataset (Supplementary nanoparticle dataset file) was surveyed for instances of low cell viability on exposure to carbon nanotubes at concentrations lower than 0.7 μM (the upper concentration limit of the carbon nanoparticles included in our dataset); low viability values were observed at a concentration range of 0.0003-0.106 μM for nanotubes only. The results of our analysis thus imply an aspect ratio dependency of the cytotoxicity of carbon nanomaterials. This outcome is supported by previous literature showing that carbon nanomaterials with different geometric structures exhibit different in vitro cytotoxicity, with nanotubes showing signs of cytotoxicity at a 10 times lower dose than carbon nanoparticles.17

Importantly, the cytotoxicity test indicator was found to be an important determinant of the viability results (Figure 2). For instance, tetrazolium salt-based assays always resulted in higher cell viability values, compared with the low viability indicated by an LDH (lactate dehydrogenase enzyme) activity assay, as observed with MgO, Al2O3, TiO2, iron oxide and CuO NPs (for all instances in DT1). Tetrazolium salts are widely used to analyze cell viability by determining cellular metabolic activity through rapid colorimetric quantification. Tetrazolium salts (e.g. MTT, MTS, XTT, WST-1, etc.) are chemically reduced by metabolically active cells to formazan products whose concentration can be determined spectrophotometrically. Yet the use of tetrazolium salts to test cell response to NP exposure suffers from several limitations. For instance, Zhang et al.18 observed optical interference of iron oxide NPs with the formazan produced from the MTS reagent, resulting in higher cell viability values. Other NPs loaded with analytes (antioxidants and efflux pump inhibitors or substrates) were reported to interfere with the cell uptake of tetrazolium salts in the MTT assay.19, 20 Other shortcomings include the inability to use these salts with some cell lines exhibiting low metabolic capacity,21 and amplification of formazan production by media components in a way that cannot be corrected for by media-only controls.22 To overcome the aforementioned limitations, the LDH assay is routinely used in combination with the MTT assay for cell viability screening. The test detects the release of the cytoplasmic enzyme LDH into the cell culture supernatant on cell membrane damage. Nevertheless, the LDH assay also suffers from several drawbacks. For instance, the enzymatic activity of LDH measured by the assay deteriorates with time due to LDH’s natural degradation.23 Enzyme stability in the supernatant, as well as its enzymatic activity, varies with several factors, including pH and the components of the culture medium.23 Moreover, NPs could interfere with the LDH assay via adsorption of the enzyme on the NP surface.24, 25 For that reason, we submit that future investigations and data interpretations should be based on a thorough understanding of the assay mechanism and limitations, as well as on interference testing.


NP interference with the detection method could arise from the intrinsic optical absorbance and/or the scattering power of NPs.25 The high adsorptive power of NPs could also interfere with dyes, enzymes or proteins used in cytotoxicity assays and subsequently quench or alter their optical 26, 27 or enzymatic properties.24, 28, 29 More than one assay is sometimes recommended when determining NP toxicity for risk assessment, to increase the level of confidence.30 A more economical approach would be simply to screen the different NPs for possible interference with cytotoxicity assay reagents. Even subtle differences in the oxidation state of metal oxide NPs could be sufficient to result in major differences in their ability to interfere with fluorometric dyes.31 However, there is no comprehensive understanding of a general trend or of specific parameters prompting such interactions. NPs under investigation should thus be checked for interference, on a case-by-case basis, before any cytotoxicity assay is used. Protocols to test for NP interference with several assay reagents in acellular conditions have been described in the literature.25, 28, 31, 32 These protocols could thus be adapted to avoid apparent artefacts in the cytotoxicity measurements.29, 33 Otherwise, an alternative assay method should be used. Unfortunately, most of the studies in the literature (about 85% of the tests on the meta-analyzed NPs) do not indicate a pre-screening for NP interference with assay reagents. Thus, reported results could sometimes be misleading. Furthermore, testing NP interference in an acellular environment, though it appears to be the optimal strategy, does not account for possible interactions in the presence of cells and cell extracts. NPs could interact differently in the presence of the cellular milieu.13

Decision Trees 2-8: General observations/discussion

Three additional machine learning experiments were conducted using subsets of the data to overcome the problem of missing data and test the effect of NP surface coating material and ZP separately (DT2 and DT3, respectively) and combined (DT4 with a limited subset of the data) (Figure 3). The trees were trained using selected features pre-determined by the Gain ratio (Table 1) and were cleaned and pruned similarly as described above. Leaf nodes that were connected directly to the root node were omitted so that it would be easier to read the tree. These nodes from DT2 and DT3 are presented in Supplementary Tables 4-5 (DT4 did not have any of those leaf nodes).
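
The gain ratio used above to pre-select features can be illustrated with a minimal Python sketch. It assumes the standard C4.5-style definition (information gain divided by split information) and nominal features; the function names and the toy rows are hypothetical and are not taken from the study or from WEKA.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain of a nominal feature divided by its split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes in each row.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy rows, not taken from the compiled dataset.
coating = ["PEG", "PEG", "citrate", "citrate", "none", "none"]
viability = ["High", "High", "High", "Low", "Low", "Low"]
print(round(gain_ratio(coating, viability), 3))
```

Features with a higher gain ratio separate the "Low" and "High" classes more cleanly per unit of split complexity, which is one plausible reason to rank features this way before training a tree.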

Decision trees 5 and 6 focused on specific cell types: primary cells (‘DT5’ – based on 540 data entries in 99 publications) and immortalized cell lines (‘DT6’ – based on 2356 data entries in 24 publications), respectively (Figure 4A and Supplementary Figure 4). Decision trees 7 and 8, in turn, explored single prevalent nanoparticles: zinc oxide (‘DT7’ – based on 238 data entries in 11 publications) and silver (‘DT8’ – based on 268 data entries in 11 publications), respectively (Figure 4).

Figure 3. (A) Decision tree 2 (DT2) learnt from 7 features (nanoparticle (NP) type, NP surface coating, NP concentration, cells, cell source, exposure time and test). (B) Decision tree 3 (DT3) learnt from 7 features (NP type, NP diameter, NP concentration, NP zeta potential, cells, cell age and cell source). (C) Decision tree 4 (DT4) learnt from 7 features (NP zeta potential, cells, human/animal cells, exposure time, test, test indicator and biochemical metric of the test indicator). The label “Low” in red in the decision trees stands for low cell viability (below 50%), i.e. high cell toxicity; the label “High” stands for high cell viability, i.e. low cell toxicity.

The overall outcomes of DT1 were expanded upon by decision trees DT2, DT3 and DT4, in which NP-related, cell-related and test-related features determined the cell viability class (low or high). The viability test indicator and its biochemical metric are at the roots of DT2 and DT4, respectively. As before, tests measuring cell metabolic activity tend to result in high cell viability values (Figure 3C). It is worth mentioning that the NP type was not among the strongly influential features for DT2 and DT4. This could be attributed to the different feature set (the addition of the NP surface coating feature) and to the combination of the new feature with the other available features.

Interestingly, the NP surface coating is an influential feature for the cell viability results, as shown in DT2 (Figure 3A). The surface coating influences the surface charge and the potential binding moieties for interactions with cells. Additionally, DT3 and DT4 (Figure 3B,C) showed the impact of ZP on the cell viability results. In practice, the surface charge of NPs is often characterized in terms of ZP [mV]. However, ZPs are not equivalent to surface charge densities, for two main reasons: the complex hybrid geometry of NPs and the effect of ions in solution on what is experimentally measured.1

To this end, we only included ZP measurements from pure water in our dataset. This reduces the background noise in the measured values. Surface charge and coating material of NPs were reported in the literature to drive interactions with cells, cellular uptake, cytotoxicity and modes of toxic action.34, 35 However, about 64% and 56% of the representative sample publications herein lacked appropriate reports of the surface chemistry and ZP of NPs, respectively. This significantly limits the ability to generalize from the nanotoxicity literature.

On another note, different cells respond differently to nanomaterials and to the toxicants used as positive controls. Decision trees DT2 and DT4 clearly indicate a higher sensitivity of human cells to NP toxicity than non-human animal cells (Figure 3A,C). Human cells were reported to be more susceptible to toxicants such as arsenic than non-human animal cells.36 Systematic studies are needed to explore whether this observation holds for exposure to a wide variety of NPs. According to DT2, the human-derived SH-SY5Y cell line (adult epithelial cells cloned from bone marrow) was more sensitive to NPs than the RAW 264.7 cell line (adult mouse monocyte/macrophage from the blood) (DT2, Figure 3A). Because of this observation, we also explored NP toxicity on primary cells (DT5, Figure 4A) versus immortalized cell lines (DT6, Supplementary Figure 4). We used separate subsets of the data, resulting in 540 and 2356 data entries for primary and immortalized cells, respectively. All possible features were included in building these classification-based DTs, which were pruned and cleaned as described above. The accuracies of the decision trees were 91.2% ± 3.1 (DT5) and 88.25% ± 2 (DT6), compared with baseline accuracies (accuracies based on predicting the majority class without a model) of 76.9% and 76.2%, respectively. As with DT1, NP- and test-related effects on cell viability were observed. Using only primary cells (DT5), we could build a better decision tree than DT1 (one with a larger improvement in accuracy over the baseline) to find the pattern in this subset of the data. The combination of the other available features in DT5 (540 entries) was enough to reach good accuracy without the use of cell attributes (e.g. cell type, source, age and morphology) in this subset of the data; by contrast, the effect of cell attributes on % cell viability was evident in DT6.
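
The comparison against the majority-class baseline can be sketched in a few lines of Python. This is an illustrative helper only, with hypothetical function names; the accuracies passed in are the values reported above for DT5 and DT6, and the toy label list is invented.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def improvement_over_baseline(model_pct, baseline_pct):
    """Gain of a model over the baseline, in percentage points."""
    return model_pct - baseline_pct


# Hypothetical toy labels: three "High" rows and one "Low" row.
print(baseline_accuracy(["High", "High", "High", "Low"]))

# Reported mean accuracies and baselines (in %) for DT5 and DT6.
reported = {"DT5": (91.2, 76.9), "DT6": (88.25, 76.2)}
gains = {name: improvement_over_baseline(acc, base) for name, (acc, base) in reported.items()}
print(gains)
```

On the reported figures, the gains are about 14.3 percentage points for DT5 and about 12.05 for DT6.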

Figure 4. Decision trees 5, 7 and 8 (DT5, DT7 and DT8) learnt from all features under study. (A) DT5 is limited to experiments conducted on primary cells. DT7 (B) and DT8 (C) are limited to experiments conducted using zinc oxide (ZnO) and silver (Ag) nanoparticles, respectively. The label “Low” in red in the decision trees stands for low cell viability (below 50%), i.e. high cell toxicity; the label “High” stands for high cell viability, i.e. low cell toxicity.

Finally, NPs that appear frequently in the datasets, namely zinc oxide (DT7 - 238 entries in 11 publications) and silver (DT8 - 268 entries in 11 publications), were studied separately using all the features under study (Figure 4B, C). For both, adult cells from both immortalized (cell lines) and primary sources were used in all experiments. The baseline accuracies of 63.9% (DT7 - zinc oxide NPs) and 56% (DT8 - silver NPs) increased to 79.3% ± 8.6 and 84.7% ± 6.1, respectively, when the decision tree algorithms were applied. This indicates that decision trees can make better predictions, even on a smaller subset of NPs, than simply relying on anecdotal evidence (i.e. the baseline). The cytotoxicity of zinc oxide NPs was primarily dependent on exposure time, followed by NP diameter, then concentration and cell-type effects. On the other hand, the concentration of silver NPs is the first-rank parameter and key determinant of their cytotoxicity, followed by exposure time, cells and the biochemical metric of the test indicator. These decision trees (DT7 and DT8) could be used as guiding frameworks when planning cell experiments with zinc oxide and silver NPs.

Global limitations of typical literature data

Further challenges and uncertainties in the available dataset include missing experimental data on the colloidal stability of NPs in culture medium (or the medium of exposure to cells) and the deficiency of appropriate controls when screening NP cytotoxicity.

NPs are known to aggregate, especially in aqueous dispersion, leading to a change in their colloidal properties,24 and should ideally be characterized for size and surface charge under conditions that are representative of in vitro cell experimental conditions for improved in vitro assessment of NP cytotoxicity.37, 38 Almost all NP cytotoxicity studies depend on inherent colloidal properties; 93% of the analyzed NPs were not characterized for colloidal stability under in vitro experimental conditions. This creates additional uncertainty in the specific dose and characteristics of NPs, which could differ from one production batch to another.

Regulatory agencies often use tools to assess the reliability of results from peer-reviewed publications before employing them in human health hazard evaluation and risk assessments, e.g. the publicly available software-based ToxRTool (Toxicological data Reliability assessment Tool), a battery of questions for reliability assessment developed by the European Commission’s Joint Research Centre.39, 40 We employed ToxRTool for quality assessment of the meta-analyzed studies for use in human health hazard assessments. However, most of the studies (92%) did not pass the reliability assessment owing to a missing positive control when screening for cytotoxicity. Positive controls (cytotoxins, corresponding to a 100% cytotoxic response) are essential for checking the reliability of the assay kit used under the experimental conditions and for accurate estimation of cell viability/cytotoxicity data.13, 41, 42 Relying solely on the negative control (cells not exposed to NPs, 100% cell viability) for normalizing measured cell responses to NPs instead determines the fold decrease in cell viability relative to the control experiment, occasionally resulting in values well below 0% cell viability, as shown in some of the literature. Most significantly, in the absence of positive controls, one tends to emphasize small changes in cell responses to NPs.13
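
The difference between normalizing against the negative control alone and normalizing between both controls can be sketched as follows. This is a hedged illustration, not the procedure of any particular assay kit or of the meta-analyzed studies: it assumes an assay whose signal rises with viability, a linear scale between the positive control (0% viability) and the negative control (100% viability), and hypothetical absorbance readings.

```python
def viability_negative_only(signal, negative_control):
    """Percent viability relative to untreated cells only."""
    return 100.0 * signal / negative_control


def viability_two_point(signal, negative_control, positive_control):
    """Percent viability scaled between the positive (0%) and negative (100%) controls."""
    span = negative_control - positive_control
    if span <= 0:
        raise ValueError("negative control must read higher than positive control")
    return 100.0 * (signal - positive_control) / span


# Hypothetical absorbance readings for an assay whose signal rises with viability.
negative, positive, treated = 1.00, 0.25, 0.40
print(viability_negative_only(treated, negative))
print(viability_two_point(treated, negative, positive))
```

Under these assumed readings the same well is scored at roughly 40% viability with the negative control alone and roughly 20% once the residual signal of fully killed cells is subtracted, which shows how strongly the choice of controls can shift the reported value.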

To alleviate these challenges, classification-based decision trees were used because they are insensitive to the data defects that impair several types of regression analysis (high dimensionality, non-linearity, correlated variables, and significant quantities of missing values).7 Decision trees can therefore reveal critical variables from a complex dataset, perform accurately as classifiers, and are sufficiently simple for future use.10 To support the regulatory acceptance of our model, we followed the principles of the Organisation for Economic Co-operation and Development (OECD) for the development of quantitative structure–activity relationship (QSAR) models:43 (i) a defined endpoint (predicting high versus low cell viability), (ii) an unambiguous algorithm (decision tree), (iii) a clear domain of applicability (nanoparticles with the same provided feature set), (iv) appropriate measures of decision tree goodness and predictive accuracy, and (v) mechanistic interpretation of the decision tree models.

CONCLUSIONS

A meta-analysis approach has been developed here for the assembly and generalization of published NP cytotoxicity data involving commonly used organic and inorganic NPs. Following extensive data collection and mining, a total of 2896 cytotoxicity data samples were generated, each with 15 features of NP, cell and methodological attributes. This study provides a comprehensive compiled dataset for a large spectrum of NPs in practice.

In this work, we showed that a machine-learning algorithm, when applied to this complex heterogeneous dataset describing NP cytotoxicity experiments, can autonomously return a simple and efficient classifier of selected features describing NP, cell and test-related attributes with high prediction accuracy. Furthermore, we have assessed the literature data from different perspectives by successfully building different DTs with different feature sets. The combination of the different features helped uncover several subtle observations not obvious in the original research articles. Our study also indicates that DTs can make better predictions than the baseline even on smaller subsets of the dataset.

The developed models indicated that the NP-induced cytotoxicity response is primarily related to NP chemistry. Other influential attributes include NP concentration, size and surface properties, as well as cell anatomical type, cell origin and cell exposure time to NPs. We find that absolute statements on the safety of specific types of NPs are only possible when all of the physicochemical properties, the cell models used and the methodological procedures are considered. Interestingly, the cytotoxicity screening indicator was found to be an important determinant of cell viability results. To this end, our study clearly shows that interference testing (absent in 85% of the tests on the meta-analyzed samples) and a thorough understanding of the assay mechanism and limitations are integral to valid estimation of the potential risks of NPs using in vitro cell models. The results herein also imply differential sensitivities of cell models following exposure to NPs, especially when comparing human cells to animal cells. Finally, the DTs developed in this study, for example those for zinc oxide and silver NPs (DT7 and DT8), could be used as guiding frameworks when planning cell experiments with these NPs and could be implemented by policy makers.

Overall, this work suggests that information developed from machine learning of published NP cytotoxicity data can provide guidance on, and prediction of, the key attributes related to potential risks of NP cytotoxicity that should be characterized and reported in future studies. Following rigorous and transparent experimental methods and reporting, in parallel with continuous additions to this dataset, will offer higher prediction power and accuracy and reveal further hidden relationships. This will help focus future studies on the development of NPs that are safe-by-design.

METHODS

Literature search and data extraction and harmonization

A work flow chart summarizing the steps undertaken is shown below:

Initial selection of nanoparticle cytotoxicity studies → more focused selection → data extraction and compilation → data harmonization

Initially, an iterative systematic literature search was performed using several search engines (e.g. Google/Google Scholar, PubMed, Web of Science) and different combinations of search terms (e.g. nanoparticle(s)+cytotoxicity, nanoparticle(s)+cell+responses, nano+viability). This phase resulted in an initial selection of peer-reviewed original research articles (~400 articles) reporting on cytotoxicity assessment of NPs (diameter < 1000 nm) using in vitro cell systems. All studies obtained by this search were evaluated for inclusion in the publications pool for analysis. Eligibility of a study for the meta-analysis was based on the following selection criteria: NPs described by at least core material, size and dose; specified cell type and cell exposure time to NPs; and clear presentation of the average cell viability/toxicity ± standard deviation/error. To limit the heterogeneity of the pool and enable sound conclusions, only data points obtained with commonly used cell viability/cytotoxicity assays were selected: neutral red uptake assay (NR), mitochondrial toxicity assays using tetrazolium salts-based assays and others (MTT, MTS, XTT, WST-1 and WST-8), ATP bioluminescence assays, lactate dehydrogenase assay (LDH), resazurin (Alamar blue) live cell assay and live/dead (cell membrane integrity) assays. Metalloid-based NPs as well as loaded NPs were outside the scope of this study.

Having sorted the data into a potentially more relevant list, publications were thoroughly read to manually extract data on the physicochemical parameters of the NPs (core and coat material, diameter, ZP (marker for surface charge)), cell name, cell exposure time to NPs, cell viability/cytotoxicity assay and % cell viability following exposure to NPs. It should be noted that the particle diameters included in the dataset were measured by different techniques (e.g. Zetasizer, microscopy). Where several size types were measured, the hydrodynamic diameter in solution (the size of a hypothetical hard sphere that diffuses in the same fashion as the particle being measured)44 better mimics the experimental condition and was therefore chosen in our analysis. Percentage confluency of the cell monolayer was not considered as a feature in our meta-analysis since it is difficult to control and was not mentioned in most papers. Although co-cultures and 3D cell models represent more realistic models to study the toxic potential of NPs, they were excluded from the study for simplicity. Only cell monoculture systems, which provide the basis for high-throughput analysis in nanotoxicology, were considered. Plotdigitizer® freeware, version 2.6.6 (http://plotdigitizer.sourceforge.net) was used to extract the mean cell viability values presented as graph plots (i.e. standard deviations or standard errors were not included in the dataset). Only properly labelled plots with an image resolution allowing for precise data extraction were analyzed. Data points on NPs with indicated interference or colloidal instability in the cell culture medium were excluded. ZPs measured in buffers or cell culture media were disregarded (left blank) due to possible artefacts from the dissolved salts.1 Further extracted data included yes/no fields on whether an NP interference check, a check of colloidal stability in culture media, and a positive control were conducted.

To prepare the data for machine learning, data were harmonized for the units of the NP size (nm), ZP (mV) and concentration (μM), as well as exposure time (h). Additional descriptive attributes were added for the NPs (type: organic or inorganic), the cells (whether they are cell lines or primary cells, whether they are human or animal cells, animal source (as applicable), cell morphology, cell age, and organ or tissue source) and the assay method (assay reagent and biochemical metric); see Supplementary nanoparticle dataset file.
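
The kind of unit harmonization described above can be sketched in Python. The source does not state how each conversion was performed, so everything here is an assumption for illustration: the helper names, the raw record, and in particular the conversion of a mass concentration to μM through the formula mass of the core material (81.38 g/mol for ZnO).

```python
# Conversion factors to the dataset's target units (h and nm).
TIME_TO_HOURS = {"min": 1 / 60, "h": 1.0, "day": 24.0}
LENGTH_TO_NM = {"nm": 1.0, "um": 1000.0}


def to_hours(value, unit):
    """Convert an exposure time to hours."""
    return value * TIME_TO_HOURS[unit]


def to_nm(value, unit):
    """Convert a particle diameter to nanometres."""
    return value * LENGTH_TO_NM[unit]


def mass_to_micromolar(ug_per_ml, molar_mass_g_per_mol):
    """Convert a mass concentration in ug/mL to uM (1 ug/mL = 1000 / M uM)."""
    return ug_per_ml * 1000.0 / molar_mass_g_per_mol


# Hypothetical raw record; 81.38 g/mol is the formula mass of ZnO.
raw = {"diameter": (0.05, "um"), "exposure": (2, "day"), "dose_ug_per_ml": 10.0}
harmonized = {
    "diameter_nm": to_nm(*raw["diameter"]),
    "exposure_h": to_hours(*raw["exposure"]),
    "dose_uM": mass_to_micromolar(raw["dose_ug_per_ml"], 81.38),
}
print(harmonized)
```

Keeping the conversion factors in lookup tables makes it easy to reject an unrecognized unit (a `KeyError`) rather than silently mixing scales.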

Machine learning

The methods used for machine learning followed the general approach adopted earlier by Asgarian et al.45 Specifically, the data prepared for machine learning analysis (Supplementary nanoparticle dataset file) formed an n × p matrix, where each of the n = 2896 rows corresponds to an NP and each of the p = 15 columns corresponds to a feature of the NPs, here the NP type, diameter, concentration, etc. The final column is the % cell viability class: “Low” for viability below 50% and “High” otherwise. In this paper, we used decision trees, specifically the J48 learning algorithm within the WEKA software package.46 Its basic “LearnDT” algorithm has two parts. The first, GrowDT, finds the feature that produces the best split of the initial data, then recurs, forming new subtrees using only the instances that have the same specified range of values of that feature. This continues until it reaches a node that is “pure”, i.e., all of the instances that reach that node have the same label. The second step, PruneDT, removes some parts of this tree to avoid overfitting.47 Empirical evidence across a wide variety of datasets shows that this second step is essential for good performance.48

The learning algorithm takes as input the complete dataset and produces a classifier. A classifier corresponds to some relevant patterns in the dataset that connect some set of features with the known % cell viability. The classifier then takes as input a description of new data (a new NP, or an existing NP under different NP-, cell- and test-related attributes, i.e. a vector of 15 features) with an unknown % cell viability label (but with features known in the dataset) and returns a prediction: low or high % cell viability. The goal is a classifier, with accuracy higher than the baseline, that will typically make correct predictions for new instances (NPs with unknown cell viability). This accuracy on new instances is called the “generalization accuracy”. This score is used to decide whether one should use this classifier in practice or a different one. Details of the actual algorithm, e.g., how to select the best feature, how to deal with missing values and how to prune, are reviewed and thoroughly explained by Quinlan,49 and Witten and Frank.47 Using decision trees together with feature selection, we could successfully predict the % cell viability of NPs with an accuracy better than that of the baseline classifier (which simply predicts the majority class) using 10-fold cross-validation (cv).

To evaluate the accuracy of the classifier, we used the “cross-validation approach”; see Figure 5. Here, we first split the entire dataset $D$ into $k$ disjoint partitions $D = D_1 \cup \dots \cup D_k$ (here $k = 3$), then define $D_{-i} = D \setminus D_i$ as the complement of the subset $D_i$. Next, for each $i = 1, \dots, k$, $D_{-i}$ is used to produce a classifier, $L(D_{-i})$. This classifier is then evaluated on $D_i$, i.e., we compute $\mathrm{eval}(L(D_{-i}), D_i)$. Note that this evaluation should be fair, as $D_i$ was not used to create $L(D_{-i})$. Finally, the accuracies obtained on each of the $k$ folds are used to calculate the mean and variance; this approximates the true accuracy of the classifier $L(D)$ built using the entire set $D$. Our analysis used 10-fold cross-validation.
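
The cross-validation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WEKA implementation used in the study: the learner is a stand-in that always predicts the majority class (J48 is not reproduced here), the fold assignment is a simple shuffled round-robin, and the toy rows are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def majority_learner(train_rows):
    """Stand-in learner L: returns a classifier that always predicts the majority label."""
    majority = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: majority


def accuracy(classifier, rows):
    """Fraction of rows whose label the classifier predicts correctly."""
    return sum(classifier(x) == y for x, y in rows) / len(rows)


def cross_validate(rows, learner, k=10, seed=0):
    """Return the per-fold accuracies of learner over k disjoint partitions of rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # D_1 ... D_k, disjoint
    scores = []
    for i, held_out in enumerate(folds):
        # D_{-i}: every row that is not in the held-out fold D_i.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(learner(train), held_out))
    return scores


# Hypothetical toy data: (feature vector, viability label), 15 "High" and 5 "Low".
data = [((c,), "High" if c < 75 else "Low") for c in range(0, 100, 5)]
scores = cross_validate(data, majority_learner, k=5)
print(mean(scores), pvariance(scores))
```

Any learner with the same interface (training rows in, classifier out) can be passed to `cross_validate`, so the same loop yields both the baseline score and the score of a real model for comparison.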

Figure 5: Evaluation of a decision tree classifier using cross-validation.

ASSOCIATED CONTENT

Supporting Information (word file): containing supplementary cross validation results, supplementary feature selection algorithms results, and supplementary machine learning results.

Compiled literature nanoparticle dataset (Excel file)

Supplementary Figure 2 (PDF file): Decision tree 1 (DT1), in top-down or hierarchical format, learnt from 9 features (nanoparticle (NP) type, NP diameter, NP concentration, cells, cell age, cell source, cell type (animal/human), cell exposure time to NP and test indicator used to determine cell viability).

Supplementary Figure 3 (PDF file): Decision tree 1 (DT1), in spring model format, learnt from 9 features (nanoparticle (NP) type, NP diameter, NP concentration, cells, cell age, cell source, cell type (animal/human), cell exposure time to NP and test indicator used to determine cell viability).

Supplementary Figure 4 (PDF file): Decision tree 6 (DT6) learnt from all features under study. DT6 is limited to experiments conducted on immortalized cell lines.

This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Authors to whom correspondence should be addressed:

Hagar I. Labouta, e-mail: [email protected]

David T. Cramb, e-mail: [email protected]

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

ACKNOWLEDGMENT

This study was financially supported by an Eyes High Postdoctoral Scholarship from the University of Calgary, an Alberta Innovates-Technology Futures (AITF) Fellowship, NSERC Discovery Grant, CFI grant and NSERC-CIHR Collaborative Health Research Projects (CHRP) grant.

REFERENCES 1.

Rivera-Gil, P.; Jimenez De Aberasturi, D.; Wulf, V.; Pelaz, B.; Del Pino, P.; Zhao,

Y.; De La Fuente, J. M.; Ruiz De Larramendi, I.; Rojo, T.; Liang, X.-J.; Parak, W. J., The Challenge to Relate the Physicochemical Properties of Colloidal Nanoparticles to Their Cytotoxicity. Acc Chem Res 2013, 46, 743-749. 2.

Horev-Azaria, L.; Kirkpatrick, C. J.; Korenstein, R.; Marche, P. N.; Maimon, O.;

Ponti, J.; Romano, R.; Rossi, F.; Golla-Schindler, U.; Sommer, D.; Uboldi, C.; Unger, R. E.; Villiers, C., Predictive Toxicology of Cobalt Nanoparticles and Ions: Comparative in

Vitro Study of Different Cellular Models Using Methods of Knowledge Discovery from Data. Toxicol Sci 2011, 122, 489-501. 3.

Sayes, C. M.; Smith, P. A.; Ivanov, I. V., A Framework for Grouping

Nanoparticles Based on Their Measurable Characteristics. Int J Nanomedicine 2013, 8, 45-56. 4.

Casman, E. A.; Gernand, J. M., Nanotoxicology: Seeing the Trees for the Forest.

Nat Nanotechnol 2016, 11, 405-407. 5.

Lynch, I.; Weiss, C.; Valsami-Jones, E., A Strategy for Grouping of

Nanomaterials Based on Key Physico-Chemical Descriptors as a Basis for Safer-byDesign Nms. Nano Today 2014, 9, 266-270. 6.

Linkov, I.; Bates, M. E.; Trump, B. D.; Seager, T. P.; Chappell, M. A.; Keisler, J.

M., For Nanotechnology Decisions, Use Decision Analysis. Nano Today 2013, 8, 5-10.

ACS Paragon Plus Environment

53

ACS Nano 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

7. Gernand, J. M.; Casman, E. A., A Meta-Analysis of Carbon Nanotube Pulmonary Toxicity Studies—How Physical Dimensions and Impurities Affect the Toxicity of Carbon Nanotubes. Risk Anal 2014, 34, 583-597.

8. Simkó, M.; Tischler, S.; Mattsson, M.-O., Pooling and Analysis of Published in Vitro Data: A Proof of Concept Study for the Grouping of Nanoparticles. Int J Mol Sci 2015, 16, 26211-26236.

9. Oh, E.; Liu, R.; Nel, A.; Gemill, K. B.; Bilal, M.; Cohen, Y.; Medintz, I. L., Meta-Analysis of Cellular Toxicity for Cadmium-Containing Quantum Dots. Nat Nanotechnol 2016, 11, 479-486.

10. Kotsiantis, S. B., Decision Trees: A Recent Overview. Artif Intell Rev 2013, 39, 261-283.

11. Witten, I. H.; Frank, E.; Hall, M. A.; Pal, C. J., Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann: 2016.

12. Rischitor, G.; Parracino, M.; La Spina, R.; Urbán, P.; Ojea-Jiménez, I.; Bellido, E.; Valsesia, A.; Gioria, S.; Capomaccio, R.; Kinsner-Ovaskainen, A.; Gilliland, D.; Rossi, F.; Colpo, P., Quantification of the Cellular Dose and Characterization of Nanoparticle Transport During in Vitro Testing. Part Fibre Toxicol 2016, 13, 47.

13. Labouta, H. I.; Sarsons, C.; Kennard, J.; Gomez-Garcia, M. J.; Villar, K.; Lee, H.; Cramb, D. T.; Rinker, K. D., Understanding and Improving Assays for Cytotoxicity of Nanoparticles: What Really Matters? RSC Advances 2018, 8, 23027-23039.

14. Teeguarden, J. G.; Hinderliter, P. M.; Orr, G.; Thrall, B. D.; Pounds, J. G., Particokinetics in Vitro: Dosimetry Considerations for in Vitro Nanoparticle Toxicity Assessments. Toxicol Sci 2007, 95, 300-312.

15. Limbach, L. K.; Wick, P.; Manser, P.; Grass, R. N.; Bruinink, A.; Stark, W. J., Exposure of Engineered Nanoparticles to Human Lung Epithelial Cells: Influence of Chemical Composition and Catalytic Activity on Oxidative Stress. Environ Sci Technol 2007, 41, 4158-4163.

16. Sohaebuddin, S. K.; Thevenot, P. T.; Baker, D.; Eaton, J. W.; Tang, L., Nanomaterial Cytotoxicity Is Composition, Size, and Cell Type Dependent. Part Fibre Toxicol 2010, 7, 22.


17. Jia, G.; Wang, H.; Yan, L.; Wang, X.; Pei, R.; Yan, T.; Zhao, Y.; Guo, X., Cytotoxicity of Carbon Nanomaterials: Single-Wall Nanotube, Multi-Wall Nanotube, and Fullerene. Environ Sci Technol 2005, 39, 1378-1383.

18. Zhang, C.; Wängler, B.; Morgenstern, B.; Zentgraf, H.; Eisenhut, M.; Untenecker, H.; Krüger, R.; Huss, R.; Seliger, C.; Semmler, W.; Kiessling, F., Silica- and Alkoxysilane-Coated Ultrasmall Superparamagnetic Iron Oxide Particles: A Promising Tool to Label Cells for Magnetic Resonance Imaging. Langmuir 2007, 23, 1427-1434.

19. Natarajan, M.; Mohan, S.; Martinez, B. R.; Meltz, M. L.; Herman, T. S., Antioxidant Compounds Interfere with the 3. Cancer Detect Prev 2000, 24, 405-14.

20. Vellonen, K.-S.; Honkakoski, P.; Urtti, A., Substrates and Inhibitors of Efflux Proteins Interfere with the MTT Assay in Cells and May Lead to Underestimation of Drug Toxicity. Eur J Pharm Sci 2004, 23, 181-188.

21. Scudiero, D. A.; Shoemaker, R. H.; Paull, K. D.; Monks, A.; Tierney, S.; Nofziger, T. H.; Currens, M. J.; Seniff, D.; Boyd, M. R., Evaluation of a Soluble Tetrazolium/Formazan Assay for Cell Growth and Drug Sensitivity in Culture Using Human and Other Tumor Cell Lines. Cancer Res 1988, 48, 4827-4833.

22. Huang, K. T.; Chen, Y. H.; Walker, A. M., Inaccuracies in MTS Assays: Major Distorting Effects of Medium, Serum Albumin, and Fatty Acids. Biotechniques 2004, 37, 406-412.

23. Galluzzi, L.; Aaronson, S. A.; Abrams, J.; Alnemri, E. S.; Andrews, D. W.; Baehrecke, E. H.; Bazan, N. G.; Blagosklonny, M. V.; Blomgren, K.; Borner, C.; Bredesen, D. E.; Brenner, C.; Castedo, M.; Cidlowski, J. A.; Ciechanover, A.; Cohen, G. M.; De Laurenzi, V.; De Maria, R.; Deshmukh, M.; Dynlacht, B. D., et al., Guidelines for the Use and Interpretation of Assays for Monitoring Cell Death in Higher Eukaryotes. Cell Death Differ 2009, 16, 1093-1107.

24. Han, X.; Gelein, R.; Corson, N.; Wade-Mercer, P.; Jiang, J.; Biswas, P.; Finkelstein, J. N.; Elder, A.; Oberdorster, G., Validation of an LDH Assay for Assessing Nanoparticle Toxicity. Toxicology 2011, 287, 99-104.

25. Kroll, A.; Pillukat, M. H.; Hahn, D.; Schnekenburger, J., Interference of Engineered Nanoparticles with in Vitro Toxicity Assays. Arch Toxicol 2012, 86, 1123-1136.


26. Casey, A.; Herzog, E.; Davoren, M.; Lyng, F. M.; Byrne, H. J.; Chambers, G., Spectroscopic Analysis Confirms the Interactions between Single Walled Carbon Nanotubes and Various Dyes Commonly Used to Assess Cytotoxicity. Carbon 2007, 45, 1425-1432.

27. Casey, A.; Davoren, M.; Herzog, E.; Lyng, F. M.; Byrne, H. J.; Chambers, G., Probing the Interaction of Single Walled Carbon Nanotubes within Cell Culture Medium as a Precursor to Toxicity Testing. Carbon 2007, 45, 34-40.

28. Wilhelmi, V.; Fischer, U.; van Berlo, D.; Schulze-Osthoff, K.; Schins, R. P. F.; Albrecht, C., Evaluation of Apoptosis Induced by Nanoparticles and Fine Particles in RAW 264.7 Macrophages: Facts and Artefacts. Toxicol in Vitro 2012, 26, 323-334.

29. Guadagnini, R.; Halamoda Kenzaoui, B.; Cartwright, L.; Pojana, G.; Magdolenova, Z.; Bilanicova, D.; Saunders, M.; Juillerat, L.; Marcomini, A.; Huk, A.; Dusinska, M.; Fjellsbø, L. M.; Marano, F.; Boland, S., Toxicity Screenings of Nanomaterials: Challenges Due to Interference with Assay Processes and Components of Classic in Vitro Tests. Nanotoxicology 2015, 9, 13-24.

30. Monteiro-Riviere, N. A.; Inman, A. O.; Zhang, L. W., Limitations and Relative Utility of Screening Assays to Assess Engineered Nanoparticle Toxicity in a Human Cell Line. Toxicol Appl Pharmacol 2009, 234, 222-235.

31. Griffiths, S. M.; Singh, N.; Jenkins, G. J. S.; Williams, P. M.; Orbaek, A. W.; Barron, A. R.; Wright, C. J.; Doak, S. H., Dextran Coated Ultrafine Superparamagnetic Iron Oxide Nanoparticles: Compatibility with Common Fluorometric and Colorimetric Dyes. Anal Chem 2011, 83, 3778-3785.

32. Belyanskaya, L.; Manser, P.; Spohn, P.; Bruinink, A.; Wick, P., The Reliability and Limits of the MTT Reduction Assay for Carbon Nanotubes–Cell Interaction. Carbon 2007, 45, 2643-2648.

33. Shukla, R. K.; Sharma, V.; Pandey, A. K.; Singh, S.; Sultana, S.; Dhawan, A., ROS-Mediated Genotoxicity Induced by Titanium Dioxide Nanoparticles in Human Epidermal Cells. Toxicol in Vitro 2011, 25, 231-241.

34. Fröhlich, E., The Role of Surface Charge in Cellular Uptake and Cytotoxicity of Medical Nanoparticles. Int J Nanomedicine 2012, 7, 5577-5591.


35. Hauck, T. S.; Ghazani, A. A.; Chan, W. C. W., Assessing the Effect of Surface Chemistry on Gold Nanorod Uptake, Toxicity, and Gene Expression in Mammalian Cells. Small 2008, 4, 153-159.

36. Lee, T.-C.; Ho, I.-C., Differential Cytotoxic Effects of Arsenic on Human and Animal Cells. Environ Health Perspect 1994, 102, 101-105.

37. Fatisson, J.; Quevedo, I. R.; Wilkinson, K. J.; Tufenkji, N., Physicochemical Characterization of Engineered Nanoparticles under Physiological Conditions: Effect of Culture Media Components and Particle Surface Coating. Colloids and Surfaces B: Biointerfaces 2012, 91, 198-204.

38. Ahamed, M.; Siddiqui, M. A.; Akhtar, M. J.; Ahmad, I.; Pant, A. B.; Alhadlaq, H. A., Genotoxic Potential of Copper Oxide Nanoparticles in Human Lung Epithelial Cells. Biochem Biophys Res Commun 2010, 396, 578-583.

39. Segal, D.; Makris, S. L.; Kraft, A. D.; Bale, A. S.; Fox, J.; Gilbert, M.; Bergfelt, D. R.; Raffaele, K. C.; Blain, R. B.; Fedak, K. M.; Selgrade, M. K.; Crofton, K. M., Evaluation of the ToxRTool’s Ability to Rate the Reliability of Toxicological Data for Human Health Hazard Assessments. Regul Toxicol Pharmacol 2015, 72, 94-101.

40. Koch, M. S.; DeSesso, J. M.; Williams, A. L.; Michalek, S.; Hammond, B., Adaptation of the ToxRTool to Assess the Reliability of Toxicology Studies Conducted with Genetically Modified Crops and Implications for Future Safety Testing. Crit Rev Food Sci Nutr 2016, 56, 512-526.

41. Labouta, H. I.; Menina, S.; Kochut, A.; Gordon, S.; Geyer, R.; Dersch, P.; Lehr, C.-M., Bacteriomimetic Invasin-Functionalized Nanocarriers for Intracellular Delivery. J Controlled Release 2015, 220, Part A, 414-424.

42. Nafee, N.; Schneider, M.; Schaefer, U. F.; Lehr, C.-M., Relevance of the Colloidal Stability of Chitosan/PLGA Nanoparticles on Their Cytotoxicity Profile. Int J Pharm 2009, 381, 130-139.

43. Tropsha, A., Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inf 2010, 29, 476-488.

44. Hirschle, P.; Preiß, T.; Auras, F.; Pick, A.; Völkner, J.; Valdepérez, D.; Witte, G.; Parak, W. J.; Rädler, J. O.; Wuttke, S., Exploration of MOF Nanoparticle Sizes Using Various Physical Characterization Methods – Is What You Measure What You Get? CrystEngComm 2016, 18, 4359-4368.

45. Asgarian, N.; Hu, X.; Aktary, Z.; Chapman, K. A.; Lam, L.; Chibbar, R.; Mackey, J.; Greiner, R.; Pasdar, M., Learning to Predict Relapse in Invasive Ductal Carcinomas Based on the Subcellular Localization of Junctional Proteins. Breast Cancer Res Treat 2010, 121, 527-538.

46. Eibe, F.; Hall, M.; Witten, I.; Pal, J., The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques” 2016, 4.

47. Witten, I. H.; Frank, E., Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed.; San Francisco, Morgan Kaufmann Publishers: 2005.

48. Bishop, C., Pattern Recognition and Machine Learning. Springer: Berlin, 2006.

49. Quinlan, J. R., C4.5: Programs for Machine Learning. Morgan Kaufmann: San Mateo, Calif, 2003.
