Machine Learning for Silver Nanoparticle Electron Transfer Property

Molecular & Materials Modelling Laboratory, DATA61 CSIRO, Door 34 Goods Shed, Village Street, Docklands VIC, 3008, Australia. J. Chem. Inf. Model. , 2...

Article

Machine Learning for Silver Nanoparticle Electron Transfer Property Prediction Baichuan Sun, Michael Fernandez, and Amanda S Barnard J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00272 • Publication Date (Web): 22 Sep 2017 Downloaded from http://pubs.acs.org on September 25, 2017


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Machine Learning for Silver Nanoparticle Electron Transfer Property Prediction

Baichuan Sun, Michael Fernandez, and Amanda S. Barnard

Molecular & Materials Modelling Laboratory, DATA61 CSIRO, Door 34 Goods Shed, Village St, Docklands VIC 3008, Australia

E-mail: [email protected]  Phone: +61 3 9662 7109

Abstract

Nanoparticles exhibit diverse structural and morphological features that are often interconnected, making the correlation of structure/property relationships challenging. In this study a multi-structure/single-property relationship of silver nanoparticles is developed for the energy of the Fermi level, which can be tuned to improve the transfer of electrons in a variety of applications. By combining different machine learning algorithms, including k-means, logistic regression and random forest, with electronic structure simulations, we find that the degree of twinning (characterised by the fraction of hexagonal close-packed atoms) and the population of {111} facets (characterised by a surface coordination number of 9) are strongly correlated with the Fermi energy of silver nanoparticles. A concise 3-layer artificial neural network, together with principal component analysis, is built to predict this property from a reduced set of geometrical, structural and topological features, making the method ideal for efficient and accurate high-throughput screening of large-scale virtual nanoparticle libraries, and for the creation of single-structure/single-property, multi-structure/single-property and single-structure/multi-property relationships in the near future.

ACS Paragon Plus Environment


Introduction

Predicting structure/property relations for nanomaterials has been a focus of numerous computational and theoretical studies over the past decades. Applying them in practice, however, has proven to be more problematic than the initial predictions suggest. One of the reasons for this is that the structure/property relationship paradigm implies that a single structural feature is uniquely responsible for a single property. This type of relationship is theoretically satisfying, and true of some well-known systems [1–3], but can lead to misleading assumptions that hinder development in many cases. For example, this implication is partially responsible for the ubiquitous assumption that nanoparticles have to be perfect to perform well, and that modifying a particular structural feature to tune a property requires that all other features remain unchanged. Not only is this assumption incorrect [4–7], it is rarely possible in reality. Typically the properties of nanoscale materials are linked to more than one structural feature, if only loosely, such that tuning based on one feature fails to deliver the desired results.

To overcome this inconvenient reality we must identify and characterise multi-structure/single-property relationships where the connection is robust against variation in all other quantities. Modification of the important structural features should then not necessitate consideration of any others, provided they are the right ones. This will ultimately lead to more scalable and cost-effective manufacturing strategies, but predicting multi-structure/single-property relationships is still challenging.

Nanoparticles exhibit diverse structural and morphological features, including a range of energetically similar sizes, geometric shapes, surface topologies and defects, many of which are interconnected. This is an ideal problem for machine learning (ML) methods, which take advantage of available datasets and predict structure/property relationships with high efficiency and accuracy [8]. ML has been shown to provide insight into complex multi-structure correlations [9] that are difficult to identify using conventional computational methods [10–12], as well as to deal with the deluge of data emanating from high-throughput screening of large-scale virtual nanoparticle libraries [13,14]. By combining efficient electronic structure simulations, appropriate ML algorithms (such as artificial neural networks (ANNs) and k-means clustering), and a sufficiently large and diverse ensemble of candidate nanostructures, it is possible to identify the set of features that drive performance, and the particles that best represent the properties of the entire sample. This provides a guide as to where experimental effort should be focused, and which structures should form the basis for more detailed computational studies [9,15].

To test this hypothesis we have elected to investigate the charge transfer properties of silver nanoparticles, since potential applications span the optical, chemical and medical domains. Depending on their size and shape, silver nanoparticles have been developed for optical labeling [16,17], contrast enhancement [18,19], chemical and biological sensors [20–23], substrates for surface-enhanced Raman spectroscopy (SERS) [24–26], and antimicrobial, antifungal and antiviral agents [27–35]. For this reason, controlling the size and shape of silver nanoparticles has been a topic of intense investigation. Recent progress in synthesis technologies has made it possible to produce silver nanocrystals with well-defined sizes and shapes [16,36–47], but while the quality of samples is improving, heterogeneity, polydispersity and imperfections have not yet been eliminated. Alternatively, idealised structures with perfect features can be created and characterised computationally [26,48–50]. Various computational methods have been applied to the study of silver nanoparticles, providing valuable information on how structure can impact performance in different scientific and engineering domains [51–53]. Results of this type also provide a suitable basis for statistical analysis, since a diverse range of descriptors can be extracted without replication or redundancy [54]. Computational data, or capta, is also exquisitely reproducible.

In this study, we use electronic structure methods to investigate the relationship between the Fermi energy, E_Fermi, of silver nanoparticles and an array of different structural and morphological features. Different ML algorithms are adopted to analyse the dataset, including k-means, logistic regression (LR) and random forest (RF) modelling for feature quality analytics, principal component analysis (PCA) and 3-layer artificial neural network (ANN) regression based on selected features, and k-fold cross-validation for recursive anomaly detection. We find that these methods successfully map the Fermi energy onto simple geometrical, structural, topological and morphological features, providing precise information about this multi-structure/single-property relationship that can be used to guide future research and development. This study also offers practical insight into the application of ML to a quantum mechanical dataset, while achieving high accuracy and generality.

Electronic Structure and Data Analytic Methods

Dataset characterisation and collection

The dataset used in this study contains 425 silver nanoparticles with diameters between 0.5 nm and 4.9 nm (13 to 2947 atoms), and a range of different morphologies defined by zonohedra enclosed by {100}, {110}, {111}, {210}, {113}, {331} and {123} facets. These are illustrated geometrically in Fig. 1, including the (a) tetrahedron, (b) octahedron, (c) truncated octahedron, (d) cuboctahedron, (e) truncated cube, (f) truncated rhombic dodecahedron, (g) rhombic dodecahedron, (h) rhombi-truncated octahedron, (i) modified truncated octahedron (also known as a doubly-truncated octahedron), (j) small rhombicuboctahedron, (k) great rhombicuboctahedron, (l) tetrakis hexahedron, (m) trapezohedron, (n) triakis octahedron, (o) hexakis octahedron, (p) decahedron and (q) icosahedron.

Some minor modifications (edge removal) and truncations (corner removal) were also included, as it is difficult to preserve the exact geometric shape in each structure due to the constraints of the face-centered cubic (FCC) lattice, and generally not possible to make a structure of each shape with equivalent numbers of atoms. In each case care was taken to include a range of sizes for each shape, and all of the following discussions involve trends in the predicted properties.

To make simulations of all of these structures tractable, we have used the density functional tight-binding method with self-consistent charges (SCC-DFTB), as implemented in the DFTB+ code [55–57].

Figure 1: Morphologies represented in the dataset used in this study: (a) tetrahedron, (b) octahedron, (c) truncated octahedron, (d) cuboctahedron, (e) truncated cube, (f) truncated rhombic dodecahedron, (g) rhombic dodecahedron, (h) rhombi-truncated octahedron, (i) modified truncated octahedron, (j) small rhombicuboctahedron, (k) great rhombicuboctahedron, (l) tetrakis hexahedron, (m) trapezohedron, (n) triakis octahedron, (o) hexakis octahedron, (p) decahedron* and (q) icosahedron. *Note that this shape is an example, and that this group contains significant intra-class polydispersity.

SCC-DFTB is an approximate quantum chemical approach in which the Kohn–Sham density functional is expanded to second order around a reference electron density. The reference density is obtained from self-consistent density functional calculations of weakly confined neutral atoms within the generalized gradient approximation (GGA). The confinement potential is optimized to anticipate the charge density and effective potential in molecules and solids. A minimal valence basis set is used to account explicitly for the two-center tight-binding matrix elements within the DFT level. The double-counting terms in the Coulomb and exchange–correlation potentials, as well as the internuclear repulsion, are replaced by a universal short-range repulsive potential. All structures have been fully relaxed with a conjugate gradient method until the force on each atom was less than 10⁻⁴ a.u. (i.e. ≈5 meV/Å). In all the calculations, the "HYB" set of parameters is used to describe the contributions from diatomic interactions of silver [58,59], with a Fermi smearing temperature of 300 K. To confirm the accuracy of SCC-DFTB for the Ag nanoparticles considered, we performed a comparison with density functional theory (DFT); the detailed comparison results are provided in the Supporting Information. As this is a data-driven study that uses a consistent computational method for all data points, the use of a higher-order method would not influence the identification of structure/property relationships, and would only serve to improve the quality of the noise. This work has been focused on correlation: understanding how a structure relates to a property, rather than reporting why a structure gives rise to a property. The entire set of silver nanoparticles is openly provided online [60].
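As a quick arithmetic check on the force criterion quoted above, the conversion from atomic units (Hartree/Bohr) to meV/Å can be worked out directly. This is a minimal sketch using rounded CODATA constants; the function name is our own and is not part of DFTB+.

```python
# Rounded CODATA 2018 constants.
HARTREE_IN_EV = 27.211386    # 1 Hartree expressed in eV
BOHR_IN_ANGSTROM = 0.529177  # 1 Bohr radius expressed in Å


def au_force_to_mev_per_angstrom(force_au: float) -> float:
    """Convert a force from Hartree/Bohr (atomic units) to meV/Å."""
    ev_per_angstrom = force_au * HARTREE_IN_EV / BOHR_IN_ANGSTROM
    return 1000.0 * ev_per_angstrom


print(f"{au_force_to_mev_per_angstrom(1e-4):.2f} meV/Å")  # 5.14 meV/Å
```

Since 1 Hartree/Bohr is roughly 51.4 eV/Å, a threshold of 10⁻⁴ a.u. corresponds to about 5.1 meV/Å, consistent with the "≈5 meV/Å" stated in the text.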

Although we have elected to use this computational method, the following analysis can be equally applied to the output of any computational or experimental characterisation method. Our data analytics are not specific to, or contingent on, the original data source or method of collection. Any dataset that can be adequately described by a consistent set of meaningful descriptors is applicable.

Stratified and over-sampling

To measure the "prediction accuracy" and "generality" of regression ML models and to optimize hyper-parameters, the coefficient of determination, R² [61], is used (see the Supporting Information for more details). Root mean square error (RMSE) and mean absolute error (MAE) [62–64] are also calculated for easier practical evaluation.
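The three regression metrics follow directly from their textbook definitions, with R² = 1 − SS_res/SS_tot. The sketch below is illustrative only: the function name and the example Fermi energies are hypothetical, not values from this study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Coefficient of determination (R^2), RMSE and MAE from their standard definitions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical reference and predicted Fermi energies (eV), for illustration only.
metrics = regression_metrics([-3.95, -3.90, -3.80, -3.76], [-3.93, -3.91, -3.82, -3.75])
print(metrics)
```

RMSE and MAE are reported in the units of the target (eV here), which is why they are easier to interpret in practice than the dimensionless R².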

For ML analysis, it is important that a model be comprehensive, so that accuracy is achieved by overcoming high bias (under-fitting) [65–67], and concise, so that generality is achieved by avoiding high variance (over-fitting) [68–70]. Both are measured through comparisons between the R² scores on the training, cross-validation and testing partitions [71].

A proper splitting of the dataset into these three partitions would eventually confirm an ML model's accuracy and generality, but problems occur when data is imbalanced across different sub-ranges of label values. Random splitting methods [72] are routinely applied, but extra care needs to be taken for small-scale datasets such as those traditionally used in the molecular, nano and materials sciences. As we will see in the following sections, the distribution of the Fermi level in our dataset of silver nanoparticles was imbalanced, and so stratified sampling was used to ensure that the relative ratio (4:1) [71] was approximately preserved in each sub-range (stratum) for the training/testing split. This guarantees that (1) in each stratum, more data points are designated for model training than for model testing, and (2) the generality of the ML model, measured by its R² score, will be tested over the complete range of the studied problem.

During the k-fold cross-validation [73,74], the training dataset was split into k (= 3 in this study) equal-sized subgroups. Of the k subgroups, a single validation subgroup was retained to measure the ML model's accuracy by its R², while the remaining k−1 subgroups were used to fit the model for a given set of hyper-parameters. The cross-validation was repeated k times, with each of the k subgroups used for validation only once, and the mean value and variance of R² over the k folds were computed. The k-fold cross-validation was repeated for different combinations of hyper-parameters, and the combination giving the highest R² value was chosen. In cases where the dataset is imbalanced, ML models can easily be biased by the majority population over these strata, giving a deceptively high R² score that masks under-performance on the sparsely populated training strata. To overcome this issue, over-sampling within the k−1 training folds is conducted on strata with a minority population [75–77], so that the data samples across the whole range are uniformly distributed, and the ML model is validated on the remaining fold. These steps ensure that all of the data is treated in a consistent and reproducible way.
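The cross-validation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: strata are given as integer labels (for example, bins of E_Fermi), over-sampling is plain random duplication with replacement (the text does not specify the scheme used in refs 75–77), and `fit_and_score` is a hypothetical caller-supplied function that fits a model on the training indices and returns its R² on the validation indices.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def oversample(train_idx: Sequence[int], strata: Sequence[int], seed: int = 0) -> List[int]:
    """Random over-sampling: duplicate members of minority strata (with replacement)
    until every stratum present in train_idx is as large as the largest one."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i in train_idx:
        groups[strata[i]].append(i)
    target = max(len(g) for g in groups.values())
    rng = random.Random(seed)
    balanced: List[int] = []
    for members in groups.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def cross_validate(
    n: int,
    strata: Sequence[int],
    k: int,
    fit_and_score: Callable[[List[int], List[int]], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the mean and variance of the validation score over the k folds."""
    folds = kfold_indices(n, k, seed)
    scores = []
    for v, val_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != v for i in fold]
        # Over-sample only the k-1 training folds; the validation fold is left untouched.
        train_idx = oversample(train_idx, strata, seed)
        scores.append(fit_and_score(train_idx, val_idx))
    mean = sum(scores) / k
    variance = sum((s - mean) ** 2 for s in scores) / k
    return mean, variance
```

For hyper-parameter selection, `cross_validate` would be called once per candidate combination (with k = 3, as in the text) and the combination with the highest mean R² kept. Balancing only the training folds matters: duplicating samples before splitting could place copies of the same nanoparticle in both training and validation folds and inflate R².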

Feature analytics and selection

One of the advantages of applying ML analysis is the ability to assess the impact of each feature on the studied properties [8]. Feature selection also influences ML performance: (1) irrelevant or partially relevant features can decrease the accuracy of classification and regression (constant features offer no information at all); and (2) linearly correlated features are redundant, over-complicate the model and should be reduced. In some cases, datasets show strong evidence of being clustered and can be categorised into different classes, each of which can be represented by a suitable "average" to reduce the data [9,15].

The centroids of each class are the representative samples, which together form a subset suitable for more detailed investigation. To measure classification performance, we have used the F1 score [78], which accounts for both the precision and the recall of the testing result. In this context, precision [79] is the number of true positive results divided by the number of all positive results, and reflects how accurate a classifier is. Recall [80] is the number of true positive results divided by the total number of positive results that should have been returned, and indicates how comprehensive a classifier is. F1 is the harmonic mean of the two, and is expressed as:

$$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (1)$$
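Equation 1, together with the definitions of precision and recall above, can be computed per class from true and predicted labels. This is a minimal sketch; the function name and the toy labels are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Tuple[float, float, float]:
    """Precision, recall and F1 (Eq. 1) for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # true positives
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false positives
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy binary labels (0 = lower-E_Fermi class, 1 = higher-E_Fermi class), illustrative only.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred, positive=1))
```

In a binary problem each class can be taken in turn as "positive", which gives the per-class precision and recall values of the kind reported later for the LR and RF classifiers.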

Recursive feature elimination (RFE), which tracks the accuracy of a classification model, here logistic regression (LR) [81], as features are recursively removed, has been used to reliably identify which features contribute the most to predicting the target property. This was corroborated using an alternative method, the random forest (RF) algorithm [71], which consists of a "forest" of decision trees (DTs) [82–86], each node of which splits the dataset based on a condition on a single feature so that samples with similar properties are grouped together. Gini impurity [87] was used as the measure of how "similar" the samples are; it was calculated for each feature and tree before averaging across the "forest", and the features were then ranked by importance based on the averaged impurity. Using both the LR and RF algorithms provides two independent checks on how the samples are associated with their features, so that we focus on only the features that are important.

When using relatively small datasets (such as the 425 silver nanoparticles used herein), it is especially important to overcome high variance (over-fitting) during ML regression modelling. Following the feature selection described above, PCA [88] is applied to systematically reduce the dimension of the important features by transforming the data into the space defined by the so-called "principal components" (PCs). These PCs are expressed as linear combinations of the original features, as described by:

$$\mathbf{X} = \mathbf{t}_1\mathbf{p}_1' + \mathbf{t}_2\mathbf{p}_2' + \cdots + \mathbf{t}_A\mathbf{p}_A' + \mathbf{E} = \mathbf{T}\mathbf{P}' + \mathbf{E} \qquad (2)$$

In Equation 2, X is the original data matrix, A is the total number of extracted principal components, and E is the residual matrix. The new latent variables, the t scores, define the importance of each principal component (p) when reconstructing the original data X. In this study, 99% of the variance is retained to determine A.
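The rule "retain 99% of the variance" fixes A as the smallest number of leading components whose eigenvalues (variances along each PC) sum to at least 99% of the total. The sketch below assumes the eigenvalues of the covariance matrix of the standardised features have already been obtained from any PCA routine; the function name and the example eigenvalues are hypothetical.

```python
from typing import Sequence


def n_components_for_variance(eigenvalues: Sequence[float], threshold: float = 0.99) -> int:
    """Smallest A such that the A largest eigenvalues explain >= threshold of the total variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    cumulative = 0.0
    for a, lam in enumerate(values, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return a
    return len(values)


# Hypothetical eigenvalues: cumulative explained variance is 0.60, 0.85, 0.95, 0.995, 1.00.
print(n_components_for_variance([6.0, 2.5, 1.0, 0.45, 0.05]))  # 4
```

Components beyond A contribute only to the residual matrix E in Equation 2.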

Artificial neural network regressor

Considerable progress has recently been made in the development of artificial neural networks (ANNs) [89,90] and their application to different scientific and engineering topics [91,92]. Among the numerous advantages offered by ANNs is the ability to detect undefined nonlinear relations between dependent and/or independent variables by examining a large number of possible interactions between feature predictors [93,94], which is ideal for challenging regression problems. We have used a 3-layer ANN with the same number of neurons on the hidden layer as A (see Equation 2), which is the reduced dimension from PCA. The output layer has one neuron predicting the property; in this case the energy of the Fermi level, E_Fermi^Pred.

The hyper-parameters optimised for the ANN include: (1) optimiser, (2) weight initialisation, (3) number of training epochs, (4) batch size, (5) weight regulariser, l2^weights, and (6) activity regulariser, l2^activity [95]. Self-adaptive optimisers, including RMSprop and Adam, are grid searched, as they automatically adjust the learning rate and are capable of avoiding suboptimal saddle points [96]. A rectifier activation function [97,98] is adopted on the hidden layer. To suppress over-fitting, L2 regularisation (weight decay or Ridge) [99] decreases the complexity of the model by penalising neuron weights and activities (outputs) of extremely large magnitude through minimising their L2 norm, and has been proven to be efficient across different implementations [100]. The larger l2^weights or l2^activity are, the stronger the regularisation.
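To make the architecture and the two L2 penalties concrete, the sketch below implements the forward pass of an A-input, A-hidden (ReLU), 1-output network and the penalised squared-error loss for one sample. It is a plain-Python illustration under stated assumptions: the initialisation, the penalty strengths and all function names are hypothetical, biases are not penalised, and training with RMSprop or Adam is omitted. It is not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Illustrative initialisation only: small uniform weights and zero biases.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer: W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x: Sequence[float], hidden: Layer, output: Layer) -> Tuple[float, List[float]]:
    """Return the predicted E_Fermi and the hidden-layer activities."""
    h = [max(0.0, z) for z in dense(x, hidden)]  # rectifier (ReLU) on the hidden layer
    y_hat = dense(h, output)[0]                   # single linear output neuron
    return y_hat, h


def penalised_loss(x, y, hidden, output, l2_weights=1e-4, l2_activity=1e-4) -> float:
    """Squared error plus L2 penalties on all weights (not biases) and on hidden activity."""
    y_hat, h = forward(x, hidden, output)
    weight_sq = sum(w * w for layer in (hidden, output) for row in layer[0] for w in row)
    activity_sq = sum(a * a for a in h)
    return (y_hat - y) ** 2 + l2_weights * weight_sq + l2_activity * activity_sq


A = 16  # retained principal components = input size = hidden size
rng = random.Random(0)
hidden_layer = init_layer(A, A, rng)
output_layer = init_layer(A, 1, rng)
x = [0.5] * A  # hypothetical PCA scores for one nanoparticle
print(penalised_loss(x, -3.85, hidden_layer, output_layer))
```

Raising `l2_weights` or `l2_activity` increases the cost of large weights or large hidden outputs, which is the sense in which larger values give stronger regularisation.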

Results and Discussion

The IP and EA properties are defined adiabatically with respect to the total energy of the neutral structure, E, and those of the corresponding anion, E⁻, and cation, E⁺, using the quasi-particle approximation, such that:

$$\mathrm{IP} = E^{+} - E \qquad (3)$$

$$\mathrm{EA} = E - E^{-} \qquad (4)$$

The Fermi energy, E_Fermi, is extracted directly from the individual simulations, and the band gap is obtained from the relation:

$$E_{\mathrm{gap}} = \mathrm{IP} - \mathrm{EA} \qquad (5)$$
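Equations 3 to 5 amount to simple differences of total energies. A minimal sketch, with hypothetical total energies in eV and a function name of our own:

```python
from typing import Tuple


def ip_ea_gap(e_neutral: float, e_cation: float, e_anion: float) -> Tuple[float, float, float]:
    """Adiabatic IP, EA and quasi-particle gap from total energies (all in eV)."""
    ip = e_cation - e_neutral  # Eq. (3): IP = E+ - E
    ea = e_neutral - e_anion   # Eq. (4): EA = E - E-
    return ip, ea, ip - ea     # Eq. (5): E_gap = IP - EA


# Hypothetical total energies, for illustration only.
print(ip_ea_gap(e_neutral=-100.0, e_cation=-95.0, e_anion=-102.0))  # (5.0, 2.0, 3.0)
```

Substituting Eqs. 3 and 4 into Eq. 5 gives E_gap = E⁺ + E⁻ − 2E, so three total-energy calculations per nanoparticle suffice for all three quantities.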

The histograms of the Fermi energy (E_Fermi), electron affinity (EA), ionization potential (IP) and band gap (E_gap) for all 425 samples are shown in Fig. 2. It is found that E_gap is highly correlated with nanoparticle size (see the Supporting Information); therefore this study has focused on a property related to the transfer of electrons: the energy of the Fermi level (E_Fermi), which is extracted directly from the individual DFTB+ simulations (results for EA and IP, obtained with the same workflow, are provided in the Supporting Information). It can be seen from Fig. 2a that E_Fermi for this data ensemble has at least 2 clusters (with centroids at −3.95 eV and −3.76 eV, and a boundary at −3.85 eV). There are also outliers where E_Fermi > −3.55 eV, which were removed from the following analysis because there were too few of them to be meaningful.
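The outlier cut and the two-cluster picture read off Fig. 2a can be expressed as a simple filter and threshold. This sketch uses only the values quoted in the text (−3.55 eV and −3.85 eV); the function name is hypothetical. Note that the classification used later in the study is obtained by k-means clustering on standardised values, not by this fixed threshold.

```python
from typing import List, Sequence, Tuple

OUTLIER_CUTOFF_EV = -3.55  # E_Fermi above this value is discarded as an outlier
CLASS_BOUNDARY_EV = -3.85  # boundary between the two clusters seen in Fig. 2a


def filter_and_label(e_fermi: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Drop outliers, then label each remaining value 0 (below boundary) or 1 (at/above)."""
    kept = [e for e in e_fermi if e <= OUTLIER_CUTOFF_EV]
    labels = [0 if e < CLASS_BOUNDARY_EV else 1 for e in kept]
    return kept, labels


# Hypothetical E_Fermi values (eV).
print(filter_and_label([-3.95, -3.76, -3.50, -3.86]))
```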

Figure 2: Histograms of the energetic distribution of (a) E_Fermi, (b) EA, (c) IP and (d) E_gap over the entire silver nanoparticle dataset (425 structures).

Features of Ag nanoparticles

Each individual nanoparticle has been characterised for its structural and morphological features, as listed in Tab. 1. All features are standardised by removing the mean and scaling to unit variance. The histograms of each feature for the overall dataset are provided in the Supporting Information. From these histograms it can be seen that SCN1, SCN2 and SCN12 are constant zeros for all structures. It is also found that ABNum, AGNum and PDEN are all linearly correlated with NUM. Therefore these features are omitted from further analysis, and the remaining 25 features listed in Tab. 1 are retained.

Based on Fig. 2, it can be seen that the distribution of E_Fermi is imbalanced across different sub-ranges: some bins contain many samples, while others have very few. In this case, stratified sampling ensured that the relative frequencies are approximately preserved in each bin for the training and testing partitions. This also guarantees that (1) in each sub-range, more data points are used for model training than for model testing, and (2) the complete range of the studied property, E_Fermi, is tested. In this study, 80% of the data is used for model training and 20% for testing, as illustrated in Fig. 3a.
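A stratified 80/20 split of the kind described above can be sketched as follows. The text does not say how the E_Fermi strata were defined, so equal-width binning into `n_bins` bins is our assumption, as are the function name and defaults. Within each bin, about 20% of the members go to the test partition, so bins with only one or two members are kept entirely for training.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    values: Sequence[float], n_bins: int = 5, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into train/test while preserving each bin's share of the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-identical values
    strata: Dict[int, List[int]] = defaultdict(list)
    for i, v in enumerate(values):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        strata[b].append(i)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for members in strata.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)  # e.g. 20 members -> 4 test samples
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


values = [i / 100 for i in range(100)]  # hypothetical, uniformly spread target values
train_idx, test_idx = stratified_split(values)
print(len(train_idx), len(test_idx))
```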


Table 1: Feature characterisation of Ag nanoparticles.

Abbreviation   Feature
SP             shape
NUM            number of atoms
DEN            mass density
AgCN           Ag-Ag coordination number
VA             volume per atom
PDEN*          particle density
ABM            mean Ag-Ag bond length
ABV            Ag-Ag bond length variance
ABNum*         number of Ag-Ag bonds
AGM            mean Ag-Ag-Ag angle
AGV            Ag-Ag-Ag angle variance
AGNum*         total number of Ag-Ag-Ag angles
NSA            number of surface atoms
AD             average diameter
AR             aspect ratio
FCC            face-centered cubic population
HCP            hexagonal close-packed population
ICO            icosahedral population
AMORPH         amorphous population
SCN1*          surface coordination number 1
SCN2*          surface coordination number 2
SCN3           surface coordination number 3
SCN4           surface coordination number 4
SCN5           surface coordination number 5
SCN6           surface coordination number 6
SCN7           surface coordination number 7
SCN8           surface coordination number 8
SCN9           surface coordination number 9
SCN10          surface coordination number 10
SCN11          surface coordination number 11
SCN12*         surface coordination number 12

* These features were omitted, as discussed in the text.
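The feature screening described in the text above (standardise each feature, drop constant features such as SCN1, SCN2 and SCN12, and drop features that are linearly correlated with one already kept, such as ABNum with NUM) can be sketched as below. The correlation cut-off of 0.99, the use of the population standard deviation, the function names and the toy feature values are all our assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def standardise(column: Sequence[float]) -> Optional[List[float]]:
    """Remove the mean and scale to unit (population) variance; None for constant columns."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0.0:
        return None  # a constant feature carries no information
    return [(x - mean) / std for x in column]


def pearson_standardised(a: Sequence[float], b: Sequence[float]) -> float:
    # For z-scores computed with the population std, Pearson's r is simply mean(a * b).
    return sum(x * y for x, y in zip(a, b)) / len(a)


def screen_features(features: Dict[str, Sequence[float]], corr_cutoff: float = 0.99) -> Dict[str, List[float]]:
    """Keep standardised, non-constant features that are not linearly dependent on one already kept."""
    kept: Dict[str, List[float]] = {}
    for name, column in features.items():  # earlier features win ties (dict order)
        z = standardise(column)
        if z is None:
            continue
        if any(abs(pearson_standardised(z, other)) >= corr_cutoff for other in kept.values()):
            continue
        kept[name] = z
    return kept


# Toy values for four nanoparticles, illustrative only.
num = [13, 55, 147, 309]
kept = screen_features({
    "NUM": num,
    "ABNum": [2 * v + 1 for v in num],  # exactly linear in NUM
    "SCN1": [0, 0, 0, 0],               # constant
    "HCP": [0.0, 0.1, 0.0, 0.3],
})
print(list(kept))  # ['NUM', 'HCP']
```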


Figure 3: Histograms of E_Fermi for (a) the stratified split into training and testing partitions after anomaly detection, with the training partition highlighted in green and the testing partition in red, and (b) the binary split after k-means clustering, with standardised centroids at 0.229 and 0.636 and a boundary at 0.43.

In many practical applications of Ag nanoparticles, it is important to tune the selectivity of electron charge transfer reactions [54], and it is of interest to understand how the different features listed in Tab. 1 impact the selectivity of E_Fermi. For this purpose, the data was split into sub-classes with k-means clustering, and the number of sub-classes was determined using the elbow criterion (see the Supporting Information for more details). The resulting classes retain 78.01% of the explained variance, and have standardised centroids at 0.229 and 0.636, with a boundary at 0.43, as shown in Fig. 3b. As illustrated in Fig. 4a, LR-based binary classification with all 25 selected features on the testing dataset reaches 91% and 100% precision for the two sub-classes, and 100% and 90% recall, respectively. The accuracy score of RFE with LR for each feature is shown in Fig. 4b. We found that the HCP, SCN9, AMORPH, SCN6 and SP features individually provide over 80% accuracy for the binary classification.
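The binary split of E_Fermi can be illustrated with Lloyd's k-means algorithm in one dimension. This is a minimal sketch under our own assumptions: it clusters a single standardised variable, uses a deterministic initialisation from evenly spaced order statistics, and does not implement the elbow criterion used to choose the number of clusters; the function name and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 2, n_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm in one dimension; returns (centroids, labels)."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    ordered = sorted(values)
    # Deterministic initialisation: evenly spaced order statistics of the data.
    centroids = [ordered[j * (len(ordered) - 1) // (k - 1)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members (kept if empty).
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break  # converged
        centroids = new
    return centroids, labels


centroids, labels = kmeans_1d([0.1, 0.2, 0.3, 0.6, 0.7])  # hypothetical standardised values
boundary = (centroids[0] + centroids[1]) / 2  # nearest-centroid boundary for k = 2
print(centroids, labels, boundary)
```

With two clusters in one dimension the decision boundary is the midpoint of the centroids; for the reported centroids, (0.229 + 0.636)/2 ≈ 0.43, consistent with the boundary stated above.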

As with LR, the RF model accurately classified the testing dataset into the 2 sub-classes, as shown in Fig. 4c. The score of each feature and its variance across 1,000 "trees" are shown in Fig. 4d. This result confirms that the HCP, SP, AMORPH, SCN6 and SCN9 features are the most important (see Table 1 for definitions).
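The Gini impurity that underlies the RF feature ranking can be illustrated for a single node. The sketch below computes the impurity of a set of class labels and the weighted impurity decrease produced by thresholding one feature; a random forest accumulates such decreases per feature over all nodes and averages them over the trees. Building the forest itself is omitted, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_c p_c^2 (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(feature: Sequence[float], labels: Sequence[Hashable], threshold: float) -> float:
    """Parent impurity minus the size-weighted impurity of the two children of a split."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted


def best_split(feature: Sequence[float], labels: Sequence[Hashable]) -> Tuple[float, Optional[float]]:
    """Best (impurity decrease, threshold) over midpoint-free candidate thresholds."""
    candidates = sorted(set(feature))[:-1]
    return max(((impurity_decrease(feature, labels, t), t) for t in candidates), default=(0.0, None))


# Toy example: a feature (e.g. a standardised HCP fraction) that separates the two classes.
print(best_split([0.0, 0.1, 0.8, 0.9], [0, 0, 1, 1]))  # (0.5, 0.1)
```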

Figure 4: Binary classification: (a) confusion matrix of LR model performance on the testing dataset, (b) accuracy of LR with each individual feature on the testing dataset, (c) confusion matrix of RF model performance on the testing dataset, and (d) average score of each feature (red bar) and variance (blue error bar) over 1,000 DTs from RF.

Using only these features we can accurately predict the selectivity of $E_{\mathrm{Fermi}}$ for this Ag nanoparticle dataset. As mentioned above, structural and morphological features are often interconnected. We found that decahedra and icosahedra are correlated with the HCP population. This is intuitive, as they are the only nanoparticles containing twin planes. We also find that the HCP population is correlated with AMORPH, SCN6 and SCN9 (more strongly with the latter than the former; see the Supporting Information for more details). The size- and temperature-dependent phase diagram of Ag nanoparticles [26] predicts that small icosahedra and intermediate-size decahedra (which are enriched with HCP and SCN9 populations) are expected at room temperature, while other morphologies would emerge within other size and temperature ranges. We therefore anticipate that temperature-dependent synthesis may provide a way to tune the selectivity of $E_{\mathrm{Fermi}}$ for Ag nanoparticles in the range of 0.5 nm to 4.9 nm. Examples are shown in Fig. 5, illustrating how different shapes enriched with different populations of AMORPH, SCN6, SCN9 and HCP impact $E_{\mathrm{Fermi}}$.

Figure 5: Ag nanoparticles characterised for surface and bulk features (SCN6: yellow, SCN9: red, HCP: purple, Other: powder blue). Listed here are: (a) icosahedron with dominating HCP population and $E_{\mathrm{Fermi}} < -3.85$ eV, (b) decahedron with dominating SCN9 population and $E_{\mathrm{Fermi}} < -3.85$ eV, (c) rhombi-truncated octahedron with dominating SCN6 and AMORPH population and $E_{\mathrm{Fermi}} > -3.85$ eV, and (d) tetrakis hexahedron with no HCP population and $E_{\mathrm{Fermi}} > -3.85$ eV.
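As a small illustration of the two ingredients discussed above, the sketch below does two things. First, it binarises $E_{\mathrm{Fermi}}$ at the −3.85 eV boundary quoted in the Fig. 5 caption. Second, it measures pairwise feature correlation with a Pearson correlation matrix. Several choices are assumptions: which side of the boundary is labelled 1, the strict inequality at exactly −3.85 eV, and the synthetic feature columns (SCN9 is deliberately built to co-vary with HCP).

```python
import numpy as np

THRESHOLD_EV = -3.85  # class boundary quoted in the Fig. 5 caption

def fermi_class(e_fermi):
    """Label 1 if E_Fermi < -3.85 eV (icosahedron/decahedron-like), else 0."""
    return (np.asarray(e_fermi, dtype=float) < THRESHOLD_EV).astype(int)

# Hypothetical feature columns for 200 particles: SCN9 is constructed to
# co-vary with HCP, while AMORPH is drawn independently.
rng = np.random.default_rng(1)
hcp = rng.random(200)
scn9 = 0.6 * hcp + 0.4 * rng.random(200)
amorph = rng.random(200)

corr = np.corrcoef(np.vstack([hcp, scn9, amorph]))  # rows are variables
print(np.round(corr, 2))
print(fermi_class([-4.0, -3.85, -3.5]))  # -> [1 0 0]
```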

ANN predictor

From PCA on the 25 selected features, we found that 16 principal components are needed to explain over 99% of the cumulative variance (see the Supporting Information).
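Choosing the number of principal components from the cumulative explained variance can be sketched with NumPy alone, assuming the features have already been scaled. scikit-learn's `PCA(n_components=0.99, svd_solver="full")` performs the same selection. The helper name and the synthetic 425 × 25 matrix below are hypothetical, so the component count printed here will not in general be 16.

```python
import numpy as np

def n_components_for(X, target=0.99):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `target` (PCA via SVD of the centred data)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    ratio = s**2 / np.sum(s**2)               # explained variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the target, converted to a count.
    return int(np.searchsorted(cumulative, target) + 1), cumulative

# Hypothetical stand-in for 25 scaled features driven by 16 latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(425, 16))
mixing = rng.normal(size=(16, 25))
X = latent @ mixing + 0.01 * rng.normal(size=(425, 25))

k, cum = n_components_for(X)
print(k, round(float(cum[k - 1]), 4))
```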

With 16 principal components retained, the input layer of the fully connected ANN consists of 16 neurons, with bias neurons serving as the intercepts of the activation functions [89]. The final architecture is shown in Fig. 6.

Figure 6: Architecture of the three-layer fully connected ANN regressor. Each Ag nanoparticle is characterised by the 25 structural and morphological features listed in Table 1. After PCA, 16 principal components are fed to the input layer of the ANN (X1-X16). The same number of neurons is deployed in the hidden layer (H1-H16). A single output neuron (O1) is used as the output for $E_{\mathrm{Fermi}}$. For the hidden and output layers, bias neurons B1 and B2 are deployed.
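The 16-16-1 forward pass in Fig. 6 can be written in a few lines of NumPy. Some details are assumptions for illustration only: the ReLU hidden activation, the linear output and the He-style initialisation. The parameter count, however, follows directly from the layer sizes: 16 × 16 weights plus 16 biases (B1) for the hidden layer, and 16 weights plus 1 bias (B2) for the output.

```python
import numpy as np

N_IN, N_HIDDEN, N_OUT = 16, 16, 1   # X1-X16, H1-H16 and O1 in Fig. 6

rng = np.random.default_rng(0)
# Assumed He-style initialisation; b1 and b2 play the role of bias neurons B1 and B2.
W1 = rng.normal(scale=np.sqrt(2.0 / N_IN), size=(N_IN, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=np.sqrt(2.0 / N_HIDDEN), size=(N_HIDDEN, N_OUT))
b2 = np.zeros(N_OUT)

def forward(pcs):
    """Map a batch of 16 principal-component scores to predicted E_Fermi values."""
    hidden = np.maximum(pcs @ W1 + b1, 0.0)   # assumed ReLU hidden layer
    return (hidden @ W2 + b2).ravel()         # linear output neuron O1

n_params = W1.size + b1.size + W2.size + b2.size
print("trainable parameters:", n_params)     # 16*16 + 16 + 16*1 + 1 = 289
print(forward(rng.normal(size=(5, N_IN))).shape)  # (5,)
```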

Anomalous data samples were identified as nanoparticles that do not conform to the other items in the dataset. Their properties are therefore poorly described by the features of the majority population. This does not indicate the ability (or inability) of any particular electronic structure method to calculate the property in question, since the same computational method has been used for all structures. Rather, it reflects the ability of the majority to represent the minority (regardless of the method of data generation). These nanoparticles are effectively noise, and were omitted to obtain an accurate and general ANN regressor that models the majority of the dataset well. Without this anomaly removal procedure, the performance of the ANN regressor could not be fairly evaluated. As a result, the hyper-parameter space might not be explored efficiently.

During cross-validation, we found that $R^2$ varied across the $k$ folds and that the variance of $R^2$ appeared to be abnormal. We compared the predicted $E_{\mathrm{Fermi}}$ with its original value across all $k$ folds using the $l_2$ norm, as shown in Fig. 7a, and confirmed that this was due to anomalies (outliers). After removing the anomalous nanoparticles, $k$-fold CV was repeated recursively until the variance of $R^2$ across all $k$-fold CVs was below 0.1, as shown in Fig. 7b. As a result, hyper-parameters of the ANN regressor that give a stable performance across all $k$ folds were identified, with only 14 nanoparticles requiring omission.
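The recursive procedure above can be sketched as follows. Several assumptions are made here. Ordinary least squares stands in for the ANN, which is far costlier to refit. Shuffled 3-fold CV is repeated a few times per round. In each round, the single sample whose out-of-fold errors have the largest $l_2$ norm is removed. The text does not state how many samples are dropped per round, and the function names, repeat count and synthetic data are hypothetical.

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination R^2."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def kfold_scores(X, y, k, rng):
    """One shuffled k-fold CV run with a least-squares stand-in regressor.
    Returns the k fold R^2 scores and every sample's out-of-fold error."""
    idx = rng.permutation(len(y))
    scores, err = [], np.empty(len(y))
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        A = np.c_[X[train], np.ones(len(train))]          # design matrix + intercept
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.c_[X[test], np.ones(len(test))] @ coef
        scores.append(r2(y[test], pred))
        err[test] = y[test] - pred
    return np.array(scores), err

def remove_anomalies(X, y, k=3, var_tol=0.1, repeats=5, max_removed=50, seed=0):
    """Repeat k-fold CV; while var(R^2) over all folds >= var_tol, drop the
    sample whose out-of-fold errors have the largest l2 norm across repeats."""
    rng = np.random.default_rng(seed)
    keep = np.arange(len(y))
    for _ in range(max_removed + 1):
        runs = [kfold_scores(X[keep], y[keep], k, rng) for _ in range(repeats)]
        r2_var = np.var(np.concatenate([s for s, _ in runs]))
        if r2_var < var_tol:
            break
        err_norm = np.linalg.norm(np.array([e for _, e in runs]), axis=0)
        keep = np.delete(keep, np.argmax(err_norm))
    return keep, r2_var

# Usage on synthetic data with five injected anomalies.
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=150)
outliers = np.array([5, 40, 77, 100, 140])
y[outliers] += 30.0
keep, final_var = remove_anomalies(X, y)
removed = np.setdiff1d(np.arange(len(y)), keep)
print("removed:", removed, "final var(R^2):", round(float(final_var), 4))
```

Removing one point per round is conservative. Dropping several at once would be faster, but it risks discarding points that only looked anomalous because a nearby outlier distorted the fit.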

Figure 7: $l_2$ norm of the $R^2$ error for cross-validation (a) prior to and (b) after anomaly detection.

After a grid search over a range of hyper-parameters, the "RMSprop" gradient descent weights optimiser was applied, together with $l_2$ weight regularisation of 0.0012 and $l_2$ activity regularisation of 0.002. The ANN model was trained for 500 epochs with a batch size of 5. The summarised $R^2$ for the training, CV and testing datasets is illustrated in Fig. 8. The $R^2$ score for the 3-fold CV reached 91 ± 1.0%, compared to 95% for the training dataset. To measure the generality of the ANN model with the determined hyper-parameters, it was evaluated on the testing dataset, yielding an $R^2$ of 93%. This result indicates that the ANN model overcomes both high bias (under-fitting) and high variance (over-fitting) well, given the amount of data available. The $l_2$ norm of the $R^2$ error for the CV and testing datasets is provided in the Supporting Information. Also provided is a comparison with regression results from multivariate linear regression, Gaussian process regression [101-103], random forest, and gradient boosting trees [104,105].
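The training settings quoted above can be illustrated with a self-contained NumPy sketch of the 16-16-1 network: RMSprop, an $l_2$ weight penalty of 0.0012, an $l_2$ activity penalty of 0.002, 500 epochs and a batch size of 5. The sketch rests on several assumptions:

- ReLU hidden units and a mean-squared-error loss;
- the weight penalty is applied to both weight matrices;
- the activity penalty is applied to the hidden outputs and averaged over the batch;
- Keras-style RMSprop defaults (learning rate 0.001, ρ = 0.9).

In Keras the same idea is usually expressed with `regularizers.l2(...)` passed as `kernel_regularizer` or `activity_regularizer` on a `Dense` layer and `optimizer="rmsprop"`. Exactly where the authors attached each penalty is not stated here, and the training data below are synthetic.

```python
import numpy as np

L2_W, L2_ACT = 0.0012, 0.002          # penalty strengths quoted in the text
LR, RHO, EPS = 1e-3, 0.9, 1e-7        # assumed RMSprop settings (Keras-style defaults)
EPOCHS, BATCH = 500, 5

def init_params(rng, n_in=16, n_hidden=16):
    return {"W1": rng.normal(scale=np.sqrt(2 / n_in), size=(n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(scale=np.sqrt(2 / n_hidden), size=(n_hidden, 1)),
            "b2": np.zeros(1)}

def loss_and_grads(p, X, y):
    """MSE + L2 weight penalty (W1, W2) + L2 activity penalty on hidden outputs."""
    n = len(y)
    z1 = X @ p["W1"] + p["b1"]
    h = np.maximum(z1, 0.0)                                # ReLU hidden layer
    pred = (h @ p["W2"] + p["b2"]).ravel()
    diff = pred - y
    loss = (np.mean(diff ** 2)
            + L2_W * (np.sum(p["W1"] ** 2) + np.sum(p["W2"] ** 2))
            + L2_ACT * np.sum(h ** 2) / n)
    d_pred = (2.0 / n) * diff[:, None]                     # dLoss/dpred, shape (n, 1)
    g = {"W2": h.T @ d_pred + 2 * L2_W * p["W2"],
         "b2": d_pred.sum(axis=0)}
    d_h = d_pred @ p["W2"].T + 2 * L2_ACT * h / n          # data term + activity penalty
    d_z1 = d_h * (z1 > 0)                                  # ReLU derivative
    g["W1"] = X.T @ d_z1 + 2 * L2_W * p["W1"]
    g["b1"] = d_z1.sum(axis=0)
    return loss, g

def train(X, y, seed=0):
    rng = np.random.default_rng(seed)
    p = init_params(rng, n_in=X.shape[1])
    cache = {k: np.zeros_like(v) for k, v in p.items()}    # running mean of squared grads
    history = []
    for _ in range(EPOCHS):
        order = rng.permutation(len(y))
        for start in range(0, len(y), BATCH):
            idx = order[start:start + BATCH]
            _, g = loss_and_grads(p, X[idx], y[idx])
            for k in p:                                    # RMSprop update
                cache[k] = RHO * cache[k] + (1 - RHO) * g[k] ** 2
                p[k] -= LR * g[k] / (np.sqrt(cache[k]) + EPS)
        history.append(loss_and_grads(p, X, y)[0])         # full-data loss per epoch
    return p, history

# Usage on hypothetical PCA scores and a made-up E_Fermi-like target (eV).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 16))
y = X[:, :3] @ np.array([0.5, -0.3, 0.2]) - 4.0
params, hist = train(X, y)
print(round(hist[0], 4), round(hist[-1], 4))
```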

Figure 8: $R^2$ score of the ANN model on the (a) cross-validation (CV), (b) training and (c) testing partitions of the Ag nanoparticle dataset with anomaly detection. The $R^2$ score for 3-fold CV reaches 0.91 ± 0.01 and that for training is 0.95, while the testing $R^2$ is 0.93.

Conclusions

Silver nanoparticles have tremendous application potential in many scientific and engineering areas, including the optical, chemical and medical domains. By combining electronic structure methods with machine learning algorithms, we have investigated the relationship between the electron charge transfer property of 425 virtual silver nanoparticles and a variety of structural and morphological features. This work has successfully mapped the Fermi energy onto simple features that can be extracted from initial "unrelaxed" structures, using k-means, logistic regression and random forest algorithms. We have shown that icosahedra and decahedra give rise to distinctive Fermi energy levels compared to other shapes. These shapes are enriched with hexagonal close-packed (twin) atoms and with surface atoms that have a coordination number of 9. Logistic regression and random forest classify them with accuracies of 95.0% and 94.0%, respectively. When interpreted in combination with the silver nanoparticle phase diagram, this suggests that the formation conditions are an important driver of the eventual charge transfer properties. Our accurate prediction of this electron charge transfer property from the reduced geometrical, structural and morphological features was based on a concise three-layer artificial neural network and principal component analysis that is easy to implement and generally applicable to other nanoparticle systems. Anomaly detection, achieved by examining the variance of the $R^2$ score across $k$ folds, was essential in achieving the high accuracies of 91% and 93% on the cross-validation and testing datasets, respectively. This study has also provided practical insight into the application of machine learning to quantum mechanical datasets, while preserving the high accuracy and generality that are hallmarks of these computational methods. This is an ideal workflow for efficient and accurate high-throughput screening of large-scale virtual nanoparticle libraries, which can be used with confidence.

Acknowledgement

Computational resources for this project have been supplied by the Australian National Computing Infrastructure national facility under Grant q27. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. The characterisation of the Ag dataset was supported by Dr. George Opletal.

Supporting Information Available

An IPython Notebook that implements data processing, feature analytics, and machine learning modelling. This information is available free of charge via the Internet at http://pubs.acs.org

References

(1) Bachilo, S. M.; Strano, M. S.; Kittrell, C.; Hauge, R. H.; Smalley, R. E.; Weisman, R. B. Structure-assigned Optical Spectra of Single-walled Carbon Nanotubes. Science 2002, 298, 2361-2366.


(2) Han, K. Y.; Willig, K. I.; Rittweger, E.; Jelezko, F.; Eggeling, C.; Hell, S. W. Three-dimensional Stimulated Emission Depletion Microscopy of Nitrogen-vacancy Centers in Diamond using Continuous-wave Light. Nano Lett. 2009, 9, 3323-3329.

(3) Han, M. Y.; Özyilmaz, B.; Zhang, Y.; Kim, P. Energy Band-Gap Engineering of Graphene Nanoribbons. Phys. Rev. Lett. 2007, 98, 206805.

(4) Barnard, A. S. Impact of Distributions on the Photocatalytic Performance of Anatase Nanoparticle Ensembles. J. Mater. Chem. A 2015, 3, 60-64.

(5) Shi, H.; Rees, R. J.; Per, M. C.; Barnard, A. S. Impact of Distributions and Mixtures on the Charge Transfer Properties of Graphene Nanoflakes. Nanoscale 2015, 7, 1864-1871.

(6) Barnard, A. S.; Wilson, H. F. Optical Emission of Statistical Distributions of Silicon Quantum Dots. J. Phys. Chem. C 2015, 119, 7969-7977.

(7) Sun, B.; Barnard, A. S. Impact of Speciation on the Electron Charge Transfer Properties of Nanodiamond Drug Carriers. Nanoscale 2016, 8, 14264-14270.

(8) Sun, B.; Fernandez, M.; Barnard, A. S. Statistics, Damned Statistics and Nanoscience - Using Data Science to Meet the Challenge of Nanomaterial Complexity. Nanoscale Horiz. 2016, 1, 89-95.

(9) Fernandez, M.; Barnard, A. S. Identification of Nanoparticle Prototypes and Archetypes. ACS Nano 2015, 9, 11980-11992.

(10) Baletto, F.; Mottet, C.; Ferrando, R. Reentrant Morphology Transition in the Growth of Free Silver Nanoclusters. Phys. Rev. Lett. 2000, 84, 5544-5547.

(11) Bocquet, M.-L.; Rappe, A.; Dai, H.-L. A Density Functional Theory Study of Adsorbate-induced Work Function Change and Binding Energy: Olefins on Ag(111). Mol. Phys. 2005, 103, 883-890.


(12) Milek, T.; Zahn, D. Molecular Simulation of Ag Nanoparticle Nucleation from Solution: Redox-reactions Direct the Evolution of Shape and Structure. Nano Lett. 2014, 14, 4913-4917.

(13) Csányi, G.; Albaret, T.; Payne, M. C.; De Vita, A. Learn on the Fly: A Hybrid Classical and Quantum-Mechanical Molecular Dynamics Simulation. Phys. Rev. Lett. 2004, 93, 175503.

(14) Fernandez, M.; Barnard, A. S. Geometrical Properties Can Predict CO2 and N2 Adsorption Performance of Metal-Organic Frameworks (MOFs) at Low Pressure. ACS Comb. Sci. 2016, 18, 243-252.

(15) Fernandez, M.; Wilson, H.; Barnard, A. S. Impact of Distributions on the Archetypes and Prototypes in Heterogenous Nanoparticle Ensembles. Nanoscale 2016, 13, 2879-2882.

(16) Sun, Y.; Mayers, B.; Xia, Y. Transformation of Silver Nanospheres into Nanobelts and Triangular Nanoplates through a Thermal Process. Nano Lett. 2003, 3, 675-679.

(17) Gracia-Pinilla, M. Á.; Pérez-Tijerina, E.; García, J. A.; Fernández-Navarro, C.; Tlahuice-Flores, A.; Mejía-Rosales, S.; Montejano-Carrizales, J. M.; José-Yacamán, M. On the Structure and Properties of Silver Nanoparticles. J. Phys. Chem. C 2008, 112, 13492-13498.

(18) Loo, C.; Lowery, A.; Halas, N.; West, J.; Drezek, R. Immunotargeted Nanoshells for Integrated Cancer Imaging and Therapy. Nano Lett. 2005, 5, 709-711.

(19) Sotiriou, G. A.; Hirt, A. M.; Lozach, P.-Y.; Teleki, A.; Krumeich, F.; Pratsinis, S. E. Hybrid, Silica-Coated, Janus-Like Plasmonic-Magnetic Nanoparticles. Chem. Mater. 2011, 23, 1985-1992.


(20) Velev, O. D.; Kaler, E. W. In situ Assembly of Colloidal Particles into Miniaturized Biosensors. Langmuir 1999, 15, 3693-3698.

(21) McFarland, A. D.; Van Duyne, R. P. Single Silver Nanoparticles as Real-Time Optical Sensors with Zeptomole Sensitivity. Nano Lett. 2003, 3, 1057-1062.

(22) Haes, A. J.; Haynes, C. L.; McFarland, A. D.; Schatz, G. C.; Van Duyne, R. P.; Zou, S. Plasmonic Materials for Surface-Enhanced Sensing and Spectroscopy. MRS Bull. 2005, 30, 368-375.

(23) Hoa, X.; Kirk, A.; Tabrizian, M. Towards Integrated and Sensitive Surface Plasmon Resonance Biosensors: A Review of Recent Progress. Biosens. Bioelectron. 2007, 23, 151-160.

(24) Macklin, J. J.; Trautman, J. K.; Harris, T. D.; Brus, L. E. Imaging and Time-Resolved Spectroscopy of Single Molecules at an Interface. Science 1996, 272, 255-258.

(25) Haynes, C. L.; Van Duyne, R. P. Plasmon-Sampled Surface-Enhanced Raman Excitation Spectroscopy. J. Phys. Chem. B 2003, 107, 7426-7433.

(26) González, A. L.; Noguez, C.; Beránek, J.; Barnard, A. S. Size, Shape, Stability, and Color of Plasmonic Silver Nanoparticles. J. Phys. Chem. C 2014, 118, 9128-9136.

(27) Russell, A.; Hugo, W. Antimicrobial Activity and Action of Silver. Prog. Med. Chem. 1994, 31, 351-370.

(28) Sondi, I.; Salopek-Sondi, B. Silver Nanoparticles as Antimicrobial Agent: a Case Study on E. coli as a Model for Gram-negative Bacteria. J. Colloid Interface Sci. 2004, 275, 177-182.

(29) Kim, J. S.; Kuk, E.; Yu, K. N.; Kim, J.-H.; Park, S. J.; Lee, H. J.; Kim, S. H.; Park, Y. K.; Park, Y. H.; Hwang, C.-Y.; Kim, Y.-K.; Lee, Y.-S.; Jeong, D. H.; Cho, M.-H. Antimicrobial Effects of Silver Nanoparticles. Nanomedicine Nanotechnology, Biol. Med. 2007, 3, 95-101.

(30) Rai, M.; Yadav, A.; Gade, A. Silver Nanoparticles as a New Generation of Antimicrobials. Biotechnol. Adv. 2009, 27, 76-83.

(31) Kumar, A.; Vemula, P. K.; Ajayan, P. M.; John, G. Silver Nanoparticle Embedded Antimicrobial Paints based on Vegetable Oil. Nat. Mater. 2008, 7, 236-241.

(32) Sharma, V. K.; Yngard, R. A.; Lin, Y. Silver Nanoparticles: Green Synthesis and Their Antimicrobial Activities. Adv. Colloid Interface Sci. 2009, 145, 83-96.

(33) Prabhu, S.; Poulose, E. K. Silver Nanoparticles: Mechanism of Antimicrobial Action, Synthesis, Medical Applications, and Toxicity Effects. Int. Nano Lett. 2012, 2, 32.

(34) Swathy, J. R.; Sankar, M. U.; Chaudhary, A.; Aigal, S.; Anshup; Pradeep, T. Antimicrobial Silver: An Unprecedented Anion Effect. Sci. Rep. 2014, 4, 7161.

(35) Richter, A. P.; Brown, J. S.; Bharti, B.; Wang, A.; Gangwal, S.; Houck, K.; Cohen Hubal, E. A.; Paunov, V. N.; Stoyanov, S. D.; Velev, O. D. An Environmentally Benign Antimicrobial Nanoparticle based on a Silver-infused Lignin Core. Nat. Nanotechnol. 2015, 10, 817-823.

(36) Kottmann, J. P.; Martin, O. J. F.; Smith, D. R.; Schultz, S. Plasmon Resonances of Silver Nanowires with a Nonregular Cross Section. Phys. Rev. B 2001, 64, 235402.

(37) Jin, R.; Cao, Y.; Mirkin, C. A.; Kelly, K. L.; Schatz, G. C.; Zheng, J. G. Photoinduced Conversion of Silver Nanospheres to Nanoprisms. Science 2001, 294, 1901-1903.

(38) Sun, Y.; Xia, Y. Shape-Controlled Synthesis of Gold and Silver Nanoparticles. Science 2002, 298, 2176-2179.

(39) Lofton, C.; Sigmund, W. Mechanisms Controlling Crystal Habits of Gold and Silver Colloids. Adv. Funct. Mater. 2005, 15, 1197-1208.


(40) Wiley, B. J.; Xiong, Y.; Li, Z.-Y.; Yin, Y.; Xia, Y. Right Bipyramids of Silver: A New Shape Derived from Single Twinned Seeds. Nano Lett. 2006, 6, 765-768.

(41) Wiley, B. J.; Chen, Y.; McLellan, J. M.; Xiong, Y.; Li, Z.-Y.; Ginger, D.; Xia, Y. Synthesis and Optical Properties of Silver Nanobars and Nanorice. Nano Lett. 2007, 7, 1032-1036.

(42) Kim, M. H.; Lu, X.; Wiley, B.; Lee, E. P.; Xia, Y. Morphological Evolution of Single-Crystal Ag Nanospheres during the Galvanic Replacement Reaction with HAuCl4. J. Phys. Chem. C 2008, 112, 7872-7876.

(43) Cobley, C. M.; Skrabalak, S. E.; Campbell, D. J.; Xia, Y. Shape-Controlled Synthesis of Silver Nanoparticles for Plasmonic and Sensing Applications. Plasmonics 2009, 4, 171-179.

(44) Sau, T. K.; Rogach, A. L. In Complex-Shaped Met. Nanoparticles Bottom-Up Synth. Appl.; Sau, T. K., Rogach, A. L., Eds.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2012.

(45) Lu, Y.; Chen, W. Sub-nanometre Sized Metal Clusters: from Synthetic Challenges to the Unique Property Discoveries. Chem. Soc. Rev. 2012, 41, 3594-3623.

(46) Harra, J.; Mäkitalo, J.; Siikanen, R.; Virkki, M.; Genty, G.; Kobayashi, T.; Kauranen, M.; Mäkelä, J. M. Size-controlled Aerosol Synthesis of Silver Nanoparticles for Plasmonic Materials. J. Nanoparticle Res. 2012, 14, 870.

(47) Xia, X.; Zeng, J.; Oetjen, L. K.; Li, Q.; Xia, Y. Quantitative Analysis of the Role Played by Poly(vinylpyrrolidone) in Seed-mediated Growth of Ag Nanocrystals. J. Am. Chem. Soc. 2012, 134, 1793-1801.

(48) Doye, J. P. K.; Calvo, F. Entropic Effects on the Size Dependence of Cluster Structure. Phys. Rev. Lett. 2001, 86, 3570-3573.


(49) Baletto, F.; Ferrando, R. Structural Properties of Nanoclusters: Energetic, Thermodynamic, and Kinetic Effects. Rev. Mod. Phys. 2005, 77, 371-423.

(50) Angulo, A. M.; Noguez, C. Atomic Structure of Small and Intermediate-Size Silver Nanoclusters. J. Phys. Chem. A 2008, 112, 5834-5838.

(51) Pal, S.; Tak, Y. K.; Song, J. M. Does the Antibacterial Activity of Silver Nanoparticles Depend on the Shape of the Nanoparticle? A Study of the Gram-Negative Bacterium Escherichia coli. J. Biol. Chem. 2007, 73, 1712-1720.

(52) Chen, L.; Tran, T. T.; Huang, C.; Li, J.; Yuan, L.; Cai, Q. Synthesis and Photocatalytic Application of Au/Ag Nanoparticle-sensitized ZnO Films. Appl. Surf. Sci. 2013, 273, 82-88.

(53) Sánchez-Iglesias, A.; Aldeanueva-Potel, P.; Ni, W.; Pérez-Juste, J.; Pastoriza-Santos, I.; Alvarez-Puebla, R. A.; Mbenkum, B. N.; Liz-Marzán, L. M. Chemical Seeded Growth of Ag Nanoparticle Arrays and Their Application as Reproducible SERS Substrates. Nano Today 2010, 5, 21-27.

(54) Sun, B.; Barnard, A. S. Impact of Size and Shape Distributions on the Electron Charge Transfer Properties of Silver Nanoparticles. Nanoscale 2017, 9, 12698-12708.

(55) Porezag, D.; Frauenheim, T.; Köhler, T.; Seifert, G.; Kaschner, R. Construction of Tight-binding-like Potentials on the Basis of Density-functional Theory: Application to Carbon. Phys. Rev. B 1995, 51, 12947-12957.

(56) Frauenheim, T.; Seifert, G.; Elstner, M.; Niehaus, T.; Köhler, C.; Amkreutz, M.; Sternberg, M.; Hajnal, Z.; Carlo, A. D.; Suhai, S. Atomistic Simulations of Complex Materials: Ground-state and Excited-state Properties. J. Phys. Condens. Matter 2002, 14, 3015-3047.


(57) Aradi, B.; Hourahine, B.; Frauenheim, T. DFTB+, a Sparse Matrix-Based Implementation of the DFTB Method. J. Phys. Chem. A 2007, 111, 5678-5684.

(58) Szücs, B.; Hajnal, Z.; Frauenheim, T.; González, C.; Ortega, J.; Pérez, R.; Flores, F. Chalcogen Passivation of GaAs(100) Surfaces: Theoretical Study. Appl. Surf. Sci. 2003, 212-213, 861-865.

(59) Szucs, B.; Hajnal, Z.; Scholz, R.; Sanna, S.; Frauenheim, T.; Szűcs, B. Theoretical Study of the Adsorption of a PTCDA Monolayer on S-passivated GaAs(100). Appl. Surf. Sci. 2004, 234, 173-177.

(60) Barnard, A. S.; Sun, B. Silver Nanoparticle Data Set. 2017, DOI: 10.4225/08/595f2a960c870.

(61) Ozer, D. J. Correlation and the Coefficient of Determination. Psychol. Bull. 1985, 97, 307-315.

(62) Bolboaca, S.-D.; Jäntschi, L. Pearson versus Spearman, Kendall's tau Correlation Analysis on Structure-activity Relationships of Biologic Active Compounds. Leonardo J. Sci. 2006, 5, 179-200.

(63) Willmott, C. J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79-82.

(64) Chai, T.; Draxler, R. R. Root Mean Square Error (RMSE) or Mean Absolute Error (MAE)? Arguments against Avoiding RMSE in the Literature. Geosci. Model Dev. 2014, 7, 1247-1250.

(65) Van Der Aalst, W. M.; Rubin, V.; Verbeek, H. M.; Van Dongen, B. F.; Kindler, E.; Günther, C. W. Process Mining: A Two-step Approach to Balance between Underfitting and Overfitting. Softw. Syst. Model. 2010, 9, 87-111.


(66) Guyon, X.; Yao, J.-f. On the Underfitting and Overfitting Sets of Models Chosen by Order Selection Criteria. J. Multivar. Anal. 1999, 70, 221-249.

(67) Narayan, S.; Tagliarini, G. An Analysis of Underfitting in MLP Networks. Proc. Int. Jt. Conf. Neural Networks, 2005. IJCNN'05. 2005, 984-988.

(68) Hawkins, D. M. The Problem of Overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.

(69) Schaffer, C. Overfitting Avoidance as Bias. Mach. Learn. 1993, 10, 153-178.

(70) Lawrence, S.; Giles, C. Overfitting and Neural Networks: Conjugate Gradient and Backpropagation. Proc. IEEE-INNS-ENNS Int. Jt. Conf. Neural Networks. IJCNN. 2000, 1, 114-119.

(71) Breiman, L. Random Forest. Mach. Learn. 1999, 45, 1-35.

(72) Picard, R.; Berk, K. Data Splitting. Am. Stat. 1990, 44, 140-147.

(73) Fushiki, T. Estimation of Prediction Error by Using K-fold Cross-validation. Stat. Comput. 2011, 21, 137-146.

(74) Wiens, T. S.; Dale, B. C.; Boyce, M. S.; Kershaw, G. P. Three Way k-fold Cross-validation of Resource Selection Functions. Ecol. Modell. 2008, 212, 244-255.

(75) He, H.; Garcia, E. A. Learning From Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263-1284.

(76) Chawla, N. V. Data Mining for Imbalanced Datasets: An Overview. Data Min. Knowl. Discov. Handb. 2005, 853-867.

(77) Batista, G. E. A. P. A.; Prati, R. C.; Monard, M. C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explor. Newsl. 2004, 6, 20-29.


(78) Lipton, Z. C.; Elkan, C.; Naryanaswamy, B. Optimal Thresholding of Classifiers to Maximize F1 Measure. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). 2014; pp 225-239.

(79) Goutte, C.; Gaussier, E. A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation. 27th Eur. Conf. IR Res. ECIR 2005, Santiago Compost. 2005, 3408, 345-359.

(80) Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. Adv. Artif. Intell. 2006, 4304, 1015-1021.

(81) Bewick, V.; Cheek, L.; Ball, J. Statistics Review 14: Logistic Regression. Crit. Care 2005, 9, 112-118.

(82) Utgoff, P. E.; Corporation, C.; Street, W. Decision Tree Induction Based on Efficient Tree Restructuring. Mach. Learn. 1997, 29, 5-44.

(83) Olaru, C.; Wehenkel, L. A Complete Fuzzy Decision Tree Technique. Fuzzy Sets Syst. 2003, 138, 221-254.

(84) Brown, S. D.; Myles, A. J. Decision Tree Modeling in Classification. Compr. Chemom. 2010, 3, 541-569.

(85) Myles, A. J.; Feudale, R. N.; Liu, Y.; Woody, N. A.; Brown, S. D. An Introduction to Decision Tree Modeling. 2004, 18, 275-285.

(86) Mingers, J. An Empirical Comparison of Pruning Methods for Decision Tree Induction. Mach. Learn. 1989, 4, 227-243.

(87) Strobl, C.; Boulesteix, A. L.; Augustin, T. Unbiased Split Selection for Classification Trees based on the Gini Index. Comput. Stat. Data Anal. 2007, 52, 483-501.


(88) Jackson, J. E. A User's Guide to Principal Components; John Wiley & Sons, Inc., 2004.

(89) Lange, N.; Bishop, C. M.; Ripley, B. D. Neural Networks for Pattern Recognition. J. Am. Stat. Assoc. 1997, 92, 1642.

(90) LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436-444.

(91) Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; Hassabis, D. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484-489.

(92) Rusk, N. Deep Learning. Nat. Methods 2015, 13, 35.

(93) Sargent, D. J. Comparison of Artificial Neural Networks with Other Statistical Approaches. Cancer 2001, 91, 1636-1642.

(94) Westreich, D.; Lessler, J.; Funk, M. J. Propensity Score Estimation: Neural Networks, Support Vector Machines, Decision Trees (CART), and Meta-classifiers as Alternatives to Logistic Regression. J. Clin. Epidemiol. 2010, 63, 826-833.

(95) Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. Int. Conf. Learn. Represent. 2015 2015, 1-15.

(96) Ruder, S. An Overview of Gradient Descent Optimization Algorithms. http://sebastianruder.com/optimizing-gradient-descent/ (accessed 2017-01-05).

(97) Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. AISTATS '11 Proc. 14th Int. Conf. Artif. Intell. Stat. 2011, 15, 315-323.

(98) Maas, A. L.; Hannun, A. Y.; Ng, A. Y. Rectifier Nonlinearities Improve Neural Network Acoustic Models. Proc. 30th Int. Conf. Mach. Learn. 2013, 28, 6.


(99) Gupta, A.; Lam, S. M. Weight Decay Backpropagation for Noisy Data. Neural Networks 1998, 11, 1127-1137.

(100) Gnecco, G.; Sanguineti, M. The Weight-decay Technique in Learning from Data: an Optimization Point of View. Comput. Manag. Sci. 2009, 6, 53-79.

(101) Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning. Int. J. Neural Syst. 2004, 14, 69-106.

(102) Mackay, D. J. C. Introduction to Gaussian Processes. Neural Networks Mach. Learn. 1998, 168, 133-165.

(103) Boyle, P.; Frean, M. Dependent Gaussian Processes. Adv. Neural Inf. Process. Syst. 2005, 217-224.

(104) Friedman, J. H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189-1232.

(105) Friedman, J. H. Stochastic Gradient Boosting. Comput. Stat. Data Anal. 2002, 38, 367-378.
