Toolkit for the Construction of Reproducing Kernel-Based

Jun 30, 2017 - Toolkit for the Construction of Reproducing Kernel-Based Representations of Data: Application to Multidimensional Potential Energy Surf...
1 downloads 11 Views 878KB Size
Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE

Article

A Toolkit for The Construction of Reproducing Kernel-Based Representations of Data: Application to Multi-Dimensional Potential Energy Surfaces Oliver Thorsten Unke, and Markus Meuwly J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00090 • Publication Date (Web): 30 Jun 2017 Downloaded from http://pubs.acs.org on July 1, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

A Toolkit for The Construction of Reproducing Kernel-Based Representations of Data: Application to Multi-Dimensional Potential Energy Surfaces Oliver T. Unke and Markus Meuwly∗ Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland. E-mail: m.meuwly@unibas.ch Abstract In the early days of computation, slow processor speeds limited the amount of data that could be generated and used for scientific purposes. In the age of big data, the limiting factor usually is the method with which large amounts of data are analysed and useful information is extracted. A typical example from chemistry are high-level ab initio calculations for small systems, which have nowadays become feasible even if energies at many different geometries are required. Molecular dynamics simulations often require several thousand distinct trajectories to be run. Under such circumstances suitable analytical representations of potential energy surfaces (PES) based on ab initio calculations are required to propagate the dynamics at an acceptable cost. In this work we introduce a toolkit which allows the automatic construction of multi-dimensional PESs from gridded ab initio data based on reproducing kernel Hilbert space (RKHS) theory. The resulting analytical representations require no tuning of parameters and allow energy and force evaluations at ab initio quality at the same cost as empirical force fields.

1

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Although the toolkit is primarily intended for constructing multi-dimensional potential energy surfaces for molecular systems, it can also be used for general machine learning purposes. The software is published under the MIT license and can be downloaded, modified and used in other projects for free.

2

ACS Paragon Plus Environment

Page 2 of 34

Page 3 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

1

Introduction

High-level electronic structure calculations have not only become more accurate, but also more efficient in recent years. 1–3 Along with the ever-increasing computational power, these advances allow the calculation of thousands of energies at various geometries for small polyatomic systems within chemical accuracy (0.5 kcal/mol). 4 Nevertheless, analytical representations of potential energy surfaces (PESs) are still required when statistically significant numbers (typically 105 or more) of molecular dynamics (MD) simulations need to be performed.

The most direct way to obtain an analytical representation of a PES is to use a parametrized, pre-determined functional form for the energy 5 and fit the parameters to a set of ab initio data using linear or non-linear least squares procedures. 6 Although such fits have demonstrated to achieve root mean squared errors (RMSEs) within chemical accuracy, 7 the fitting process and choosing a functional form require human intuition and can be tedious and time-intensive. It would be considerably more convenient to supply only a set of ab initio data to a toolkit that generates the interpolation (and meaningful extrapolation) of the PES along with all required parameters automatically. Such methods are usually termed Machine-learning methods.

Machine-learning methods allow to estimate an unknown function value using a model that was “trained” with a set of known data. 8 For intermolecular interactions, Rabitz and coworkers 9–11 have popularized the use of reproducing kernel Hilbert space (RKHS) theory, 12 that allows to construct a PES from a training set based on ab initio data. In the machine learning community, this method is well-known as kernel ridge regression (KRR). 8,13 The RKHS 15 method has been successfully applied e.g. for constructing PESs for NO+O, 14 N+ 2 + Ar

or H2 O. 16 KRR has also been applied to predict atomization energies 17 and to find density functionals. 18 3

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Over the past years alternative interpolation techniques including the modified Shepard algorithm, 19–21 the moving least squares method, 22–24 permutation invariant polynomials 25–27 or neural network approaches 28–30 have been used to construct multi-dimensional PESs. 31–37 However, they all have in common that they primarily minimize the RMSE to the given training data. While this may be acceptable in many cases, the fundamental difference is that the RKHS method reproduces all training data exactly by construction (apart from numerical round-off errors). This can be of importance for example when spectroscopic properties need to be computed with high accuracy. 38 In general, whenever exact training data is available, it can be advantageous to use the RKHS method since the interpolation itself will introduce no fitting errors.

Unfortunately, the RKHS method can be quite difficult to implement efficiently in code, which might explain why its use is less widespread given the undeniable advantages. In this work, we introduce a toolkit for constructing multi-dimensional PESs with the RKHS method given a grid of training data from ab initio calculations or empirical expressions. It is published under the MIT license and can be downloaded from github. 39 The toolkit has no dependencies, does not require a complex build system and can be freely modified and included in other projects. A multi-dimensional PES and its first and second derivatives of any number of dimensions can be constructed with only a few function calls and evaluated at arbitrary configurations. Surfaces that were constructed with the toolkit can be saved and loaded from a file, such that the PES can be directly evaluated without repetition of setup steps.

This manuscript introduces a new, fast method to calculate the RKHS-coefficients which makes the method applicable to extremely large data sets (> 106 ). In addition, an evaluation algorithm that works for an arbitrary number of dimensions is introduced that does

4

ACS Paragon Plus Environment

Page 4 of 34

Page 5 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

not build on prior knowledge of dimensionality and explicit unrolling of loops contrary to what was required in previous work. 10 This makes it possible to write a general-purpose, flexible toolkit which does not require application-dependent modifications. For example, RKHS is - to our knowledge - applied for the first time to a 6-dimensional PES. For illustrative purposes, one explicit step-by-step example for constructing a PES with the toolkit for H3 is given in the supporting information. Further we correct known inconsistencies in previous publications.57,58 Although the toolkit is primarily intended for constructing PESs for applications in classical and quantum molecular simulations, it can also be applied to other, more general machine learning problems as will be discussed further below. Finally, for illustrative purposes, one explicit step-by-step example for constructing a PES with the toolkit for H3 is given in the supporting information.

In the following, a brief overview of the RKHS method is given. Sections 2.1 and 2.2 give details on efficient algorithms for setting up and evaluating the RKHS model function based on data on a regular grid. Since available data does not necessarily have this structure, section 3 describes how these methods can still be applied to arbitrary data sets. Section 4 gives an overview of important kernel functions and section 5 exemplifies the application of the RKHS method to a high-dimensional problem with large data sets. All relevant algorithms are given in pseudocode in the supporting information. For a more general introduction to machine learning methods, 40 or specifically kernel-based methods 41,42 the reader is referred to the literature.

2

Overview of the RKHS method

A Hilbert space is called a “reproducing kernel Hilbert space” (RKHS) if all its evaluation functionals are continuous. 43 A RKHS has many useful properties which make it particularly suited for data interpolation and machine learning applications. 11 In particular, if the values

5

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 34

fi of a function f (x) are known for a set of N training examples xi , the representer theorem 44 states that f (x) can always be approximated as a linear combination of kernel products

fe(x) =

N X

ci K(x, xi )

(1)

i=1

where ci are coefficients and K(x, x′ ) is the reproducing kernel of the RKHS. A function K(x, x′ ) is said to be a reproducing kernel of a Hilbert space if the evaluation functional can be represented as an inner product of f (x) with K(x, x′ ). 45 The coefficients ci satisfy the linear relation fj =

N X

(2)

ci Kij

i=1

with Kij = K(xi , xj ) and can therefore be calculated from the known values fi in the training set through solution of the system of linear equations given by Eq. 3. 









 K11 K12 . . . K1N   c1   f1        K21 K22 . . . K2N   c2   f2        . = .   . .. ..  ...      .. .  .  ..   ..        fN cN KN 1 KN 2 . . . K N N

(3)

Since the kernel matrix K = [Kij ] is symmetric and positive-definite by construction, the efficient Cholesky decomposition 46 can be used to solve Eq. 3. Once the coefficients ci have been determined, the function value at an arbitrary position x can be calculated using Eq. 1. The interpolatory power of the RKHS method is demonstrated in Fig. 1.

Note that derivatives of fe(x) of any order can be calculated analytically by replacing the

kernel function K(x, x′ ) in Eq. 1 with its corresponding derivative. As long as fe(x) approx-

imates f (x) reasonably well, derivatives of fe(x) can be expected to be good approximations of derivatives of f (x).

6

ACS Paragon Plus Environment

Page 7 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

 Figure 1: RKHS method applied to the function V (r) = r19 − r16 (cos2 (7r) + 1). The model function fe(x) (red dashed line) constructed from the training samples (black dots) is virtually identical to the analytical expression (grey line). Note that the RKHS method does not rely on a given functional form and is thus able to reproduce the complicated oscillatory features of V (r), provided that the training data captures all features that should be reproduced. The explicit form of the multi-dimensional kernel function K(x, x′ ) is chosen depending on the problem to be solved. In general, it is possible to construct D-dimensional kernels as products of one-dimensional kernels k(x, x′ )



K(x, x ) =

D Y

k (d) (x(d) , x′(d) )

(4)

d=1

where k (d) is the one-dimensional kernel of dimension d and x(d) and x′(d) are the d-th compo  nents of the D-vectors x = x(d) and x′ = x′(d) , respectively. Throughout the remainder

of this work superscripts in parentheses denote dimensions unless specified otherwise. Some of the most useful one-dimensional kernels for constructing multi-dimensional PESs and for machine learning applications are given in section 4.

In principle Eqs. 1-3 are already sufficient for a simple implementation of the RKHS method.

7

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 34

In practice, however, the solution of Eq. 3 is only possible if the kernel matrix K is wellconditioned. If K becomes ill-conditioned, a regularized solution can be obtained for example by Tikhonov regularization. 47 This amounts to adding a small positive constant λ to the diagonal of K, such that fj =

N X

ci (Kij + λδij )

(5)

i=1

is solved instead of Eq. 2 when determining the coefficients ci (here, δij is the Kronecker delta). Note that the same method is used in KRR to prevent over-fitting. 8

If no regularization procedure is used for calculating the coefficients ci , the model function fe(x) reproduces all samples fj in the training set exactly by construction (see Eq. 2). On

the other hand, if λ is too large, fe(x) evaluates to the average of the training set.

An appealing feature of the RKHS method is that the quality of the model function fe(x) can be systematically improved by adding new data to the training set.

Unfortunately, the simple implementation described above has two major drawbacks when the training set size N becomes large: 1) Since the Cholesky decomposition used to solve Eq. 3 scales with O(N 3 ), the calculation of the ci can become prohibitive for large training sets. However, since the coefficients need to be only calculated once, this is only problematic for extremely large N (N ≫ 105 ). 2) A more severe drawback is that the evaluation of the model function fe(x) requires a sum

over all training samples and thus scales as O(N ). If only few evaluations of Eq. 1 are required, linear scaling is acceptable. However, for RKHS representation of a PES used in MD simulations, many thousands of evaluations are typically needed during the course of the dynamics and the linear scaling directly translates to a linear increase in simulation time.

Fortunately both drawbacks can be resolved: A new, fast method for calculating the coefficients is described in section 2.1 and a rapid method for evaluating the model function fe(x) 8

ACS Paragon Plus Environment

Page 9 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

developed by Hollebeek et al. 10 is described in section 2.2. Both methods take advantage of a multi-dimensional grid structure of the training set, which is readily available from ab initio calculations of PESs. For general machine learning applications, however, a purposeful construction of the training set as a multi-dimensional grid can prove difficult. Section 3 details how both fast methods can be applied to incomplete grids, such that, even in the extreme case of randomly scattered data, at least the fast evaluation method becomes applicable. Alternatives to the methods mentioned in this work are reduced set methods, which also allow for a more efficient computation of the kernel expansion. 48

2.1

Fast calculation of coefficients ci

If the training set is constructed by scanning through combinations of sets of N (d) points in each dimension, the resulting training set possesses a multi-dimensional grid structure. Note that N (d) can have a different values for each dimension d. The total size N of the training set in this case is simply N=

D Y

N (d)

(6)

d=1

Further, if the multi-dimensional kernel K(x, x′ ) can be written as a product of one-dimensional kernels (Eq. 4) the kernel matrix K can be expressed as a Tensor product of N (d) × N (d) matrices k(d) K = k(1) ⊗ k(2) ⊗ · · · ⊗ k(D) where k

(d)

=

h

(d) kij

i

(d)

(d)

(7)

(d)

and kij = k (d) (xi , xj ). Due to the properties of Tensor products it

is then possible to calculate the coefficients ci from the Cholesky decompositions of the D one-dimensional N (d) × N (d) matrices k(d) . 49 Since typically, N (d) ≪ N , the cost of these decompositions is negligible and the coefficients ci can be computed at essentially the cost of a matrix vector multiplication. 49

Note that machine learning efforts other than PES construction, for example the prediction 9

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 34

of electronic properties of molecules 50,51 generally use scattered training data. 52,53 However, it would in principle be possible to systematically construct training data with a multidimensional grid structure by performing reference calculations at fractional particle numbers with molecular grand-canonical ensemble density functional theory. 54 The full algorithm for the fast calculation of coefficients is given in the supporting information (Algorithm S1).

2.2

Fast evaluation of model function fe(x)

The methods described in this section were originally developed and derived by Hollebeek et al. 10 and are given in a condensed form for completeness. For further details on their derivation, the reader is referred to the literature. 10 As was mentioned earlier, in general it is necessary to sum over all N training samples for a single evaluation of the model function fe(x) (Eq. 1). For particular classes of onedimensional kernel functions however, it is possible to decompose the function k(x, x′ ) into

different contributions of x and x′



k(x, x ) =

f1 M X



p1k f1k (x)f1k (x ) +

k=1

f2 M X

p2k f2k (x< )f3k (x> )

(8)

k=1

where x> and x< are the larger and smaller of x and x′ respectively. It should be noted that f1 and M f2 are independent of N and only such decompositions are not unique. Further, M

depend on the chosen kernel function. For simplicity, Eq. 8 can equivalently be written as

k(x, x′ ) =

M2 X

p2k f2k (x< )f3k (x> )

(9)

k=1

if all p1k f1k (x)f1k (x′ ) terms are rewritten as p2k f2k (x< )f3k (x> ) with f1k = f2k = f3k and f1 + M f2 and is typically between 2 and 5 (see p1k = p2k . Note that M2 in Eq. 9 is equal to M

supporting information). For one dimension, inserting Eq. 9 into Eq. 1 and reversing the

10

ACS Paragon Plus Environment

Page 11 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

order of summation leads to: fe(x) =

M2 X k=1

σ2k (z) = p2k

[f3k (x)σ2k (z) + f2k (x)σ3k (z)]

X

ci f2k (xi )

(10)

i≤z

σ3k (z) = p2k

X

ci f3k (xi )

i>z

where z is an index such that xz ≤ x < xz+1 for all xi in the training set. In this form, it is only necessary to sum over the M2 terms of the kernel decomposition (Eq. 9) instead of all N training samples for an evaluation of fe(x). Since usually, M2 ≪ N , this can result in a substantial speedup (see Fig. 2).

The values of σmk are independent of x and can be pre-computed and stored in a lookup table for all possible indices z. The correct index z corresponding to σmk (z) for an arbitrary value of x can be efficiently found by binary search 55 or other search algorithms. In practice, the slow evaluation of the model function fe(x) using Eq. 1 will be faster up to

a specific value of N (see Fig. 2) due to the overhead associated with the search algorithm that is needed to find the appropriate z.

For multiple dimensions, equations similar to Eq. 10 can be found by applying the same reasoning recursively to each dimension, which leads to the following equations for the general,

11

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2: Average time per evaluation of the model function fe(x) in nanoseconds versus grid size N . Timings were averaged over 100 sets of 100000 evaluations per data point (error bars show one standard deviation). As is expected, the fast evaluation method (red) scales with O(log N ) (complexity of binary search), whereas the slow evaluation method (black) scales with O(N ) (the continuous lines are a linear (black) and a logarithmic (red) fit to the data). Note that due to the overhead of the binary search algorithm, the slow evaluation method is actually faster up until N ≈ 200. It is therefore recommended to benchmark the two methods in time-critical applications. The benchmark here was performed on a Desktop R R computer equipped with an Intel Xeon Processor E3-1275 at 3.40 GHz. Note that in principle, it is possible to further reduce the complexity of the fast evaluation method to O(1) if σmk is stored in a hash table. 56

12

ACS Paragon Plus Environment

Page 12 of 34

Page 13 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

multi-dimensional case 57 (1)

Xh

M2

fe(x) = γm(1) k(1) ···m(d) k(d) =

(1)

(1)

f3k (x(1) )γ2k + f2k (x(1) )γ3k

k=1

(d+1)

X h

i

M2

(d+1) f3k (x(d+1) )γm(1) k(1) ···m(d) k(d) 2k(d+1)

+

(d+1) f2k (x(d+1) )γm(1) k(1) ···m(d) k(d) 3k(d+1)

k=1

γm(1) k(1) ···m(D) k(D) = σm(1) k(1) ···m(D) k(D) (z) σm(1) k(1) ···m(D) k(D) (z) =

X

···

i(1) ∈s(1) (m(1) )

(d)

(d)

s (m ) =

X

ci(1) i(2) ···i(D) ×

(d)

(d)

(d)

p2k(d) fm(d) k(d) (xi(d) )

d=1

i(D) ∈s(D) (m(D) )

     1, 2, . . . , z (d)

D Y

m(d) = 2

    z (d) + 1, z (d) + 2, . . . , N (d) m(d) = 3

(11)

 (d) (d) (d) Here z = z (d) and the indices z (d) are chosen such that xz(d) ≤ x(d) < xz(d) +1 for all xi in (d)

(d)

the training set; fmk and p2k refer to the functions and constants in the decomposition of the

one-dimensional kernel functions k (d) (x(d) , x′(d) ) (Eq. 9). The multi-index i(1) i(2) · · · i(D) selects the coefficient ci that corresponds to the point x = {xi(d) } in the training set. The model function fe(x) can then be evaluated by iteratively updating the values for γ, starting from

σm(1) k(1) ···m(D) k(D) (z), which is taken from the pre-computed lookup table (see one-dimensional case).

Note that Eq. 11 is only applicable if the multi-dimensional kernel K(x, x′ ) can be written in the form given by Eq. 4. A complete algorithm for the fast evaluation of a multi-dimensional model function fe(x) according to Eq. 11 is given in the supporting information (Algorithm S2).

At first glance it appears that the precomputation of σm(1) k(1) ···mq (D)k(D) for all possible index vectors z scales as O(N 2 ). But in fact, it is possible to calculate all required values in just O(N ) time, which is the same complexity as a single evaluation of fe(x) using Eq. 1. This is possible because the sums for different z correspond to sums over different quadrants of 13

ACS Paragon Plus Environment

i

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 34

a subspace of the full D-dimensional space. Since the quadrants overlap, the value of a specific σm(1) k(1) ···m(D) k(D) (z) can be computed from a single evaluation at x = {xz(d) } and other previously computed σm(1) k(1) ···m(D) k(D) (z′ ). The recurrence relation used to calculate the different σm(1) k(1) ···m(D) k(D) (z) is 58

σm(1) k(1) ···m(D) k(D) (z) = t(z) +

D X

(−1)d+1

d=1

"

X

σm(1) k(1) ···m(D) k(D) (z + i)

i∈sd

#

(12)

Here, sd is the set of unique vectors i containing D − d zeros and d entries with either 1 or −1, depending on the m(d) for the σm(1) k(1) ···m(D) k(D) that is currently computed. More precisely, if m(d) = 2 then i(d) = −1 and if m(d) = 3 then i(d) = 1. The term t(z) is

t(z) = cz(1) z(2) ···z(D)

D Y

(d)

(d)

pm(d) k(d) fm(d) k(d) (xz(d) )

(13)

d=1

Figure 3 shows a graphical representation of the recurrence relation for a two-dimensional example.

Figure 3: Graphical representation of the evaluation of Eq. 12 for a two-dimensional example. Here, the entry for z (1) = 4 and z (2) = 3 is calculated for σ2k2k . Note that the subspace shaded in magenta is counted twice and must therefore be subtracted. It is assumed that the values for σ2k2k ([3, 3]), σ2k2k ([4, 2]) and σ2k2k ([3, 2]) have already been calculated. A complete algorithm for the fast pre-calculation of the lookup table according to Eq. 12 is given in the supporting information (Algorithm S3). In simple terms, the fast evaluation method given by Eq. 11 trades the computational overhead of summing over all training samples (Eq. 1) against the storage overhead that is required to store the lookup table of 14

ACS Paragon Plus Environment

Page 15 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

σm(1) k(1) ···m(D) k(D) values. The memory requirement in bytes for storing the lookup table is

D

Memory = 2 size

D Y

(d)

M2 (N (d) + 1)

(14)

d=1

where size is the amount of bytes required to store a single value of σm(1) k(1) ···m(D) k(D) . Typically, size is either 4 or 8, for single or double precision floating point variables, respectively. Assuming double precision and typical kernel functions (M2 = 2, see supporting information) for a 3-dimensional example with an unusually large grid of N (d) = 100 in each dimension, the storage requirement is below 500 MB of RAM. This shows that even for extremely large training sets (N = 1000000 in this example), the storage overhead is manageable for modern computers.

3

Handling incomplete grids

Although the previous sections showed the advantage of a multi-dimensional grid structure, in practice this strict format of the training set can be either difficult to obtain or be suboptimal. For example, consider the case of a reactive PES for the triatomic system ABC. The training data (ab initio energies) for such a PES must contain points that correspond to the diatom-atom configurations AB-C and AC-B as well as BC-A. Then, unless a special coordinate system is used (e.g. hyperspherical coordinates 59–61 ), the grid will automatically contain points that correspond to geometries where all three atoms are either unphysically close (“fusion”) or very distant (full atomization). Depending on the problem to be studied, such geometries lie in energetically inaccessible regions of the PES and the time spent performing ab initio calculations for such geometries should rather be spent to compute more points in relevant regions. Even when this problem is avoided, some points in the grid might be difficult to converge with the chosen ab initio method, or in a more general machine learning context, data might 15

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 34

simply not be available for some points.

Luckily, such “holes” have no consequence for the implementation of the RKHS method when Eq. 1 is used to evaluate the model function and Eq. 3 is solved to obtain the coefficients ci . Both methods require no underlying structure of the training set and work equally well with scattered data. But from Eq. 1 it is evident that adding additional points to the training set will not influence the result, provided that the coefficients of such points are zero. Thus it is always possible to construct complete grids by filling all holes in the training set with additional points and setting their coefficients to zero. As such, the fast evaluation method discussed in section 2.2 becomes applicable even for scattered data, provided that the data set is padded such that it forms a multi-dimensional grid.

As was discussed in section 2, solving Eq. 3 with Cholesky decomposition becomes time intensive for large data sets because of the O(N 3 ) scaling. In that case it can be advantageous to complete the grid by assuming arbitrary function values for the missing points and use the fast method outlined in section 2.1 to calculate the coefficients instead. Although the coefficients b ci obtained in this way are strictly wrong, the correct coefficients ci can be

calculated from

ci = b ci −

X

δj [K]−1 ij

(15)

j∈h

where h is the set of all indices that correspond to the holes in the grid, δj are correction coefficients and [K]−1 ij is the entry at position (i, j) of the inverse of kernel matrix K (see Eq. 7). For a derivation of Eq. 15, the reader is referred to the literature. 11 Note that it is not necessary to calculate the full inverse matrix [K]−1 , but only the slices that contain holes in the grid. This is efficiently accomplished using Algorithm S1 (see supporting information).

16

ACS Paragon Plus Environment

Page 17 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

The correction coefficients δj are obtained from solving the matrix equation 









c1   Q11 Q12 . . . Q1H   δ1   b       Q21 Q22 . . . Q2H   δ2   b       c2  =  .     .. ..   ..   ..  ..  .. . . .  .   .         QH1 QH2 . . . QHH δH b cH

(16)

where H is the number of holes in the grid and the matrix Q = {Qij } is the submatrix of the full inverse matrix [K]−1 that corresponds to the holes in the grid. The validity of Eq. 16 can easily be seen from the requirement that all correct coefficients ci that correspond to holes must be zero (see Eq. 15). As such, the calculation of the coefficients for incomplete grids scales with O(H 3 ) instead of O(N 3 ), where usually H ≪ N . If the matrix Q has the tensor product form itself, which can be the case if a structured subset of the multi-dimensional grid is excluded from the training set, the solution of Eq. 16 can also be performed using Algorithm S1 (see supporting information) to avoid the O(H 3 ) scaling.

4

One-dimensional kernel functions

In this section, several one-dimensional kernel functions for different purposes are introduced. The main focus is given to kernels which are useful for constructing multi-dimensional PESs, but some kernels for general machine-learning application are also given. All kernel functions introduced here can be decomposed into the form given by Eq. 9 and thus the fast evaluation method described in section 2.2 is applicable if they are used to construct a multi-dimensional kernel K(x, x′ ) (Eq. 4). The kernel functions used here differ from the permutationally invariant polynomials 25,29 as they allow to encode physical knowledge (i.e. radial decay behaviour) into the functional form of the PES in a general manner. Permutational invariance is automatically guaranteed

17

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 34

for RKHS interpolations, because any symmetry in the data is preserved by the symmetric kernels (see Eq. 1).

Kernels for nuclear distances For the description of nuclear distances in a PES, Hollebeek et al. 11 derived two kernel functions that are both defined in the interval [0, ∞), but possess a different asymptotic decay behaviour. Namely, the reciprocal power decay kernel should be chosen whenever a certain interaction in the PES follows a known r−n law, whereas the exponential decay kernel should be chosen for short-range contributions to the intermolecular interaction that often decay exponentially to zero for large distances. A general formula for both kernels is given below, for the derivation, the reader is referred to the literature. 11

The reciprocal power decay kernel is given by



kn,m (x, x ) =

−(m+1) n 2 x> B(m

+ 1, n)2 F1



x< −n + 1, m + 1; n + m + 1; x>



(17)

where x> and x< are the larger and smaller of x and x′ , B(a, b) is the beta function and 2 F1 (a, b; c; d)

is the Gauss’ hypergeometric function. The integers n and m determine the

smoothness of the kernel function and its asymptotic behaviour. To be precise, kn,m (x, x′ ) has n − 1 smooth derivatives and decays asymptotically with x−(m+1) . Explicit formula for the reciprocal power decay kernel and its decomposition (Eq. 9) for n = 2, 3 and m = 0 . . . 6 are given in the supporting information.

The exponential decay kernel is given by n−1

kn (x, x′ ) =

n · n! −βx> X (2n − 2 − k)! e [β(x> − x< )]k β 2n−1 (n − 1 − k)!k! k=0

(18)

where x> and x< are the larger and smaller of x and x′ and the integer n determines the 18

ACS Paragon Plus Environment

Page 19 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

smoothness. Like the reciprocal power decay kernel, the kernel function kn (x, x′ ) has n − 1 smooth derivatives. The positive parameter β determines the asymptotic decay behaviour e−βx and can be chosen such as to be conform with physical observation. Explicit formula for the exponential decay kernel and its decomposition (Eq. 9) for n = 2, 3 are given in the supporting information.

It should be noted though that choice of a non-optimal kernel function (e.g. with the wrong asymptotic decay behaviour) is almost inconsequential, provided that the training set contains enough samples in the asymptotic regions (large r). However, a correctly chosen kernel function allows the use of much smaller training sets, which directly translates to less CPU time spent for performing the ab initio calculations needed to obtain the training data. Fig. 4 demonstrates the ability of a properly chosen kernel function to reproduce the correct asymptotic behaviour, even when no training examples are present in the asymptotic regions.

Kernels for angular coordinates and arbitrary functions The Taylor spline kernel derived by Hollebeek et al. 11 is defined on the interval [0, 1] and can be used to model any finite interval, provided that coordinates are transformed through appropriate scaling functions first. For example, for handling an angular coordinate α that is given in the interval [0, π], the new coordinate y ∈ [0, 1]

y(α) =

1 − cos(α) 2

(19)

could be introduced. Similar transformations can be found for any other finite interval, such that the Taylor spline kernel is applicable also for general machine learning applications. It is given by ′

kn (x, x ) =

n−1 X

xi> xi
2 F1

i=0

19



x< 1, −n + 1; n + 1; x>

ACS Paragon Plus Environment



(20)

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4: RKHS method applied to a charge interacting with a point dipole and a repulsive term. The potential energy is given by V (r, θ) = r19 − cos(θ) . Contour lines are drawn in 0.075 r2 increments starting from V (r, θ) = −0.475. The true potential energy is given by the grey contours, whereas the RKHS model is given by the red dashed contours. The training data used to construct the RKHS model is indicated by black dots. Note that the RKHS model is virtually identical to the true potential energy even in the asymptotic region (r > 2) where no training data are present, because the coordinate r is modelled using a reciprocal power decay kernel with m = 1.

20

ACS Paragon Plus Environment

Page 20 of 34

Page 21 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

where x> and x< are the larger and smaller of x and x′ and 2 F1 (a, b; c; d) is the Gauss’ hypergeometric function. Similar to the previously introduced kernel functions, the Taylor spline kernel has n − 1 smooth derivatives. Explicit formula for the Taylor spline kernel and its decomposition (Eq. 9) for n = 2, 3 are given in the supporting information.

Kernels for periodic functions The periodic spline kernel 62 is defined on the interval [0, 1) and can be used to model periodic functions, where it is assumed that a single period of the function occurs between 0 and 1. Similar to the Taylor spline kernel described earlier, this kernel is applicable to any period length, provided that the coordinates are transformed through appropriate scaling functions first. The periodic spline kernel is

kn (x, x′ ) = (−1)n−1

B2n (x> − x< ) (2n)!

(21)

where x> and x< are the larger and smaller of x and x′ and B2n is the 2n-th Bernoulli polynomial. Similar to the previously introduced kernel functions, the periodic spline kernel has n − 1 smooth derivatives.

Radial basis function kernels Radial basis function (RBF) kernels are a popular default choice for machine learning applications. 8 They take multi-dimensional input vectors x and x′ and map them to an infinitedimensional feature space. 63 Examples are the Gaussian kernel given by 

||x − x′ ||2 K(x, x ) = exp − 2σ 2 ′

21



ACS Paragon Plus Environment

(22)

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 34

and the Laplacian kernel given by 

||x − x′ || K(x, x ) = exp − σ ′



(23)

where σ is a positive parameter which determines the length scale on which the kernels operate. For the fast evaluation method (section 2.2) to become applicable, it is not only necessary to express K(x, x′ ) as a product over one-dimensional kernels (Eq. 4), but these one-dimensional kernels need to be decomposable into the form given by Eq. 9 as well. Unfortunately, this is generally not possible for RBF kernels.

The Laplacian kernel is an exception to this if certain restrictions on x and x′ are imposed. It can equivalently be written in the form of Eq. 4 with the one-dimensional kernel function 

||x − x′ || k(x, x ) = exp − σ ′



(24)

where x and x′ are one of the D components of the D-dimensional feature vectors x and x′ . Restricting x and x′ to the interval [0, ∞), Eq. 24 can be rewritten as x   x  < > exp k(x, x′ ) = exp − σ σ

(25)

where x> and x< are the larger and smaller of x and x′ , respectively. This is the form (Eq. 9) required for the fast evaluation method (section 2.2). In practice, the restriction of x and x′ to positive values should not limit the applicability of the Laplacian kernel, as such a restriction corresponds to a simple coordinate transformation (i.e. the introduction of a new coordinate y = x + C where C is a sufficiently large positive constant).

22

ACS Paragon Plus Environment

Page 23 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

5

Example

To demonstrate the general applicability of the present toolkit for using the RKHS method to high-dimensional problems and large data sets, an oscillatory six-dimensional test function

f (x) =

6 X

exp(−x(d) x(d−1) ) cos(x(d)

d=2

π ) 0.37

(26)

with x(d) ∈ [0, 1] was considered and training sets of different size were generated. The grid size N (d) of each dimension is chosen to be equal and training points are equally spaced between x(d) = 0 and x(d) = 1 in each dimension. A Taylor spline kernel with n = 2 was chosen as the kernel function for each dimension. Note that both, equal grid size and equally spaced training points are not required by the RKHS method and merely chosen for simplicity. The toolkit was applied to data sets from N (d) = 3 to N (d) = 14 (which corresponds to a total training set size of N = 729 to N = 7529536, see Eq. 6). Fig. 5 shows the systematic improvement of the mean squared error (MSE) of the RKHS model with increasing N and the correlation of the RKHS model with the analytical expression. It is worthwhile to point out that the direct solution of Eq. 3 for the calculation of the coefficients ci with Cholesky decomposition is impossible for the largest dataset (N = 7529536) with modern computers. Generously assuming that the Cholesky decomposition of a 10000 × 10000 matrix takes approximately two minutes, the decomposition of a 7529536 × 7529536 matrix would take well over one-thousand years. With the fast algorithm (section 2.1) the calculation of the coefficients is faster than the time it takes to read in the dataset from a file and only takes a couple of minutes.

The interpolation capabilities of the RKHS method were also tested for high-dimensional model PESs and similar results were obtained. For example, a RKHS model for the multi-

23

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5: Left. Mean squared error (MSE) of the RKHS model function compared to the true function (Eq. 26) versus training set size N . Data points were generated by averaging the squared error of 1000 randomly chosen points (error bars show one standard deviation, note that they appear distorted due to the logarithmic scale). The grey line shows a fit to the data points. Right. Correlation of the RKHS model (N = 7529536) with the analytical expression (Eq. 26) for 1000 randomly chosen points. The red line depicts a perfect correlation and the MSE is 6.7 × 10−4 .

24

ACS Paragon Plus Environment

Page 24 of 34

Page 25 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

dimensional Morse potential given by

f (x) =

6 X d=1

(d)

with De

(d)

= a(d) = xe

  2 ) − 1 De(d) 1 − exp −a(d) (x(d) − x(d) e

(27)

= 1 was constructed from a regular grid of 10 points in each di-

mension (N = 1000000) between x(d) = 0.1 and x(d) = 3. The exponential decay kernel was chosen as kernel function for each dimension and the MSE of the RKHS model for 1000 randomly chosen points was 2.9 × 10−3 .

The supporting information contains a comprehensive step-by-step guide for the construction of a PES, which highlights all necessary steps from the generation of the data set to how the constructed PES can be used in e.g. molecular dynamics software.

6

Conclusion

In this work, a convenient and “all-purpose” toolkit for concrete applications of the RKHS method to problems arising in computational chemistry was introduced. Special focus was given to the construction of PESs but the range of applications is considerably larger. Our toolkit implements efficient algorithms and automatically handles incomplete grids. Further, a wide range of kernel functions are implemented, such that it is possible to construct a PES with the desired kernel function with just a few function calls. If other kernel functions are required, the code can be easily extended. New PESs for arbitrary systems can be readily constructed and do not require manual fitting of parameters or other human input, provided that the training data is available.

The toolkit supports any number of dimensions and can also be used for general machine learning applications, e.g. for the interpolation of D-dimensional functions. It allows analytical calculation of all first and second order partial derivatives (higher order derivatives 25

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

can be easily implemented in case they are needed).

The full source code along with usage examples and a documentation can be downloaded from github. 39

Acknowledgments This work was supported by the Swiss National Science Foundation through grants 2000217117810 and the NCCR MUST (to MM) and the University of Basel. The authors acknowledge fruitful discussions with Prof. Hershel Rabitz and Dr. Tak-San Ho.

Conflict of Interest The authors are not aware of any conflict of interest to declare.

Abbreviations PES

potential energy surface

RKHS

reproducing kernel Hilbert space

MD

molecular dynamics

RMSE

root mean squared error

KRR

kernel ridge regression

RBF

radial basis function

MSE

mean squared error

Supporting Information The supporting information for this article contains the explicit form of the most common one-dimensional kernel functions and all algorithms in pseudocode. Further, it includes a 26

ACS Paragon Plus Environment

Page 26 of 34

Page 27 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

comprehensive step-by-step guide for the construction and application of a PES using the toolkit.

Notes and References (1) White, C. A.; Johnson, B. G.; Gill, P. M.; Head-Gordon, M. Linear Scaling Density Functional Calculations via the Continuous Fast Multipole Method. Chem. Phys. Lett. 1996, 253, 268–278. (2) Adler, T. B.; Knizia, G.; Werner, H.-J. A Simple and Efficient CCSD(T)-F12 Approximation. J. Chem. Phys. 2007, 127, 221106–224100. (3) Knizia, G.; Adler, T. B.; Werner, H.-J. Simplified CCSD(T)-F12 Methods: Theory and Benchmarks. J. Chem. Phys. 2009, 130, 054104. (4) Bokhan, D.; Ten-No, S.; Noga, J. Implementation of the CCSD(T)-F12 Method Using Cusp Conditions. Phys. Chem. Chem. Phys. 2008, 10, 3320–3326. (5) Aguado, A.; Paniagua, M. A New Functional Form to Obtain Analytical Potentials of Triatomic Molecules. J. Chem. Phys. 1992, 96, 1265–1275. (6) Law, M. M.; Hutson, J. M. I-NoLLS: A Program for Interactive Nonlinear Least-Squares Fitting of the Parameters of Physical Models. Comput. Phys. Commun. 1997, 102, 252–268. (7) Boothroyd, A. I.; Keogh, W. J.; Martin, P. G.; Peterson, M. R. A Refined H3 Potential Energy Surface. J. Chem. Phys. 1996, 104, 7139–7152. (8) Rupp, M. Machine Learning for Quantum Mechanics in a Nutshell. Int. J. Quantum Chem. 2015, 115, 1058–1073.

27

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(9) Ho, T.-S.; Rabitz, H. A General Method for Constructing Multidimensional Molecular Potential Energy Surfaces from Ab Initio Calculations. J. Chem. Phys. 1996, 104, 2584–2597. (10) Hollebeek, T.; Ho, T.-S.; Rabitz, H. A Fast Algorithm for Evaluating Multidimensional Potential Energy Surfaces. J. Chem. Phys. 1997, 106, 7223–7227. (11) Hollebeek, T.; Ho, T.-S.; Rabitz, H. Constructing Multidimensional Molecular Potential Energy Surfaces from Ab Initio Data. Annu. Rev. Phys. Chem. 1999, 50, 537–570. (12) Aronszajn, N. Theory of Reproducing Kernels. Trans. Amer. Math. Soc. 1950, 68, 337–404. (13) Hofmann, T.; Sch¨olkopf, B.; Smola, A. J. Kernel Methods in Machine Learning. Ann. Stat. 2008, 1171–1220. (14) Castro-Palacio, J. C.; Nagy, T.; Bemish, R. J.; Meuwly, M. Computational Study of Collisions Between O(3 P) and NO(2 Π) at Temperatures Relevant to the Hypersonic Flight Regime. J. Chem. Phys. 2014, 141, 164319. (15) Unke, O. T.; Castro-Palacio, J. C.; Bemish, R. J.; Meuwly, M. Collision-Induced Rotational Excitation in N2+ (2 Σ+ g , v = 0)–Ar: Comparison of Computations and Experiment. J. Chem. Phys. 2016, 144, 224307. (16) Ho, T.-S.; Hollebeek, T.; Rabitz, H.; Harding, L. B.; Schatz, G. C. A Global H2 O Potential Energy Surface for the Reaction O(1 D) + H2 → OH + H. J. Chem. Phys. 1996, 105, 10472–10486. (17) Rupp, M.; Tkatchenko, A.; M¨ uller, K.-R.; Von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301.

28

ACS Paragon Plus Environment

Page 28 of 34

Page 29 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(18) Snyder, J. C.; Rupp, M.; Hansen, K.; M¨ uller, K.-R.; Burke, K. Finding Density Functionals with Machine Learning. Phys. Rev. Lett. 2012, 108, 253002. (19) Franke, R.; Nielson, G. Smooth Interpolation of Large Sets of Scattered Data. Int. J. Numer. Meth. Eng. 1980, 15, 1691–1704. (20) Nguyen, K. A.; Rossi, I.; Truhlar, D. G. A Dual-Level Shepard Interpolation Method for Generating Potential Energy Surfaces for Dynamics Calculations. J. Chem. Phys. 1995, 103, 5522–5530. (21) Bettens, R. P.; Collins, M. A. Learning to Interpolate Molecular Potential Energy Surfaces with Confidence: A Bayesian Approach. J. Chem. Phys. 1999, 111, 816–826. (22) Lancaster, P.; Salkauskas, K. Surfaces Generated by Moving Least Squares Methods. Math. Comp. 1981, 37, 141–158. (23) Ischtwan, J.; Collins, M. A. Molecular Potential Energy Surfaces by Interpolation. J. Chem. Phys. 1994, 100, 8080–8088. (24) Dawes, R.; Thompson, D. L.; Wagner, A. F.; Minkoff, M. Interpolating Moving LeastSquares Methods for Fitting Potential Energy Surfaces: A Strategy for Efficient Automatic Data Point Placement in High Dimensions. J. Chem. Phys. 2008, 128, 084107. (25) Cassam-Chena¨ı, P.; Patras, F. Symmetry-Adapted Polynomial Basis for Global Potential Energy Surfaces-Applications to XY4 Molecules. J. Math. Chem. 2008, 44, 938–966. (26) Braams, B. J.; Bowman, J. M. Permutationally Invariant Potential Energy Surfaces in High Dimensionality. Int. Rev. Phys. Chem. 2009, 28, 577–606. (27) Paukku, Y.; Yang, K. R.; Varga, Z.; Truhlar, D. G. Global Ab Initio Ground-State Potential Energy Surface of N4 . J. Chem. Phys. 2013, 139, 044309.

29

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(28) Sumpter, B. G.; Noid, D. W. Potential Energy Surfaces for Macromolecules. A Neural Network Technique. Chem. Phys. Lett. 1992, 192, 455–462. (29) Bowman, J. M.; Braams, B. J.; Carter, S.; Chen, C.; Czako, G.; Fu, B.; Huang, X.; Kamarchik, E.; Sharma, A. R.; Shepler, B. C.; et al., Ab-Initio-Based Potential Energy Surfaces for Complex Molecules and Molecular Complexes. J. Phys. Chem. Lett. 2010, 1, 1866–1874. (30) Jiang, B.; Li, J.; Guo, H. Potential Energy Surfaces from High Fidelity Fitting of Ab Initio Points: the Permutation Invariant Polynomial-Neural Network Approach. Int. Rev. Phys. Chem. 2016, 35, 479–506. (31) Jordan, M. J.; Thompson, K. C.; Collins, M. A. The Utility of Higher Order Derivatives in Constructing Molecular Potential Energy Surfaces by Interpolation. J. Chem. Phys. 1995, 103, 9669–9675. (32) Jordan, M. J.; Thompson, K. C.; Collins, M. A. Convergence of Molecular Potential Energy Surfaces by Interpolation: Application to the OH + H2 → H2 O + H Reaction. J. Chem. Phys. 1995, 102, 5647–5657. (33) Skokov, S.; Peterson, K. A.; Bowman, J. M. An Accurate Ab Initio HOCl Potential Energy Surface, Vibrational and Rotational Calculations, and Comparison with Experiment. J. Chem. Phys. 1998, 109, 2662–2671. (34) Collins, M. A. Molecular Potential-Energy Surfaces for Chemical Reaction Dynamics. Theor. Chem. Acc. 2002, 108, 313–324. (35) Duchovic, R. J.; Volobuev, Y. L.; Lynch, G. C.; Truhlar, D. G.; Allison, T. C.; Wagner, A. F.; Garrett, B. C.; Corchado, J. C. POTLIB 2001: A Potential Energy Surface Library for Chemical Systems. Comput. Phys. Commun. 2002, 144, 169–187.

30

ACS Paragon Plus Environment

Page 30 of 34

Page 31 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(36) Zhang, X.; Braams, B. J.; Bowman, J. M. An Ab Initio Potential Surface Describing Abstraction and Exchange for H + CH4 . J. Chem. Phys. 2006, 124, 021104. (37) Li, J.; Wang, Y.; Jiang, B.; Ma, J.; Dawes, R.; Xie, D.; Bowman, J. M.; Guo, H. Communication: A Chemically Accurate Global Potential Energy Surface for the HO + CO→ H + CO2 Reaction. J. Chem. Phys. 2012, 136, 041103. (38) Meuwly, M.; Hutson, J. Morphing Ab Initio Potentials: A Systematic Study of Ne-HF. J. Chem. Phys. 1999, 110, 8338–8347. (39) https://github.com/MeuwlyGroup/RKHS (accessed May 04, 2017). (40) Hansen, K.; Montavon, G.; Biegler, F.; Fazli, S.; Rupp, M.; Scheffler, M.; Von Lilienfeld, O. A.; Tkatchenko, A.; Muller, K.-R. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies. J. Chem. Theory Comput. 2013, 9, 3404–3419. (41) Smola, A. J.; Sch¨olkopf, B.; M¨ uller, K.-R. The Connection Between Regularization Operators and Support Vector Kernels. Neural Netw. 1998, 11, 637–649. (42) M¨ uller, K.-R.; Mika, S.; R¨atsch, G.; Tsuda, K.; Sch¨olkopf, B. Handbook of Neural Network Signal Processing; CRC Press Boca Raton, 2001. (43) Weinert, H. L. Reproducing Kernel Hilbert Spaces: Applications in Statistical Signal Processing; Hutchinson Ross Pub. Co. London, 1982; Vol. 25. (44) Sch¨olkopf, B.; Herbrich, R.; Smola, A. J. A Generalized Representer Theorem. International Conference on Computational Learning Theory. 2001; pp 416–426. (45) Berlinet, A.; Thomas-Agnan, C. Reproducing Kernel Hilbert Spaces in Probability and Statistics; Springer Science & Business Media Dordrecht, 2011. (46) Golub, G. H.; Van Loan, C. F. Matrix Computations; JHU Press Baltimore, 2012; Vol. 3. 31

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 34

(47) Tikhonov, A. N.; Arsenin, V. I.; John, F. Solutions of Ill-Posed Problems; Winston Washington, DC, 1977; Vol. 14. (48) Sch¨olkopf, B.; Mika, S.; Burges, C. J.; Knirsch, P.; Muller, K.-R.; Ratsch, G.; Smola, A. J. Input Space Versus Feature Space in Kernel-Based Methods. IEEE Trans. Neural Netw. 1999, 10, 1000–1017. (49) Fernandes, P.; Plateau, B. Triangular Solution of Linear Systems in Tensor Product Format. ACM SIGMETRICS Performance Evaluation Review 2001, 28, 30–32. (50) Montavon, G.;

Rupp, M.;

Gobre, V.;

Vazquez-Mayagoitia, A.;

Hansen, K.;

Tkatchenko, A.; M¨ uller, K.-R.; Von Lilienfeld, O. A. Machine Learning of Molecular Electronic Properties in Chemical Compound Space. New J. Phys. 2013, 15, 095003. (51) Ramakrishnan, R.; Dral, P. O.; Rupp, M.; Von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1 . (52) Blum, L. C.; Reymond, J.-L. 970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733. (53) Ruddigkeit, L.; Van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875. (54) Von Lilienfeld, O. A.; Tuckerman, M. E. Molecular Grand-Canonical Ensemble Density Functional Theory and Exploration of Chemical Space. The Journal of chemical physics 2006, 125, 154104. (55) Sedgewick, R. Algorithms; Pearson Education India Delhi, 1988. (56) Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. Introduction to Algorithms; MIT press Cambridge, 2001; Vol. 6.

32

ACS Paragon Plus Environment

Page 33 of 34

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(57) Eq. 7 in Ref. 10 is equivalent. It should be noted though that in the specification of si (m), si (3) should start from z i + 1 instead of z i . (58) Note that Eq. 9 in Ref. 10 has to include the term d = D. Also, the vector i must contain entries of −1 if m(d) = 3. (59) Johnson, B. On Hyperspherical Coordinates and Mapping the Internal Configurations of a Three Body System. J. Chem. Phys. 1980, 73, 5051–5058. (60) Pack, R. T. Coordinates for an Optimum CS Approximation in Reactive Scattering. Chem. Phys. Lett. 1984, 108, 333–338. (61) Pack, R. T.; Parker, G. A. Quantum Reactive Scattering in Three Dimensions Using Hyperspherical (APH) Coordinates. Theory. J. Chem. Phys. 1987, 87, 3888–3921. (62) Wahba, G. Spline Models for Observational Data; SIAM Philadelphia, 1990; Vol. 59. (63) Steinwart, I.; Hush, D.; Scovel, C. An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels. IEEE Transactions on Information Theory 2006, 52, 4635–4643.

33

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 34 of 34