Iterative Solvers for Empirical Partial Atomic ... - ACS Publications

Iterative Solvers for Empirical Partial Atomic Charges: Breaking the Curse of Cubic Numerical Complexity. Arslan R. Shaimardanov , Dmitry A. Shulga , ...
0 downloads 0 Views 926KB Size
Subscriber access provided by UNIV OF CAMBRIDGE

Computational Chemistry

Iterative solvers for empirical partial atomic charges: breaking the curse of cubic numerical complexity Arslan Shaymardanov, Dmitry A. Shulga, and Vladimir A. Palyulin J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00848 • Publication Date (Web): 18 Mar 2019 Downloaded from http://pubs.acs.org on March 18, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Iterative Solvers for Empirical Partial Atomic Charges: Breaking the Curse of Cubic Numerical Complexity Arslan R. Shaimardanov, Dmitry A. Shulga, Vladimir A. Palyulin*

Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russian Federation

E-mail: [email protected]

Abstract Rational drug design involves a vast amount of computations to get thermodynamically reliable results and often relies on atomic charges as means to model electrostatic interactions within the system. Computational inefficiency often hampers the development of new and wider dissemination of the known methods, thus any source to speed up the calculations without a sacrifice in quality is warranted. At the heart of many empirical methods of calculating atomic 1 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 50

charges is the solution of a system of linear algebraic equations (SLAE). The classical method of solving SLAE – the Gauss method – has in general case a cubic computational complexity. It is shown that the use of iterative methods for solving SLAE, characteristic to typical empirical atomic charge calculation methods, makes it possible to significantly reduce the amount of calculations and to obtain a computational complexity approaching a quadratic one. Despite this phenomenon is well known in numerical methods, iterative solvers surprisingly do not seem to have been systematically applied to calculation of atomic charges via empirical schemes. Another finding is the relative values of the matrix elements, determined by the physical grounds of the interactions within the empirical system, generally lead to SLAE’s with welldefined matrices, suited to use with iterative solvers to fasten computation compared to using the non-iterative solvers. This finding broadens the applicability range of atomic charges obtained with empirical methods for such cases as e.g. account of polarizability via “on-the-fly” recalculation of charges in changing surroundings within the force fields in molecular dynamics settings.

Introduction 2 ACS Paragon Plus Environment

Page 3 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Molecular modeling is an important tool in the search for molecules active against biologically relevant protein targets. Electrostatic interactions play a crucial role both in the distant and near recognition of a ligand by the protein. A tremendous growth of application of semi- and nonempirical methods of modeling – due to the recent surge in availability of computational resources – perhaps surprisingly comes in parallel with widening the applicability domain of the known as well as accelerating development of new empirical methods, which are indispensable in such fields as large system modeling and molecular dynamics of biological systems, mainly aimed at developing new drugs. Both fields require a vast amount of computations to get thermodynamically reliable results and often rely on atomic charges as means to model electrostatic interactions within the system. Empirical charge calculation methods are a group of methods to assign partial atomic charges (basically in organic molecules), which are fast, reasonably flexible and provide means for continuous development by refining physics and mathematics approximations underlying empirical formulations. These properties, together with reasonable achievable quality of charges, comparable to the direct fit charges in reproducing quantum chemical calculated

3 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 50

molecular electrostatic potential (MEP),1–3 make empirical charges a sensible choice as a platform for continuous development of refined electrostatics for the next generation force fields. Modern molecular modeling is aimed at both providing qualitative insights and very accurate numbers.4 The increased power of modern computers made it possible both to reparametrize the existing methods5,6 (for example, force fields7,8), and also start widely using the approaches that more precisely model certain interactions, for example, halogen bonding,7 “on the fly” recalculation of polarization,9–14 etc. However, the use of advanced/extended modeling methods often leads to a significant growth of the computational complexity of existing algorithms. As a close example, one can compare the molecular simulation via force fields with fixed charges (non-polarized environment) with modeling, which takes into account the polarizability caused by the surrounding atoms and fragments of molecules. In the former case, the charges are calculated once, before the simulation begins. In the latter case, the charges must be re-calculated "on the fly", and the application of such approach is complicated by the increased calculations time taken by accounting of changed environment. In molecular

4 ACS Paragon Plus Environment

Page 5 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

dynamics settings, where system energy and gradient evaluation have to be taken typically 106 to 109 times, the increased computational cost for each step leads to a catastrophic slowdown for the whole modeling experiment. That is why any source of optimization is warranted in order to speed up calculations and still preserve the increased accuracy and validity of interaction description. It can generally be stated, that computational inefficiency often hampers the development of new more accurate methods, thus any way to speed up the calculations without sacrifice in quality is warranted. Initially, our task was just to modify the available software implementation of the empirical method of calculating the charges – via Dynamic Electronegativity Relaxation method (DENR)15 – by replacing the Gauss solver of the SLAE with the Jacobi iterative solver for subsequent implementation of parallel computations algorithms. It was mentioned however that the number of Jacobi method iterations to solve the SLAE only depends on the required accuracy/precision but not on SLAE size (i.e. the number of atoms/equations/variables/etc.). It was not the expected behavior and, surprisingly, we were unable to find literature on either systematic study of using iterative solvers, or at least just a mere fact of iterative solver application to calculating

5 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 50

empirical atomic charges. Thus, we decided to study the subject deeper in order to understand the underlying grounds. Empirical methods for calculating atomic charges are based on the solution of a system of linear algebraic equations (SLAE), which size is proportional to the number of atoms in a molecule (N). The computational complexity of the classical SLAE solution method (the Gaussian elimination method) grows cubically with the increase in the dimensionality of the system (the number of atoms), i.e. scales as O(N3), and none of the Gauss-like methods (LUdecomposition, Cholesky factorization) cannot actually break “the curse of cubic numerical complexity” since their numerical complexity is equivalent to that of Gauss method up to a constant. Another disadvantage of Gauss elimination method in its classical form is low parallelization potential. However, the possibility of parallelization is inherent in the iterative methods of solving SLAE (Jacobi method, bi-conjugate gradient method, etc.), since the most complicated operation – matrix-vector multiplication – is well-parallelizable.16 Also, the same feature in principle allows for enjoying benefits of using sparse matrices for storage, which significantly reduces memory costs. It should be mentioned also that there is a group of fast matrix-to-matrix

6 ACS Paragon Plus Environment

Page 7 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

multiplication algorithms, the most well-known of which is Strassen algorithm.17 Those algorithms can reduce Gaussian elimination computational cost to O(Nlog2(7)) ≈ O(N2.81), for Strassen algorithm, and lower, up to O(N2.38), for Coppersmith–Winograd algorithm.18 However, the only practically applicable method of that group (Strassen method) gives insignificant complexity reduction while the others (such as Coppersmith–Winograd algorithm) only provide an advantage for matrices so large that they cannot be processed by modern hardware.19 An alternative, compared to the deterministic methods described above, way to solve SLAE is by using iterative methods (solvers). The performance of iterative methods to solve SLAE is known to depend heavily on the properties of the SLAE matrix, specifically its condition number. The properties of SLAE matrices, pertinent to empirical methods, depend in turn on both the physics grounds built into the method and the numerical values of parameters used. In this research we study computational complexity needed to obtain atomic charges via iterative solvers for the two methods chosen as representative ones for the larger groups of methods. The first group is represented by DENR method (in current implementation), which is based on incomplete electronegativity equalization principle (hence Dynamical ElectroNegativity Relaxation) and as such accounts for only the inductive effect (as for e.g. Gasteiger method 20). The second group which covers different but algebraically and numerically similar methods 7 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 50

based on complete electronegativity principle (or chemical potential equalization) is represented by perhaps the most cited and straightforward EEM method. The methods of the latter group inherently account for the through space polarizability effect – dependence of atomic charges on the charges on the other atoms depending on the separating geometrical distance, not chemical bonds. The main purpose of this work is to study the applicability of iterative methods for solving SLAE to obtain charges in two empirical schemes for charge calculation – DENR15 and EEM.21 To this end we take five model molecular systems with heavily varying number of atoms in order to check the scalability of each solver – either iterative or the traditional Gauss one – in the task of calculation of atomic charges. Additionally, we study the origin of increase of performance for atomic charge calculation. Despite Jacobi iterative solver applies well to DENR charge calculation, this solver has a critical flaw for other widespread empirical methods – due to its fundamentals it cannot operate with matrices having zero(es) on the main diagonal.16 That makes it impossible to use Jacobi iterative method for solving SLAE, appearing in EEM, so we made a decision to go further and

8 ACS Paragon Plus Environment

Page 9 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

replace Jacobi iterative solver with biconjugate gradient method (bicg) solver, which has certain advantages over both the simpler conjugate gradient and Jacobi iterative methods. The results obtained in this work show that due to the physical fundamentals of empirical charge calculation methods, the effective computational complexity of calculating charges approaches a quadratic one, O(N2). Moreover, by using iterative solvers one can control the appropriate level of accuracy needed and enjoy fewer iterations for less accurate solutions, which makes possible a reasonable trade-off between the accuracy and the computational burden. The same property makes calculation of atomic charges, which starts from an already close approximation to the solution, faster. This property will be of particular use for incorporation of the empirical charge methods into molecular dynamics, since at each time step the system (judged by the interatomic distances) changes only slightly at typical modeling temperatures. The striking additional result is that by using the sparse matrix storage and special matrix-vector multiplication one can achieve even sub-quadratic run-time complexity and simultaneously reduce the amount of consumed memory. The results obtained expand the applicability domain of empirical charge methods in practical modeling.

9 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 50

In what follows we first choose the molecular systems of different size and obtain the reference DENR15 and EEM21 charges by Gauss method. Next, we estimate the convergence of solutions obtained by the iterative solver to the reference solution. After that, we estimate how many iterations are necessary to calculate atomic charges with finite still reasonable accuracy. Then we analyze the reasons behind the fast charge calculation based on physics and numerical matrix properties which arise within each of the representative charge methods – DENR and EEM. The prospects to reach linear complexity scaling are outlined next. After that we describe the similar works for the different application fields as well as the efforts to optimize EEM for large molecules.22,23 Finally, we outline the future directions for the development.

Methods Test set For charges calculation, four molecules which differ greatly in the number of atoms were chosen: aspirin (21 atoms), taxol (113 atoms), insulin (PDB ID:3I40, 780 atoms), MHC II (PDB ID:1I3rR, one of the subunits, 3423 atoms), ribosome maturation factor Rea1 (PDB ID:6HYP, 12129 atoms) (Fig. 1). Such choice makes it possible to estimate the effective computational complexity of charge-calculation algorithms. 10 ACS Paragon Plus Environment

Page 11 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 1. Molecular systems to check scalability of method to solve SLAE depending on the number of atoms. From left to right: aspirin, taxol, insulin, MHC II subunit, ribosome maturation factor Rea1

The test set molecules were prepared in the following way. Small molecules (1-2) were downloaded from DRUGBANK.24 Larger protein molecules were downloaded from RSCB site,25 then preprocessed with tleap utility from the amber package,26 after that the longest chain was chosen for the further operations. It should be mentioned that for empirical methods we use in this work, the protonation state and even partially incomplete sequence or several missing atoms do not affect the estimates of the computational complexity.

SLAE solution methods 11 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 50

The Gauss method (as a method for finding the reference solution) and the bi-conjugate gradient method (an iterative method) were used as methods for solving SLAE. The Gaussian elimination method (LU-decomposition) is a classical method for solving SLAE.16 Its computational complexity (as well as the computational complexity of any Gausslike method) is proportional to N3 and it is hardly amenable to parallelization since the algorithm steps (forward elimination and back substitution) both have strong dependencies of the inner calculations order. Another drawback is that even for a sparse matrix supplied on input, a dense matrix of the LU-decomposition is obtained as a result of the solution.

Iterative SLAE solving methods Iterative methods of solving SLAE is the group of methods based on the use of an initial guess to generate a sequence of improving approximate solutions, in which the current approximation is derived from the previous ones. Each iteration could be described by the following formula (1)

𝐱(𝑛) = H𝐱(𝑛 ― 1) + 𝐠

(1)

where x(n) — n-th (current) approximation, x(n–1) — (n–1)-th (previous) approximation (or an initial guess), H — matrix, g — vector. 12 ACS Paragon Plus Environment

Page 13 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

The biconjugate gradient method (bicg) is an iterative Krylov subspace method for solving the SLAE.16 The computational complexity of each iteration is of the order of N2 (multiplication of a matrix by a vector). In contrast to the conjugate gradient method, it is suitable for solving SLAE with asymmetric matrices (necessary for DENR).16 Unlike simpler Jacobi iterative method, it is suitable to solve SLAE with zero elements on the main diagonal (necessary for EEM).16 As iterative SLAE solving methods require initial guess values, a vector of zero initial charges was used in each case – assuming electroneutral molecules. It should be noted that computational complexity of empirical charge calculation methods being tested does not depend on whether the whole molecule or its parts are neutral or formally charged. To measure the runtime of each SLAE solving method, SciPy27 implementation of bicg (scipy.sparse.linalg.bicg) and Gaussian elimination (scipy.linalg.solve) were used. However, in order to better control the number of iterations and be able to measure convergence error values on each step we had to use our own Python implementation of bicg method.

Charges calculation methods As methods for calculating partial atomic charges, Electronegativity Equalization Method (EEM)21 and Dynamic Electronegativity Relaxation method (DENR)15 were chosen. Both 13 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 50

methods have been software implemented using Python programming language (with an option of choosing the SLAE solver). It should be noted that EEM and DENR methods were chosen as the representative ones among the other well-known methods9,28–34 (which eventually result in similar complexity due to SLAE solution), thus, generally, the same technique could be applied to speed up those methods too. However, the methods selected represent different subclasses in terms of physics behind them which leads to qualitatively different predicted physical and numerical properties, as it will be shown later. In particular, several methods21,28,30–32 could be conceptually generalized as EEM-like in terms of the energy function composed and the way to minimize it. However, as Cheli et al.35 found, EEM (also called fluctuating charge) approach leads to qualitatively incorrect polarization increase with the increase of molecule size. This phenomenon, which was later called "superlinear polarizability",36 had spurred research in the field resulted in the proposal of atom-atom charge transfer (AACT)35 and split-charge equilibration (SQE)37 methods, which generally state that the charge transfer during the electronegativity (chemical potential) equilibration should be restricted to covalently bound

14 ACS Paragon Plus Environment

Page 15 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

atoms (an exception is made for conjugated systems, however). This assumption leads to more correct qualitative and eventually quantitative description of charge distribution within the organic molecules.38–40 DENR method approach is similar to AACT and SQE in that it is based on the redistribution of charge along the covalent bonds. The main difference with the current DENR implementation is the level of atomic environment perturbation included in the empirical energy as function of charge. So far DENR accounts for the perturbation caused by covalently bound neighboring atoms, which is the most significant perturbation describing an atom appearing in its chemical environment in a molecule. DENR approach affords natural supplementing its empirical energy function with the through space charge-charge polarization, which is generally less significant – in the level of perturbation caused – but still important contribution. Once it is done, we believe DENR method will become well suited for correctly describing both through covalent bonds (induction) and through space (polarization) effects. Thus, we consider DENR approach as a promising platform for developing molecular electrostatics for the new generation force fields. Additionally, since the current DENR implementation does not contain explicit through space interactions it is shown below (section

15 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 50

“Linearity prospects”) that it represents a limiting case in which sub-quadratic memory and runtime scaling is possible.

Electronegativity Equalization Method EEM (2) is an empirical partial atomic charges calculation method.21,41,42 It is based on the principle of complete equalization of electronegativities. It was developed and first parameterized in 1986;21 in 2002 Bultinck et al. refreshed the approach and refined the charge parameters.41,42 Electronegativity equalization with charge conservation results in a SLAE of size N+1. To calculate the charges, the parameters obtained in Ref. 41 were used. The method attracted additional attention after massive parametrization in order to reproduce the charges calculated at different quantum-chemical approximations.7 This is also the simplest yet effective method to account for the polarizability, which is the dependence of the atomic charge not only on charges on the covalently bound atoms, but also on the distant (not chemically bound) atoms.9 Different implementations of EEM are widely used in cheminformatics software such as OpenBabel.43

16 ACS Paragon Plus Environment

Page 17 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(

2𝜂1 1 𝑅12 1𝑅 2𝜂2 12 ⋮ ⋮ 1𝑅 1𝑅 𝑁1 𝑁2 1 1

)( ) ( )

⋯ 1 𝑅1𝑁 ⋯ 1 𝑅2𝑁 ⋱ ⋮ ⋯ 2𝜂𝑁 ⋯ 1

―1 𝑞1 ― 𝜒1 𝑞 ― 𝜒2 2 ―1 ⋮ = ⋮ ⋮ 𝑞𝑁 ― 𝜒𝑁 ―1 𝜒𝑒𝑞 𝑄𝑚𝑜𝑙 0

(2)

where ηi — hardness of the i-th atom, χi — electronegativity of the i-th atom, qi — charge of the i-th atom, Rij — distance between the i-th and j-th atoms, Qmol – total charge of all atoms.

Dynamic Electronegativity Relaxation method DENR (3) is an empirical iterative partial atomic charges calculation method.15 It is based on incomplete equalization of electronegativity (hence “relaxation”) and charge redistribution along molecular bonds (i.e. in the molecular graph). In this work, two DENR iterations were performed during the calculation. Each DENR iteration requires a solution of SLAE (3). To calculate the charges, the parameters described in Ref. 44 were used. A ⋅ 𝐪(𝑛) = 𝐪(𝑛 ― 1) ― 𝑐𝛥𝑡 ⋅ 𝐚0 A = (I + 𝑐𝛥𝑡 ⋅ B0), B0 = L𝛈, 𝐚0 = L𝛘 𝑁

𝑄𝑚𝑜𝑙 =

∑𝑞

(𝑛) 𝑖

𝑖=1

𝑁

=

∑𝑞

(𝑛 ― 1) 𝑖

(3)

= 𝑐𝑜𝑛𝑠𝑡

𝑖=1

where q(n) — the vector of charges on n-th iteration, q(n–1) — the vector of charges on (n–1)th iteration, L — Laplace matrix, η — the matrix with hardnesses on the diagonal, χ — the vector

17 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 50

of electronegativities, cΔt=0.05 — the empirical adjusted parameter, Qmol — the total charge of all atoms.

Accuracy control The reference charges for EEM and DENR methods were calculated using the Gaussian elimination method for solving the corresponding SLAE. The charges deviation (RMSE) at each iteration of the bicg method was calculated relatively to these charges. The criterion of insignificant relative change of charges from iteration to iteration was used to study the overall convergence of the charges, obtained by the iterative method, to the reference values. In the case of non-complete convergence, the criterion for stopping the iterations upon achieving the RMSE value order of ~ 10-6 au charge was used to estimate the number of iterations necessary for the approximate solution. In the case of complete convergence, the criterion for stopping the iterations was reaching the constant value of RMSE(Ax(n), b) i.e. the minimal achievable RMSE of the right hand SLAE part reproduced by the solution x(n) obtained on the n-th bicg iteration up to double precision floating point epsilon which is about 10-16. The number of iterations of the bicg method for the DENR method was calculated as the total number of iterations at both DENR steps. The precision specified in Table 1 is the deviation 18 ACS Paragon Plus Environment

Page 19 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(RMSE) of the final solution, calculated relatively to the solution obtained by Gaussian elimination method.

Results and discussion First, we show the amount of computations necessary for the complete convergence of charges obtained by iterative methods to the reference charges obtained by classical (LU) Gauss method. Second, we consider the possibility of setting up loose quality criteria for iterative solvers. Next, we interpret the fast convergence achieved in terms of physics and mathematics inherit to the empirical charge calculation methods. Finally, the prospects for further optimization and usage are outlined.

Complete convergence At first, we decided to study the overall convergence of bicg method for both of charge calculation methods described above (EEM and DENR) to prove that we can achieve a final solution precision of the same order the Gaussian elimination method achieves. It is shown (Fig. 2 and Fig. 3) that eventually the solution obtained by bi-conjugate gradient method converges to the solution obtained by the Gauss method – the error of solution found

19 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 50

by the bicg method has the same order of magnitude as the error of the solution obtained by the Gaussian elimination method. It can be seen when comparing the behavior of the two norms with iterations. First, the norm of the residual, RMSE(Ax(n), b), of the left-hand part matrix, A, the solution obtained by the iterative method on the n-th iteration, x(n), and the right-hand vector b, decreases until reaching the plateau. Second, "Gauss solution discrepancy" – the norm of the residual, RMSE(Axgauss, b), between the left-hand part matrix, A, with the Gaussian solution, xgauss, and the right-hand vector b, sets the numerical accuracy limit reached by the Gauss method. The RMSE(Axgauss, b) depends on the condition number of the SLAE matrix45 and machine epsilon for double floating point type. Upon reaching this divergence value, the iterations do not lead to an increase in the accuracy of the solution despite the relative change of the solution vector still continues to decrease. As can be seen from the graph – the residuals went on the plateau. Interesting, the discrepancy between the two solutions – RMSE(x(n), xgauss), obtained on the

n-th bicg iteration relative to the Gauss solution, xgauss – also stops decreasing upon reaching the same plateau. It is interesting to note that DENR method seems to have better conditioned

20 ACS Paragon Plus Environment

Page 21 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

matrix than EEM method since both residual norms, the RMSE(Axgauss, b) and RMSE(Ax(n), b), for the former method are much closer to the machine epsilon for double floating-point type, which is of the order 10-16.

Figure 2. Value of errors on each iteration of biconjugate gradient method while calculating charges for one of MHC II (1I3R) subunit by EEM method, where RMSE (Ax(n), b) is the RMSE of the right hand SLAE part reproduced by the solution x(n) obtained on the n-th bicg iteration; RMSE(x(n), xgauss) is the RMSE of solution, obtained on the n-th bicg iteration relative to the Gauss solution; max(|x(n+1)–x(n)|) is the maximal difference in the components of solution vector from different iterations; the RMSE(Axgauss, b) is the RMSE of the right hand SLAE part reproduced by the Gauss solution. 21 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 50

Figure 3. Value of errors on each iteration of biconjugate gradient method while calculating charges for one of MHC II (1I3R) subunit by 2-iterational DENR method, where RMSE (Ax(n), b) is the RMSE of the right hand SLAE part reproduced by the solution x(n) obtained on the n-th bicg iteration; RMSE(x(n), xgauss) is the RMSE of solution, obtained on the n-th bicg iteration relative to the Gauss solution; max(|x(n+1)–x(n)|) is the maximal difference in the components of solution vector from different iterations; the RMSE(Axgauss, b) is the RMSE of the right hand SLAE part reproduced by the Gauss solution.

Three points should be stressed here. First, iterative solution converges to the reference solution (by Gaussian elimination). Second, the number of iterations necessary to achieve even this level of accuracy is less than the dimensionality of the system (the columns "complete 22 ACS Paragon Plus Environment

Page 23 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

convergence" in Table 1, Fig. 4). That means a significant reduction in computations is achieved due to the use of iterative solvers to obtain charges via the empirical methods studied. And third, it should be mentioned that all of the error criteria (RMSE (Ax(n), b), RMSE(x(n), xgauss) and max(|x(n+1)–x(n)|)) have roughly the same order therefore the simplest of them, max(|x(n+1)–x(n)|), can be used for the convergence control as the most convenient one.

Table 1. The number of bicg method iterations needed for achieving a given solution accuracy (RMSE) for EEM and DENR charges calculation methods.

DENR

DENR (RMSE ~1×10–6)

N, atoms

(complete convergence)

EEM

EEM (RMSE ~1×10–6)

(complete convergence)

Iters

RMSE

Iters

RMSE

Iters

RMSE

Iters

RMSE

21 (aspirin)

12

5.09×10–6

37

4.19×10–17

21

1.30×10–6

28

2.29×10–16

113 (taxol)

15

1.55×10–6

57

8.82×10–17

34

2.15×10–6

92

6.68×10–15

780 (insulin)

15

1.12×10–6

56

1.04×10–16

58

1.21×10–6

168

1.14×10–14

3423 (MHC II subunit)

15

1.23×10–6

56

1.08×10–16

89

1.14×10–6

291

3.93×10–13

23 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

12129 Rea1)

(factor

16

3.65×10–6

55

1.30×10–16

303

Page 24 of 50

1.01×10–6

718

4.06×10–12

Figure 4. The number of operations (i.e. computational complexity) required to solve the SLAE appearing in EEM and DENR methods in the case of complete convergence.

Partial (incomplete) convergence After the general convergence has been established it is interesting to study how many iterations are necessary to achieve practically useful solutions. Calculation of charges on the selected six molecules by iterative method showed (Table 1) that only several dozen iterations are required to achieve a given divergence of the order RMSE = 10–6 (a reasonable guess for many real applications) relative to the reference charges, and this number only slightly depends

24 ACS Paragon Plus Environment

Page 25 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

on the size of the molecules. In case the other order of magnitude of the accuracy criterion is chosen, the number of iterations required changes only linearly (Fig. 2 and Fig. 3). Since each step of the iterative methods has quadratic computational complexity on the number of atoms (multiplication of a matrix by a vector), a fixed number of iterations indicates that the computational complexity of the whole method is actually comparable to the computational complexity of each iteration (Fig. 5).

Figure 5. The number of operations (i.e. computational complexity) required to solve the SLAE appearing in EEM and DENR methods in the case of incomplete convergence.

25 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 50

Due to the structure of iterative SLAE solving methods (dependence on initial guess values), closer (i.e. less RMSE to solution) initial guess values (for example, obtained on a previous step of molecular dynamics) should require even fewer number of iterations to recalculate atomic charges in changed environment (i.e. to account polarization).

Physical and mathematical interpretation To make the following explanation clearer, examples of SLAE appearing for methanol (Fig. 6) charges calculation by both of EEM (2) and DENR (3) methods are presented (4, 7).

Figure 6. Methanol molecule.

Origin of fast convergence The effective quadratic computational complexity is achieved due to the fact that the matrices obtained in the charges computation methods satisfy the basic criterion of fast convergence for iterative methods – the diagonal predominance of the SLAE matrix. There are two main reasons for that. First, due to the fundamental relationship between the parameters of electronegativity 26 ACS Paragon Plus Environment

Page 27 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

and hardness for atoms with interatomic perturbation parameters (taking into account the influence of other atoms on the charge on a given atom). Electronegativity and hardness are the first and second derivatives of energy with respect to charge; both quantities are nonnegative by their physical meaning. Second, the main contribution to the magnitude of the charge on an atom is made by a limited number of its neighbors.

Properties of empirical atomic charges calculation methods In the EEM method (2), there is the double hardness (2 ∙ η) on the diagonal of the matrix (4), and, compared to it, even the largest off-diagonal terms (1 / R), as a rule, take a smaller value. Such a phenomenon can be explained by the physics fundamentals of electron density redistribution: atoms’ ability to keep their electron density near their cores has more influence on their electron density than the electron density disturbance/perturbation caused by the neighbor atoms presence. One can consider the redistribution of charges (electronic density) as a consequence of perturbation of an isolated atom caused by the potentials of neighboring atoms.

27 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1 2 3 4 5 6

(

0.100 0.659 0.484 0.484 0.484 0.374 0.273 1.000

0.200 0.484 1.319 0.295 0.297 0.257 0.231 1.000

0.300 0.484 0.295 1.319 0.297 0.257 0.231 1.000

0.400 0.484 0.297 0.297 1.319 0.258 0.186 1.000

0.500 0.374 0.257 0.257 0.258 1.089 0.544 1.000

0.600 0.273 0.231 0.231 0.186 0.544 1.319 1.000

―0.000 ―1.000 ―1.000 ―1.000 ―1.000 ―1.000 ―1.000 0.000

Page 28 of 50

)( ) ( ) 𝑞1 𝑞2 𝑞3 𝑞4 𝑞5 𝑞6 𝜒𝑒𝑞

=

―0.362 ―0.206 ―0.206 ―0.206 ―0.730 ―0.206 ―0.000

(4)

The DENR method (3) effectively simulates the perturbation caused by the neighboring atoms by using a molecular graph represented by the Laplace matrix (5),46 on its diagonal is, in fact, the number of non-zero elements in the line, while the off-diagonal elements take the value 0 (in case of the absence of the bond) or –1. The matrix B0 (6) is obtained from the Laplace matrix by multiplying it by the diagonal matrix of the hardness. And finally, the diagonal predominance of the resulting SLAE matrix (7) is achieved by adding of identity matrix to the matrix with offdiagonal elements L ∙ η multiplied by a small value of cΔt = 0.05. As a result (7), the diagonal terms of such a matrix are large in comparison with η ∙ cΔt.

L

=

1 2 3 4 5 6

(

―1 4 ―1 ―1 ―1 ―1 0

―2 ―1 1 0 0 0 0

―3 ―1 0 1 0 0 0

―4 ―1 0 0 1 0 0

―5 ―1 0 0 0 2 ―1

―6 0 0 0 0 ―1 1

28 ACS Paragon Plus Environment

)

(5)

Page 29 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

B0 = L ⋅ 𝑑𝑖𝑎𝑔(𝜂1 𝜂2 𝜂3 𝜂4 𝜂5 = L ⋅ 𝑑𝑖𝑎𝑔(2.957 19.989 19.989 19.989 11.828 ―19.989 ―19.989 ―19.989 ―2.957 19.989 0 0 ―2.957 0 19.989 0 ―2.957 0 0 19.989 ―2.957 0 0 0 0 0 0 0

( 1 2 3 4 5 6

(

― 0.100 1.591 ―0.148 ―0.148 ―0.148 ―0.148 0.000

―0.200 ―0.999 1.999 0.000 0.000 0.000 0.000

―0.300 ―0.999 0.000 1.999 0.000 0.000 0.000

―0.400 ―0.999 0.000 0.000 1.999 0.000 0.000

―0.500 ―0.046 0.000 0.000 0.000 1.091 ―0.046

𝜂6) = 0.915 7.523) = ―0.915 0 0 0 0 0 0 0 1.83 ―7.523 ―0.915 7.523

)

―0.600 0.000 0.000 0.000 0.000 ―0.376 1.376

(6)

)() ( ) 𝑞1 𝑞2 𝑞3 𝑞4 𝑞5 𝑞6

=

―0.162 ―0.101 ―0.101 ―0.101 ―0.482 ―0.340

(7)

Speedup techniques According to the physics basics of EEM and DENR methods discussed above, we can arrange them on a scale of the level of influence from the atoms’ environment they take into account. At one end of this scale there is the DENR method, which only considers the inductive effect, i.e. redistribution of charge through bonds. At the other extreme of this scale there is the EEM method, which considers all electrons of the whole molecule as common, i.e. redistribution of charge through space. However, EEM seems to neglect the inductive effect, relying completely on through space charge-charge interactions. The latter are evidently of the less significant order of perturbation compared to the influence of the covalently bound atoms (the

29 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 50

inductive effect per se). Thus, the two charge calculation approaches considered naturally set up the two limiting cases for the computational complexity.

Linearity prospects Computational complexity of bicg method is determined by the most complex inner-method operation – matrix-vector multiplication (1). In such a case, bicg computational complexity achieves a quadratic one. But due to the DENR method “physical” principles such as: Laplacian matrix sparseness and maximal valency of atoms in organic molecules M ≤ 5 (generally small number, compared to the whole Laplacian size – N), there is also a set of optimizations which could be made: a storage method optimization and multiplication optimization. Both are achieved via the use of the special sparse matrix storage (only non-zero elements are stored) and matrix-vector multiplication which iterates only through non-zero stored elements. Due to the constant (M ≤ 5) non-zero row/column member number, the storage used for a sparse matrix is less or equal N·M, thus it is possible to achieve a linear in N memory consumption and linear computational complexity of such matrix-vector multiplications. To prove our concept about a SLAE solvers polynomial computational complexity dependency of atoms count, we measured runtime of SLAE solving in DENR charges 30 ACS Paragon Plus Environment

Page 31 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

calculation method using two types of matrix representation – dense and sparse ones. SciPy implementations of sparse matrix (scipy.sparse.csr_matrix), bicg and Gauss elimination methods were used. The convergence criteria were set of for complete convergence case described above. No optimizations have been made. Current implementation is intended only to demonstrate the actual dependence of runtime on N, i.e. the practical computational complexity. Due to the time measurement artifacts (uncertain time scalability behavior) in the case of small molecules (N < 300 atoms), it was decided to extend the test set by including additional large molecules (Table 2): 2RTV (308 atoms), 5O0U (513 atoms), and 5YH4 (2772 atoms). For the same reason the first two small molecules were excluded from the power fit. Table 2. Extended molecule test set used for runtime measurement

Molecule name Aspirin* Taxol*

2RTV

5O0U

3I40*

5YH4

1I3R*

6HYP*

308

513

780

2772

3423

12129

/ RSCB id

N

21

113

* – molecules used in the test for the number of iterations.

31 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 50

As it can be seen in Fig. 7, Gauss elimination method has a cubical complexity (Fig. 7, green) and bicg method has a quadratic one (Fig. 7, yellow). The former fact coincides with the theory47 whereas the latter one well corresponds to the findings described in the current research. In the case of sparse matrix implementation, a visible speedup is achieved. Despite the linear complexity has not been achieved here strictly, the power fit curve has the coefficient less than 2 (quadratic) and it is well beyond the statistical uncertainty (Fig. 7, blue). Thus, a sub-quadratic dependency could be reached with the help of combination of iterative solvers and special sparse matrix calculations. The first two points corresponding to small molecules were excluded from the limiting complexity analysis, because the main focus is on the power of dependence of the computational complexity on the size of the system, N, when N is large. One can also see that for small molecules Gauss method is even preferred above the iterative solution in our settings.

32 ACS Paragon Plus Environment

Page 33 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 7. Runtime (sec) of different SLAE solving methods for calculations of charges via DENR method on a set of different size (N atoms) molecules.

EEM optimization Due to the EEM physics and mathematics basics discussed above (dense matrix in SLAE with small magnitude but strictly non-zero off-diagonal elements) it is impossible to perform the same optimization trick (sparse matrices usage). But there are already some known methods of EEM speedup. The obtained results could be compared to the published earlier research aimed at speeding up EEM calculations especially for large molecules, developed by Ionescu et al.22,23 who applied the Divide and Conquer approach and employed the distance Cut-off scheme in two 33 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 50

forms (the EEM Cutoff, the EEM Cover), both leading to the same asymptotic complexity. The main idea is to localize the considered interactions by splitting the whole EEM system into N subsystems, taking a pivot atom and its nearest neighbors (r < R), then solve each of the subsystem’s SLAE in parallel by Gauss method. This results in ~O((R2)3N + R2NlogN) complexity in operations and ~O(R2+NlogN) (for each subsystem) in memory consumption. Using the iterative methods discussed in the work, computational complexity of solving a SLAE system with such a matrix lies between the Full (no cut-off applied) EEM approach complexity, ≈O(N2.5), and DENR approach (which can be considered as an extreme case of empirical charge calculation methods, in which only the inductive effect is modeled not the through space polarization) complexity ≈O(N2). The latter can be considered as a SLAE matrix with as many non-zero off-diagonal elements as there are covalent bonds of each atom, which is lesser than the number of atoms falling into a physically reasonable distance cut-off necessary for through space polarization account. One can compare the complexity of the methods described by substituting R2=M, which is the average number of neighbors within the distance R (the cut-off radius), which leads us to

34 ACS Paragon Plus Environment

Page 35 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

the following estimated EEM Cutoff approach complexity: O(M3N + MNlogN) ≈ O(M3N). It was empirically observed that in proteins, on a cutoff distance R=10Å, the average number of neighbors M is approximately equal to 200 atoms. Direct comparison of computational complexities, necessary to iteratively solve Full EEM SLAE and using the EEM Cutoff approach, i.e. O(N2.5) against O(2003N), reveals that the direct iterative solution of EEM SLAE by the proposed approach has lower computational complexity in case of N < (2003)1/(2.5-1) ≈ 40000 atoms. A similar comparison of memory consumption complexities, O(N2) against O(NM2), reveals that the proposed direct iterative approach has lower computational complexity in case of N < 2002 = 40000 atoms. It should be however noted, that Ionescu’s approach could be further modified by replacing Gauss solver with an iterative one which theoretically leads to even less computational complexity ≈O(M2N). But, taking into account the average number of M=200, it can be seen (Table 1) that for such small systems, in case of complete convergence settings, the total number of iterations is almost equal to the size of a molecule, meaning either a negligible or no

35 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 50

speedup is possible. So, in the combined method a speedup can be achieved only in the case of using the incomplete convergence criteria. Additionally, it should be mentioned that the complexities provided above consider only the solution of SLAE. As a preparation step a cutoff scheme itself brings an additional cost of calculating all pair distances in the system, which requires O(N2) operations regardless of the method it is used for.48

Perspectives It is shown (Table 1) that the use of iterative solvers of SLAE in empirical charge methods makes it possible to move from the cubic computational complexity to the quadratic one (Figures 4 and 5). There are also additional advantages of using iterative methods. Firstly, the possibility of easy parallelization of such methods due to the internal method structure is possible: the most complex operation (matrix-vector multiplication) is highly amenable for parallelization. Secondly, in the case of using the initial values of charges close to the final ones (small RMSE at the start), even fewer iterations will be required. All the above opens up the opportunities for using this approach in molecular dynamics, where the next state clearly

36 ACS Paragon Plus Environment

Page 37 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

depends on and generally is very similar to the previous one, which is the basis of the iterative methods themselves. Analyzing the results obtained, we note that a similar phenomenon (reduction of computational complexity to a quadratic one using iterative methods on the similar systems of equations) is observed in other areas, for example, in the calculation of molecular electrostatics by the boundary element method,49 in solving integral equations of potential theory.50 The origin is the same: the SLAE matrix is sparse with a fixed number of nonzero (or non-negligible) values in each row of the matrix. Thus, in other areas of science, such solutions have already been substantiated and used. Moreover we found that a very similar approach was undertaken to speed-up calculation of polarization using induced dipoles approach, where a similar SLAE system arises.51–53 But, despite the fact that such an approach is commonly known in numerical methods and widely applied in some other fields of science which deal with sparse SLAE systems, we had not found any signs of application of such method for calculation of partial atomic charges via empirical methods.

37 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 50

We intend to use DENR as a platform for electrostatics calculation for more precise electrostatic interaction modeling within the next generation force fields. To that end, we estimated the optimization potential inherit to the method. It is reasonable to assume that when DENR will be supplemented with through space interactions (polarization), employing potentials behaving as 1/R, its computational complexity and memory consumption will be equal to those of EEM method as discussed above.

Conclusions Thus, it has been shown that the use of iterative methods for solving SLAE, arising in empirical methods of calculating charges, allows one to achieve a number of advantages. First, the possibility to parallelize the calculations. Second, it becomes possible to accelerate the recalculation of charges in the case of small changes of atoms environment. Third, the most important thing is that the effective quadratic computational complexity of the calculation is achieved, while the number of iterations necessary for solving the SLAE by the iterative method is practically independent of the size of the system. Finally, a useful finding is that all residual discrepancies (errors), which characterize the goodness of a solution, are well correlated to

38 ACS Paragon Plus Environment

Page 39 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

each other and with some dispersion monotonically decrease with the increase of the iteration number. This allows one to choose the most accessible one – the relative solution change metrics – to access the overall quality/accuracy of solution. The striking fact is the numerical properties leading to good conditioning of the SLAE matrix, necessary to obtain faster convergence, are determined by the physics interpretation of the relative numerical importance of different terms. Therefore, well-defined physics leads to the possibility of faster convergence. The understanding obtained and advantages shown allow for expanding the domain of practical “on-the-fly” recalculation of partial atomic charges in everyday molecular modelling. For example, they urge to reconsider the applicability of the method of accounting for polarizability by recalculating atomic charges in a changing environment to a wide range of problems of applied molecular modeling.

39 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 50

References (1)

Berente, I.; Czinki, E.; Náray-szabó, G. A Combined Electronegativity Equalization and Electrostatic Potential Fit Method for the Determination of Atomic Point Charges. J.

Comput. Chem. 2007, 28, 1936–1942.

(2)

Puranen, J. S.; Vainio, M. J.; Johnson, M. S. Accurate Conformation-Dependent Molecular Electrostatic Potentials for High-Throughput in Silico Drug Discovery. J.

Comput. Chem. 2009, 1722–1732.

(3)

Shulga, D. A.; Oliferenko, A. A.; Pisarev, S. A.; Palyulin, V. A.; Zefirov, N. S. Parameterization of Empirical Schemes of Partial Atomic Charge Calculation for Reproducing the Molecular Electrostatic Potential. Dokl. Chem. 2008, 419, 57–61.

(4)

Neese, F.; Atanasov, M.; Bistoni, G.; Maganas, D.; Ye, S. Chemistry and Quantum Mechanics in 2019: Give Us Insight and Numbers. J. Am. Chem. Soc. 2019, jacs.8b13313.

(5)

Harder, E.; Damm, W.; Maple, J.; Wu, C.; Reboul, M.; Xiang, J. Y.; Wang, L.; Lupyan, D.; Dahlgren, M. K.; Knight, J. L.; Kaus, J. W.; Cerutti, D. S.; Krilov, G.; Jorgensen, W. L.; 40 ACS Paragon Plus Environment

Page 41 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Abel, R.; Friesner, R. A. OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. J. Chem. Theory Comput. 2016, 12, 281–296.

(6)

Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713.

(7)

Soteras Gutiérrez, I.; Lin, F. Y.; Vanommeslaeghe, K.; Lemkul, J. A.; Armacost, K. A.; Brooks, C. L.; MacKerell, A. D. Parametrization of Halogen Bonds in the CHARMM General Force Field: Improved Treatment of Ligand–protein Interactions. Bioorg. Med.

Chem. 2016, 24, 4812–4825.

(8)

Horta, B. A. C.; Merz, P. T.; Fuchs, P. F. J.; Dolenc, J.; Riniker, S.; Hünenberger, P. H. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. J. Chem. Theory Comput. 2016, 12, 3825–3850.

(9)

Rick, S. W.; Berne, B. J. Dynamical Fluctuating Charge Force Fields: The Aqueous Solvation of Amides. J. Am. Chem. Soc. 1996, 118, 672–679.

41 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 50

(10) Halgren, T. A.; Damm, W. Polarizable Force Fields. Curr. Opin. Struct. Biol. 2001, 11, 236–242.

(11) Cieplak, P.; Dupradeau, F.-Y.; Duan, Y.; Wang, J. Polarization Effects in Molecular Mechanical Force Fields. J. Phys.: Condens. Matter 2009, 21, 333102.

(12) Yu, H.; van Gunsteren, W. F. Accounting for Polarization in Molecular Simulation.

Comput. Phys. Commun. 2005, 172, 69–85.

(13) Mackerell, A. D. Empirical Force Fields for Biological Macromolecules: Overview and Issues. J. Comput. Chem. 2004, 25, 1584–1604.

(14) Ren, P.; Ponder, J. W. Consistent Treatment of Inter- and Intramolecular Polarization in Molecular Mechanics Calculations. J. Comput. Chem. 2002, 23, 1497–1506.

(15) Shulga, D. A.; Oliferenko, A. A.; Pisarev, S. A.; Palyulin, V. A.; Zefirov, N. S. Fast Tools for Calculation of Atomic Charges Well Suited for Drug Design. SAR QSAR Environ. Res. 2008, 19, 153–165.

(16) Saad, Y. Iterative Methods for Sparse Linear Systems Second Edition. Philadelphia, 42 ACS Paragon Plus Environment

Page 43 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Pennsylvania SIAM 2003.

(17) Strassen, V. Gaussian Elimination Is Not Optimal. Numer. Math. 1969, 13, 354–356.

(18) Coppersmith, D.; Winograd, S. Matrix Multiplication via Arithmetic Progressions. J. Symb.

Comput. 1990, 9, 251–280.

(19) Robinson, B. S. Toward an Optimal Algorithm for Matrix Multiplication. SIAM News 2005,

38.

(20) Gasteiger, J.; Marsili, M. Iterative Partial Equalization of Orbital Electronegativity—a Rapid Access to Atomic Charges. Tetrahedron 1980, 36, 3219–3228.

(21) Mortier, W. J.; Ghosh, S. K.; Shankar, S. Electronegativity-Equalization Method for the Calculation of Atomic Charges in Molecules. J. Am. Chem. Soc. 1986, 108, 4315–4320.

(22) Ionescu, C.-M.; Sehnal, D.; Falginella, F. L.; Pant, P.; Pravda, L.; Bouchal, T.; Svobodová Vařeková, R.; Geidl, S.; Koča, J. AtomicChargeCalculator: Interactive Web-Based Calculation of Atomic Charges in Large Biomolecular Complexes and Drug-like Molecules. J. Cheminf. 2015, 7, 50. 43 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 50

(23) Ionescu, C.-M.; Svobodová Vařeková, R.; Prehn, J. H. M.; Huber, H. J.; Koča, J. Charge Profile Analysis Reveals That Activation of Pro-Apoptotic Regulators Bax and Bak Relies on Charge Transfer Mediated Allosteric Regulation. PLoS Comput. Biol. 2012, 8, e1002565.

(24) Wishart, D. S.; Feunang, Y. D.; Guo, A. C.; Lo, E. J.; Marcu, A.; Grant, J. R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z.; Assempour, N.; Iynkkaran, I.; Liu, Y.; Maciejewski, A.; Gale, N.; Wilson, A.; Chin, L.; Cummings, R.; Le, D.; Pon, A.; Knox, C.; Wilson, M. DrugBank 5.0: A Major Update to the DrugBank Database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082.

(25) Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, P. E. B. The Protein Data Bank http://www.rcsb.org/ (accessed Feb 20, 2019).

(26) D.A. Case, I.Y. Ben-Shalom, S.R. Brozell, D.S. Cerutti, T.E. Cheatham, III, V.W.D. Cruzeiro, T.A. Darden, R.E. Duke, D. Ghoreishi, M.K. Gilson, H. Gohlke, A.W. Goetz, D. Greene, R Harris, N. Homeyer, S. Izadi, A. Kovalenko, T. Kurtzman, T.S. Lee, S. L. and 44 ACS Paragon Plus Environment

Page 45 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

P. A. K. AMBER 2018. University of California, San Francisco 2018.

(27) Jones, E.; Oliphant, T.; Peterson, P.; Others. SciPy.org http://www.scipy.org/ (accessed Feb 20, 2019).

(28) Rappe, A. K.; Goddard, W. A. Charge Equilibration for Molecular Dynamics Simulations.

J. Phys. Chem. 1991, 95, 3358–3363.

(29) Chen, J.; Martínez, T. J. QTPIE: Charge Transfer with Polarization Current Equalization. A Fluctuating Charge Model with Correct Asymptotics. Chem. Phys. Lett. 2007, 438, 315– 320.

(30) Yang, Z.-Z.; Wang, C.-S. Atom−Bond Electronegativity Equalization Method. 1. Calculation of the Charge Distribution in Large Molecules. J. Phys. Chem. A 1997, 101, 6315–6321.

(31) Wang, C.-S.; Yang, Z.-Z. Atom–bond Electronegativity Equalization Method. II. Lone-Pair Electron Model. J. Chem. Phys. 1999, 110, 6189–6197.

(32) Cong, Y.; Yang, Z.-Z. General Atom-Bond Electronegativity Equalization Method and Its 45 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 46 of 50

Application in Prediction of Charge Distributions in Polypeptide. Chem. Phys. Lett. 2000,

316, 324–329.

(33) Oliferenko, A. A.; Palyulin, V. A.; Pisarev, S. A.; Neiman, A. V.; Zefirov, N. S. Novel Point Charge Models: Reliable Instruments For­molecular Electrostatic. J. Phys. Org. Chem. 2001, 14, 355–369.

(34) Oliferenko, A. A.; Pisarev, S. A.; Palyulin, V. A.; Zefirov, N. S. Atomic Charges via Electronegativity Equalization: Generalizations and Perspectives. Adv. Quantum Chem. 2006, 51, 139–156.

(35) Chelli, R.; Procacci, P.; Righini, R.; Califano, S. Electrical Response in Chemical Potential Equalization Schemes. J. Chem. Phys. 1999, 111, 8569–8575.

(36) Lee Warren, G.; Davis, J. E.; Patel, S. Origin and Control of Superlinear Polarizability Scaling in Chemical Potential Equalization Methods. J. Chem. Phys. 2008, 128, 144110.

(37) Nistor, R. A.; Polihronov, J. G.; Müser, M. H.; Mosey, N. J. A Generalization of the Charge Equilibration Method for Nonmetallic Materials. J. Chem. Phys. 2006, 125, 094108.

46 ACS Paragon Plus Environment

Page 47 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(38) Smalø, H. S.; Åstrand, P.-O.; Jensen, L. Nonmetallic Electronegativity Equalization and Point-Dipole Interaction Model Including Exchange Interactions for Molecular Dipole Moments and Polarizabilities. J. Chem. Phys. 2009, 131, 044101.

(39) Verstraelen, T.; Pauwels, E.; De Proft, F.; Van Speybroeck, V.; Geerlings, P.; Waroquier, M. Assessment of Atomic Charge Models for Gas-Phase Computations on Polypeptides.

J. Chem. Theory Comput. 2012, 8, 661–676.

(40) Verstraelen, T.; Van Speybroeck, V.; Waroquier, M. The Electronegativity Equalization Method and the Split Charge Equilibration Applied to Organic Systems: Parametrization, Validation, and Comparison. J. Chem. Phys. 2009, 131, 044127.

(41) Bultinck, P.; Langenaeker, W.; Lahorte, P.; De Proft, F.; Geerlings, P.; Waroquier, M.; Tollenaere, J. P. The Electronegativity Equalization Method I: Parametrization and Validation for Atomic Charge Calculations. J. Phys. Chem. A 2002, 106, 7887–7894.

(42) Bultinck, P.; Langenaeker, W.; Lahorte, P.; De Proft, F.; Geerlings, P.; Van Alsenoy, C.; Tollenaere, J. P. The Electronegativity Equalization Method II: Applicability of Different Atomic Charge Schemes. J. Phys. Chem. A 2002, 106, 7895–7901. 47 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 50

(43) The Open Babel Package http://openbabel.org/ (accessed Feb 20, 2019).

(44) Tsareva, D. A.; Osolodkin, D. I.; Shulga, D. A.; Oliferenko, A. A.; Pisarev, S. A.; Palyulin, V. A.; Zefirov, N. S. General Purpose Electronegativity Relaxation Charge Models Applied to CoMFA and CoMSIA Study of GSK-3 Inhibitors. Mol. Inf. 2011, 30, 169–180.

(45) Trefethen, L. N.; Bau, D. Numerical Linear Algebra; SIAM, 1997.

(46) Chung, F. R. K. Spectral Graph Theory; American Mathematical Society, 1997.

(47) Farebrother, R. W. Linear least squares computations.; Dekker: New York u.a., 1988.

(48) Vařeková, R. S.; Koča, J. Optimized and Parallelized Implementation of the Electronegativity Equalization Method and the Atom-Bond Electronegativity Equalization Method. J. Comput. Chem. 2006, 27, 396–405.

(49) Liang, J.; Subramaniam, S. Computation of Molecular Electrostatics with Boundary Element Methods. Biophys. J. 1997, 73, 1830–1841.

(50) Rokhlin, V. Rapid Solution of Integral Equations of Classical Potential Theory. J. Comput.

Phys. 1985, 60, 187–207. 48 ACS Paragon Plus Environment

Page 49 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(51) Nocito, D.; Beran, G. J. O. Massively Parallel Implementation of Divide-and-Conquer Jacobi Iterations Using Particle-Mesh Ewald for Force Field Polarization. J. Chem.

Theory Comput. 2018, 14, 3633–3642.

(52) Aviat, F.; Levitt, A.; Stamm, B.; Maday, Y.; Ren, P.; Ponder, J. W.; Lagardère, L.; Piquemal, J.-P. Truncated Conjugate Gradient: An Optimal Strategy for the Analytical Evaluation of the Many-Body Polarization Energy and Forces in Molecular Simulations.

J. Chem. Theory Comput. 2017, 13, 180–190.

(53) Lagardère, L.; Lipparini, F.; Polack, É.; Stamm, B.; Cancès, É.; Schnieders, M.; Ren, P.; Maday, Y.; Piquemal, J.-P. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II. Toward Massively Parallel Computations Using Smooth Particle Mesh Ewald. J. Chem. Theory Comput. 2015, 11, 2589–2599.

49 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 50 of 50

For Table of Contents Use Only Iterative solvers for empirical partial atomic charges: breaking the curse of cubic numerical complexity Arslan R. Shaimardanov, Dmitry A. Shulga, Vladimir A. Palyulin*

50 ACS Paragon Plus Environment