
Application Note

SmilesDrawer: parsing and drawing SMILES-encoded molecular structures using client-side JavaScript Daniel Probst, and Jean-Louis Reymond J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00425 • Publication Date (Web): 19 Dec 2017 Downloaded from http://pubs.acs.org on December 22, 2017


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling


SmilesDrawer: Parsing and Drawing SMILES-Encoded Molecular Structures Using Client-Side JavaScript

Daniel Probst¹,* and Jean-Louis Reymond¹,*

¹ Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland

* e-mail: [email protected], [email protected]

Abstract

Here we present SmilesDrawer, a dependency-free JavaScript component capable of both parsing and drawing SMILES-encoded molecular structures client-side, developed to be easily integrated into web projects and to display organic molecules in large numbers and in rapid succession. SmilesDrawer can draw structurally and stereochemically complex structures such as maitotoxin and C60 without using templates, yet has an exceptionally small computational footprint and low memory usage, and requires neither the loading of images nor any other form of client-server communication, making it easy to integrate even in secure (intranet, firewalled) or offline applications. These features allow the rendering of thousands of molecular structure drawings on a single web page within seconds on a wide range of hardware supporting modern browsers. The source code as well as the most recent build of SmilesDrawer is available on GitHub (http://doc.gdb.tools/smilesDrawer/). Both yarn and npm packages are also available.

ACS Paragon Plus Environment

Introduction

The past decade has seen the release of a myriad of web-based applications in the fields of bio- and cheminformatics. A major advantage of these browser-rendered frontends is the large variety of JavaScript libraries and components available through package managers such as bower or npm. Among these libraries, many components deal with the display and transmission of molecular structure information, in particular creating SMILES (simplified molecular-input line-entry system) strings from molecular structures drawn by the user,¹,² such that the molecular information can be transmitted and processed rapidly. Indeed, together with InChI,³ SMILES is the de facto standard for encoding chemical species as short, single-line ASCII strings.⁴

A SMILES string is created from a molecular structure by computing a spanning tree of the undirected graph representing the molecule (atoms as vertices, bonds as edges), retaining the broken cycles by indexing the removed edges (bonds) on both participating vertices (atoms), and identifying the longest path in the resulting spanning tree. Next, the SMILES string is generated by following the longest path, writing out the current chemical element's symbol followed by a bond and the index of a broken cycle if available. In the case of branching vertices (atoms), each branch is written enclosed in parentheses.⁵

Here we address the lack of corresponding easy-to-use, small-footprint JavaScript components to perform the inverse task, that is, to draw drug-like molecular structures from SMILES, which can help in applications dealing with the display of molecular structures from very large databases such as the GDB and GDB-derived databases published by our group.⁶,⁷ Currently, most web applications and database frontends (PubChem Sketcher, Marvin JS by ChemAxon Inc.) rely on a server-side backend for providing pre-generated images, dynamically generated images or atom coordinates.⁸⁻¹⁴ The drawbacks of loading information from a server are: (I) Pre-generating images
requires the creation, possible update and storage of many images in persistent memory. Serving these images negatively influences loading times depending on the server hardware and the image size, especially on high-latency mobile networks.¹⁵ Pre-generated images are also only available in certain resolutions, colors and drawing styles, possibly requiring the creation of a new set of images depending on the front-end. (II) Dynamically generating images using GET requests requires either the provisioning of a web service capable of doing so or reliance on a service provided by a third party, thus sending potentially confidential information to an external server. (III) Calculating atom coordinates server-side, as implemented in Marvin JS by ChemAxon, with subsequent drawing of the structure client-side resolves performance issues when implemented correctly, but still requires the provisioning of a web service, resulting in infrastructure overhead and potential security issues.

Rendering a molecular structure from SMILES directly is challenging since the SMILES notation only records topology but no spatial information, in contrast to other formats such as PDB or CML,¹⁶,¹⁷ which explains the use of server-side computation to circumvent this limitation. The only currently available JavaScript component to convert SMILES to molecular structure drawings without any server-side code is OpenChemLib-JS, a feature applied by JSME.² OpenChemLib-JS is maintained by the Cheminformatics Department of the Swiss Federal Institute of Technology and is cross-compiled from Java to JavaScript with OpenChemLib as the codebase, which is part of Actelion's DataWarrior.¹⁸ This implementation, however, has two major disadvantages. (I) The conversion-less structure drawing from SMILES is implemented using SVG (scalable vector graphics), implying retained-mode rendering, in which each element of the drawing (letters, lines, …) is added to the DOM, generating object management overhead for the web browser. This approach leads to unpredictable and generally lower performance across different devices and browsers while complicating development given the lack of an API. A canvas depicter option for OpenChemLib-JS is available but not well documented or
customizable. (II) The codebase of OpenChemLib-JS is written in Java, and its reliance on cross-compilation by GWT makes it virtually impossible to customize, optimize, or adapt for integration into a web application without considerable development overhead.

Here, we introduce SmilesDrawer, a small (97 KB minified), dependency-free JavaScript component capable of drawing molecular structures from SMILES strings client-side, which is much smaller and faster and overcomes many of the limitations of OpenChemLib-JS. SmilesDrawer can be used by developers of web applications as well as JavaScript-based mobile and desktop applications to render molecular structures from SMILES strings without the need for additional libraries or communication with a server, which often presents a major drawback when processing sensitive information. The component is fully customizable and well documented, and its source code is available under the MIT license. SmilesDrawer is written in JavaScript ES6, transpiled adhering to the current ES6 implementation status using Babel and then packaged for both bower and npm. SmilesDrawer is useful for web-based tools that need to display thousands of molecules because it generates the drawing from SMILES, which reduces the amount of data transmitted. SmilesDrawer has been utilized for the 3D visualization of a multitude of chemical spaces populated by SureChEMBL data.¹⁹

Results and Discussion

SmilesDrawer Components

The SmilesDrawer JavaScript component achieves the conversion of a SMILES to a two-dimensional structure drawing by combining two modules: a SMILES parser to convert the SMILES back to its parent spanning tree, and a SMILES drawer to convert this spanning tree to a two-dimensional structure drawing. Both are written in JavaScript and, while the drawer relies on the output of the parser, the parser is usable as a standalone function and can be applied in other projects. The parser accounts for
approximately 1/10 of the source code and is not directly customizable, as it was generated by a parser generator. The parser module generates a parse tree from the input SMILES, in which each atom is encoded by a node object in a linked tree data structure. The topology of the parse tree is identical to the spanning tree used to generate the SMILES string. The parser was generated using PEG.js, a parser generator for JavaScript, by translating the grammar provided by the OpenSMILES specification into an unambiguous parsing expression grammar (PEG).²⁰,²¹ PEG was chosen over a context-free grammar (CFG) to avoid ambiguities (shift-reduce conflicts).²² Additionally, the generated parsers implement the packrat parsing algorithm and thus exhibit linear time complexity through memoization, at the cost of increased space complexity.²³,²⁴ In practice, parsing expression grammars simplify syntax definitions, conform closely to syntax practices (prioritized choices, unlimited lookahead), and allow linear-time parsing for any TDPL grammar. The higher average space complexity of packrat parsing, which is directly proportional to input length, is well compensated for by the generally short length of SMILES strings. In addition to generating the parse tree, the parser can identify the location of an erroneous symbol.

The SMILES drawer module converts the parse tree obtained from the SMILES to a 2D structure drawing.
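To make the parse tree and the error location concrete, the following is a minimal sketch of a hand-written recursive-descent parser for a small SMILES subset (organic-subset atoms, the bonds `-`, `=`, `#`, parenthesized branches and single-digit ring-closure labels). It is not the PEG.js-generated parser described here; the function name `parseSmiles`, the node fields and the `position` property on the error are assumptions chosen for illustration.

```javascript
// Minimal SMILES-subset parser (illustrative only, not the generated parser).
// Each atom becomes a node { element, bond, ringBonds, children } in a linked tree.
function parseSmiles(smiles) {
  let pos = 0;
  const ATOM = /^(Cl|Br|[BCNOPSFI]|[bcnops])/;

  // Throws a SyntaxError that records the offset of the offending symbol.
  function fail(expected) {
    const found = pos < smiles.length ? `"${smiles[pos]}"` : 'end of input';
    const err = new SyntaxError(`Expected ${expected} at position ${pos} but found ${found}`);
    err.position = pos;
    throw err;
  }

  function parseAtom() {
    const match = ATOM.exec(smiles.slice(pos));
    if (!match) fail('an atom');
    pos += match[0].length;
    const node = { element: match[0], bond: null, ringBonds: [], children: [] };
    // Ring-closure digits index the cycle bonds that the spanning tree removed.
    while (pos < smiles.length && /[0-9]/.test(smiles[pos])) {
      node.ringBonds.push(Number(smiles[pos]));
      pos += 1;
    }
    return node;
  }

  function parseBond() {
    if (pos < smiles.length && '-=#'.includes(smiles[pos])) {
      return smiles[pos++];
    }
    return null; // implicit bond
  }

  // chain := atom ( '(' bond? chain ')' )* ( bond? chain )?
  function parseChain() {
    const node = parseAtom();
    while (smiles[pos] === '(') {
      pos += 1;
      const bond = parseBond();
      const branch = parseChain();
      branch.bond = bond;
      if (smiles[pos] !== ')') fail('")"');
      pos += 1;
      node.children.push(branch);
    }
    if (pos < smiles.length && smiles[pos] !== ')') {
      const bond = parseBond();
      const next = parseChain();
      next.bond = bond;
      node.children.push(next);
    }
    return node;
  }

  const root = parseChain();
  if (pos < smiles.length) fail('end of input');
  return root;
}

// Acetic acid: the second carbon carries a "=O" branch and a trailing "O".
const aceticAcid = parseSmiles('CC(=O)O');
console.log(aceticAcid.children[0].children.map((n) => (n.bond || '') + n.element));
```

The tree returned here mirrors the spanning tree of the molecule, and a malformed input such as `CC(X)` raises an error whose `position` field points at the unexpected character.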
The module positions acyclic atoms, atoms in fused rings and atoms in spiro systems based on Euclidean and molecular geometry according to the VSEPR model.²⁵ The placement of bridged ring systems with ≥ 2 rings is treated as a two-dimensional graph embedding problem and solved based on graph-theoretic distances as described by Kamada and Kawai.²⁶ The algorithm sets up a virtual dynamic system in which weighted topological distances between all vertices are modelled as springs, whereas other spring embedders such as the Eades and Fruchterman-Reingold algorithms, which have been adapted to depict molecular structures, model edges as springs and introduce repulsive electrical forces
between non-connected vertices to keep them apart.²⁷⁻²⁹ The formula for the energy functional $E$ of the dynamic system according to Kamada and Kawai is shown in Equation 1, where $k_{ij} = K / d_{ij}^{2}$ is the spring strength between vertices $i$ and $j$ based on the topological distance $d_{ij}$ and the constant $K$; and $l_{ij} = L \times d_{ij}$ is the relaxed spring length based on the topological distance and the desired edge length $L$.

$$E = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{2} k_{ij} \left( (x_i - x_j)^2 + (y_i - y_j)^2 + l_{ij}^2 - 2 l_{ij} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \right) \qquad (1)$$

The functional $E$ of the system is then minimized by iterative local minimizations, one vertex $m$ at a time, where vertex $m$ is the vertex with the largest value of $\Delta_m = \sqrt{(\partial E / \partial x_m)^2 + (\partial E / \partial y_m)^2}$. Vertex $m$ is then moved by $\delta x$ and $\delta y$ until $\Delta_m$ falls below a threshold. $\delta x$ and $\delta y$ are computed by applying a two-dimensional Newton-Raphson method to solve Equations 2 and 3.

$$\frac{\partial^2 E}{\partial x_m^2}\left(x_m^{(t)}, y_m^{(t)}\right) \delta x + \frac{\partial^2 E}{\partial x_m \partial y_m}\left(x_m^{(t)}, y_m^{(t)}\right) \delta y = -\frac{\partial E}{\partial x_m}\left(x_m^{(t)}, y_m^{(t)}\right) \qquad (2)$$

$$\frac{\partial^2 E}{\partial y_m \partial x_m}\left(x_m^{(t)}, y_m^{(t)}\right) \delta x + \frac{\partial^2 E}{\partial y_m^2}\left(x_m^{(t)}, y_m^{(t)}\right) \delta y = -\frac{\partial E}{\partial y_m}\left(x_m^{(t)}, y_m^{(t)}\right) \qquad (3)$$

Our implementation of the algorithm by Kamada and Kawai enables the SMILES drawer module to depict highly complex ring systems such as buckminsterfullerene without the need for templates (Figure 1). A drawback of implementing Kamada and Kawai's algorithm for structure drawing arises when depicting macrocycles and bridged ring systems including macrocycles, where rings might be distorted, stereochemistry around double bonds might be ignored, or overlaps might be produced (Figure 1, compounds 6, 13, 15). Discovery of the smallest set of smallest rings is implemented using a robust algorithm based on path-included distance matrices.³⁰ Once atoms have been positioned, overlaps are resolved by rotating rotatable bonds where the resulting positions yield a lower overlap score. If overlaps are still present after the first step of overlap resolution, a second step rotates overlapping terminal vertices away from
the overlap-causing position. The overlap score for each vector (atom) $v_i$ is defined as

$$\mathrm{score}_i = \sum_{j \neq i} \frac{l - |v_i - v_j|}{l} \quad \text{for } l - |v_i - v_j| > 0,$$

where $l$ is the optimal bond length.
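The layout quantities discussed in this section can be sketched in a few lines of JavaScript. The block below evaluates the Kamada-Kawai spring energy, the gradient magnitude used to pick the next vertex to move, and a per-atom overlap score. It is an illustration under stated assumptions rather than the SmilesDrawer source: the function names, the constants `K` and `L`, the example coordinates, and the reading of the overlap score as a sum of normalized shortfalls `(l - distance) / l` over atoms closer than one bond length are all assumptions.

```javascript
// dist[i][j] is the topological (graph) distance between vertices i and j;
// K (spring constant) and L (desired edge length) are illustrative values.

// Spring energy: E = sum over i<j of 1/2 * k_ij * (|p_i - p_j| - l_ij)^2
function kamadaKawaiEnergy(pos, dist, K, L) {
  let energy = 0;
  for (let i = 0; i < pos.length - 1; i++) {
    for (let j = i + 1; j < pos.length; j++) {
      const k = K / (dist[i][j] * dist[i][j]); // spring strength k_ij
      const l = L * dist[i][j];                // relaxed spring length l_ij
      const d = Math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]);
      energy += 0.5 * k * (d - l) * (d - l);
    }
  }
  return energy;
}

// Delta_m: magnitude of the energy gradient with respect to vertex m.
function gradientMagnitude(pos, dist, K, L, m) {
  let gx = 0;
  let gy = 0;
  for (let i = 0; i < pos.length; i++) {
    if (i === m) continue;
    const k = K / (dist[m][i] * dist[m][i]);
    const l = L * dist[m][i];
    const dx = pos[m][0] - pos[i][0];
    const dy = pos[m][1] - pos[i][1];
    const d = Math.hypot(dx, dy);
    gx += k * (dx - (l * dx) / d);
    gy += k * (dy - (l * dy) / d);
  }
  return Math.hypot(gx, gy);
}

// Overlap score per atom: every pair closer than one bond length
// contributes (bondLength - distance) / bondLength to both atoms.
function overlapScores(pos, bondLength) {
  const scores = new Array(pos.length).fill(0);
  for (let i = 0; i < pos.length - 1; i++) {
    for (let j = i + 1; j < pos.length; j++) {
      const d = Math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]);
      if (d < bondLength) {
        const weighted = (bondLength - d) / bondLength;
        scores[i] += weighted;
        scores[j] += weighted;
      }
    }
  }
  return scores;
}

// Example: a four-membered ring drawn as a unit square. Only the two
// diagonal springs (topological distance 2) are strained.
const square = [[0, 0], [1, 0], [1, 1], [0, 1]];
const squareDist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]];
console.log(kamadaKawaiEnergy(square, squareDist, 1, 1));
console.log(gradientMagnitude(square, squareDist, 1, 1, 0));
console.log(overlapScores([[0, 0], [0.5, 0], [3, 0]], 1));
```

A full minimizer would repeatedly select the vertex with the largest gradient magnitude and apply Newton-Raphson steps to it; that loop is omitted here to keep the sketch short.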

Chirality determination was implemented based on the algorithm developed by Teixeira et al., which relies on the parity of the permutation of ligands after ordering according to CIP rules compared to their index of occurrence in the SMILES string as defined in the OpenSMILES specification.³¹ For the depiction of wedges, we developed an algorithm which chooses the bond to be flipped up or down based on the following simple set of rules (ordered by priority from highest to lowest): (1) wedges are not to be drawn between two stereocenters, (2) if possible, wedges are not to be drawn inside a ring, (3) drawing wedges towards heteroatoms takes priority and (4) wedges are drawn in the direction of the shortest subtree.

The resulting structures are drawn using the Canvas API supported by all modern browsers. Unlike the commonly used SVG (scalable vector graphics) format, Canvas implements immediate-mode rendering, thus removing the need for a performance-impacting object model kept in memory. After the HTML5 standard specifying the Canvas API became a stable W3C recommendation in October 2014, the technology has been applied by several web-based cheminformatics and bioinformatics applications, asserting its increased performance and reduced code complexity compared to scalable vector graphics.³²⁻³⁴

The SMILES drawer module implements the complete OpenSMILES specification except for square planar, trigonal bipyramidal and octahedral chirality. According to the specification, these types of chirality are implemented by very few SMILES systems, and we did not encounter them in any of the organic molecule databases known to us. In addition, the proposed extensions provided by the OpenSMILES specification, including external R-groups, polymers and crystals, atom-based double bond configuration, radical centers and twisted SMILES, are not supported.
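The permutation-parity idea behind the chirality determination can be illustrated with a short sketch. It is not the full algorithm of Teixeira et al.: the CIP ranking is assumed to be given, and the function names `permutationParity` and `reorderChiralTag` are hypothetical. The sketch relies only on the fact that reordering the neighbors of a tetrahedral center by an odd permutation inverts the `@`/`@@` tag, while an even permutation preserves it.

```javascript
// Sign of a permutation from its inversion count: +1 for even, -1 for odd.
function permutationParity(perm) {
  let inversions = 0;
  for (let i = 0; i < perm.length - 1; i++) {
    for (let j = i + 1; j < perm.length; j++) {
      if (perm[i] > perm[j]) inversions += 1;
    }
  }
  return inversions % 2 === 0 ? 1 : -1;
}

// perm[k] is the SMILES occurrence index of the neighbor that ends up at
// position k after sorting (for example by CIP priority, assumed given).
function reorderChiralTag(tag, perm) {
  if (permutationParity(perm) === 1) return tag;
  return tag === '@' ? '@@' : '@';
}

// Swapping the first two neighbors is an odd permutation and flips the tag;
// a cyclic shift of three neighbors is even and keeps it.
console.log(reorderChiralTag('@', [1, 0, 2, 3]));
console.log(reorderChiralTag('@', [2, 0, 1, 3]));
```

Counting inversions is quadratic in the number of neighbors, which is irrelevant for the four ligands of a tetrahedral center.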

Assessing the Performance of SmilesDrawer

The aesthetic performance of SmilesDrawer was assessed by visual inspection. SmilesDrawer performs extremely well for rendering structures of a wide range of molecules (Figure 1). Excellent drawings are produced for polycyclic hydrocarbons such as cubane (1), trinorbornane (2),³⁵ heptacyclo[6.4.0.0^{2,7}.0^{3,6}.0^{4,11}.0^{5,10}.0^{9,12}]dodecane (3), dodecahedrane (4) or buckminsterfullerene (5) without using any template. Depictions of complex biomolecules are also very good, for example the essential coenzyme vitamin B12 (6), the steroid precursor hydroxymethylglutaryl-coenzyme A (7), its biosynthetic end product cholesterol (8), the polyunsaturated omega-6 fatty acid arachidonic acid (9) and the immune modulator prostaglandin E1 (10). Complex natural products are also satisfactorily rendered, such as the alkaloids strychnine (11) and quinine (12), the antibiotic vancomycin (13), the complex cytotoxic natural product calicheamicin γ1 (14), the immunosuppressant cyclic peptide cyclosporin A (15), and the complex polycyclic toxin maitotoxin (16), which possesses 98 chiral centers. In addition to drawing small to medium-sized molecules, SmilesDrawer also excels at drawing large and topologically complex peptides such as peptide dendrimers³⁶,³⁷ with minimal overlap, in contrast to OpenChemLib-JS and ChemAxon Marvin JS (Figure S1).

To evaluate the runtime performance of SmilesDrawer, a test set including Drugbank ($n$ = 7,238) and samples from ChEMBL, FDB-17, GDB-17 and SureChEMBL (each $n$ = 7,238) was assembled. This pooled set containing all compounds ($n_{\mathrm{total}}$ = 36,190) was analyzed for SMILES length and number of rings, as these two features intuitively have the highest impact on drawing speed. This assessment was confirmed by running preliminary benchmarks and performing a correlation analysis, as shown in Figure 2c, d. While parsing time primarily correlates with the SMILES length of a molecule ($r_{\mathrm{Pearson}}$ = 0.28, $r_{\mathrm{Spearman}}$ = 0.68), rendering time correlates well with both the length of the
SMILES ($r_{\mathrm{Pearson}}$ = 0.26, $r_{\mathrm{Spearman}}$ = 0.73) and the number of rings ($r_{\mathrm{Pearson}}$ = 0.15, $r_{\mathrm{Spearman}}$ = 0.67). SMILES length was chosen as the measurement variable for ensuing performance benchmarks, as it correlates well with both parsing and rendering time. Whereas monotonic relationships were expected between rendering time and both the SMILES length and the number of rings, the non-linear relationship between SMILES length and parsing time is surprising given the theoretically linear runtime of the parser. We suspect this behavior to be caused by current JavaScript implementations not supporting tail call optimization and the generated parser relying heavily on recursion.
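The contrast between the two correlation coefficients reported here can be reproduced on synthetic data. The sketch below is illustrative only: the function names and the exponential toy data are assumptions, not the benchmark data. It computes Pearson's coefficient directly and Spearman's coefficient as Pearson's coefficient of the ranks; a strictly monotonic but non-linear relationship yields a Spearman value of 1 and a noticeably lower Pearson value.

```javascript
// Pearson correlation coefficient of two equally long numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// 1-based ranks; tied values receive the average rank of their group.
function ranks(values) {
  const order = values.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const result = new Array(values.length);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const averageRank = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) result[order[k][1]] = averageRank;
    i = j + 1;
  }
  return result;
}

// Spearman correlation: Pearson correlation of the rank-transformed data.
function spearman(x, y) {
  return pearson(ranks(x), ranks(y));
}

// Toy data: time grows exponentially with length (monotonic, non-linear).
const lengths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const times = lengths.map((v) => Math.exp(v));
console.log('Pearson: ', pearson(lengths, times));
console.log('Spearman:', spearman(lengths, times));
```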

Fig. 1 Molecular structures drawn by SmilesDrawer. SmilesDrawer applies a dynamic system simulation based on Kamada and Kawai's algorithm to determine the position of atoms when it encounters a bridged ring in a molecule. This enables SmilesDrawer to depict a wide range of molecules, such as cubane (drawing time: 3.5 ms) (1), trinorbornane (4.3 ms) (2), heptacyclo[6.4.0.0^{2,7}.0^{3,6}.0^{4,11}.0^{5,10}.0^{9,12}]dodecane (10.5 ms) (3), dodecahedrane (11.7 ms) (4), buckminsterfullerene (85.7 ms) (5), vitamin B12 (cyanocobalamin) (55.5 ms) (6), hydroxymethylglutaryl-coenzyme A (23 ms) (7), cholesterol (7.2 ms) (8), arachidonic acid (9.2 ms) (9), prostaglandin E1 (2.6 ms) (10), strychnine (11.9 ms) (11), quinine (5.1 ms) (12), vancomycin (115 ms) (13), calicheamicin γ1 (31.4 ms) (14), cyclosporin A (9.8 ms) (15), maitotoxin (140 ms) (16). SMILES of all molecules shown are available in Table S1.

Fig. 2 Analysis of the pooled test sets. Test sets include Drugbank ($n$ = 7,238) and samples from ChEMBL, FDB-17, GDB-17 and SureChEMBL (each $n$ = 7,238). The sets were pooled into a superset containing all data ($n_{\mathrm{total}}$ = 36,190). Subplots a and b show the distribution of ring count and length of SMILES respectively. The range from 0 to 15 rings covers 99.981% (36,082), the range from 0 to 150 characters 98.933% (35,704) of all molecules in the pooled set. SMILES length was chosen as a measurement variable as it correlates best with both parse and render time (c, d). The $r$ values yielded by Pearson's and Spearman's methods suggest a strong non-linear, monotonic relationship between SMILES length and performance.

The theoretical time complexities are $O(n)$ and O(P ) for the parser and drawer respectively. Benchmarks were conducted using the Drugbank dataset, containing 7,238 compounds.⁸ In addition, random subsets of equal size were extracted from the ChEMBL,⁹ FDB-17,³⁸ GDB-17,⁶ and SureChEMBL¹³ databases. The performance was assessed using desktop as well as mobile hardware and software (Intel Core i7-7700 3.60GHz, 16.0GB DDR4 RAM, Windows 10.0.16299, Chrome Version
63.0.3239.84 64-bit; Samsung exynos8895 0.455 – 2.314GHz, 4GB LPDDR4X, Android 7.0, Linux version 4.4.13-12401979, Chrome Version 63.0.3239.71).

SmilesDrawer shows excellent performance for both parsing ($\bar{t}_{\mathrm{parse}}$ = 0.04 ms ± 0.085 ms) and drawing ($\bar{t}_{\mathrm{draw}}$ = 2.445 ms ± 14.144 ms). The rendering time for molecules containing bridged ring systems, which are complex to depict using our proposed approach ($n$ = 5,473 with an average of 4.033 ± 1.453 rings), is still low ($\bar{t}_{\mathrm{draw,bridged}}$ = 4.079 ± 17.731 ms). Drawing performance is excellent even on mobile hardware, with $\bar{t}_{\mathrm{parse}}$ = 0.169 ms ± 0.557 ms, $\bar{t}_{\mathrm{draw}}$ = 7.481 ms ± 23.471 ms and $\bar{t}_{\mathrm{draw,bridged}}$ = 13.692 ± 42.751 ms. Per-set performance is shown in Figure 3.

Comparison of SmilesDrawer on Different Devices and with OpenChemLib-JS

To compare the total depiction time (parsing + rendering time) of SmilesDrawer with the JavaScript port of OpenChemLib, we ran the benchmarks on the latest version of OpenChemLib-JS using its undocumented canvas depicter. The performance of OpenChemLib-JS was assessed using the same desktop test setting as for the SmilesDrawer test case. The results are shown in Figure 3. The total depiction time values show that the performance of SmilesDrawer on a mobile phone is comparable to that of OpenChemLib-JS on a desktop computer (Figure 3a). While the render times of SmilesDrawer and OpenChemLib-JS are close, SmilesDrawer shows generally lower variance on the desktop system (Figure 3b). SmilesDrawer's parse time is orders of magnitude lower than that of OpenChemLib-JS on both the desktop and the mobile system (Figure 3c).

Fig. 3 Performance comparison between SmilesDrawer and OpenChemLib-JS. Performance was established for three different test cases: SmilesDrawer on a desktop computer (blue), OpenChemLib-JS on a desktop computer (green) and SmilesDrawer on a mobile phone (red). The total depiction time (parsing + rendering) values show that the performance of SmilesDrawer on a mobile phone is comparable to that of OpenChemLib-JS on a desktop computer (a). While the render times of SmilesDrawer and OpenChemLib-JS are close, SmilesDrawer shows generally lower variance on the desktop system (b). SmilesDrawer's parse time is orders of magnitude lower than that of OpenChemLib-JS on both the desktop and the mobile system (c).

To further assess the comparative runtime performance, we analyzed the data using two-dimensional KDE plots, which show a detailed comparison of the drawing (parsing + rendering) performance of SmilesDrawer with that of OpenChemLib-JS for the test sets GDB-17 (Figure 4a), FDB-17 (Figure 4b), SureChEMBL (Figure 4c), ChEMBL (Figure 4d) and Drugbank (Figure 4e). GDB-17 and FDB-17 are
constrained to a relatively low maximum atom count of 17, causing the parser to take up a significant part of the drawing time. This is reflected in Figure 4a, b, where bimodal distributions are caused by the significantly slower parser of OpenChemLib-JS.

Fig. 4 Comparison to OpenChemLib-JS. The two-dimensional KDE plots show a detailed comparison of the drawing (parsing + rendering) performance of SmilesDrawer with that of OpenChemLib-JS for the test sets (a) GDB-17, (b) FDB-17, (c) SureChEMBL, (d) ChEMBL and (e) Drugbank. GDB-17 and FDB-17 are constrained to a relatively low maximum atom count of 17, causing the parser to take up a significant part of the drawing time. This is reflected in subplots a and b, where bimodal distributions are caused by the significantly slower parser of OpenChemLib-JS. 456 (1.262%) and 509 (1.409%) compounds were removed from the SmilesDrawer and OpenChemLib-JS sets respectively, as they interfered with the readability of these plots.

The analysis of these benchmarks shows that our JavaScript module generally performs better throughout the test sets than the transcompiled OpenChemLib-JS and that Kamada and Kawai's algorithm is indeed suited for placing the atoms of bridged ring systems without any negative impact on overall rendering performance (Figure S2).
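Per-molecule timings of the kind summarized in these benchmarks could be collected with a small harness such as the one below. This is a sketch and not the authors' benchmarking code: the function name `benchmark` and the stand-in workload are hypothetical, and the spread is reported as the sample standard deviation, which is an assumption about how the ± values were obtained. It uses `performance.now()`, which is available in modern browsers and in recent Node.js versions.

```javascript
// Times fn(input) once per input and returns the mean and the sample
// standard deviation of the elapsed times in milliseconds.
function benchmark(fn, inputs) {
  const times = inputs.map((input) => {
    const start = performance.now();
    fn(input);
    return performance.now() - start;
  });
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const squaredDeviations = times.reduce((a, t) => a + (t - mean) * (t - mean), 0);
  const sd = times.length > 1 ? Math.sqrt(squaredDeviations / (times.length - 1)) : 0;
  return { mean, sd, times };
}

// Stand-in workload: in a real benchmark this callback would parse and
// draw the SMILES string instead of counting characters.
const exampleSmiles = ['CCO', 'c1ccccc1', 'CC(=O)O'];
const result = benchmark((smiles) => smiles.split('').length, exampleSmiles);
console.log(`mean ${result.mean.toFixed(3)} ms ± ${result.sd.toFixed(3)} ms`);
```

Timing each molecule separately, rather than the whole set at once, is what makes per-molecule plots against SMILES length possible.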

For mobile applications, the overall performance of SmilesDrawer measured during benchmarking matches the latency (depending on carrier and network generation: 5 – 100ms) of mobile networks, facilitating application-scale performance improvements over loading structures as image files from a web server on such networks.¹⁵

Conclusion

SmilesDrawer is a highly customizable, easy-to-use and performant JavaScript component consisting of both a SMILES parser and a Canvas API drawing module. It is tailored to be used in modern web applications in need of a method to display molecular structures. SmilesDrawer differentiates itself from previously reported JavaScript components for SMILES drawing in that it does not require any third-party libraries, has a codebase written entirely in JavaScript, does not require the deployment of web services, and applies the algorithm proposed by Kamada and Kawai for positioning atoms in bridged rings while applying simple Euclidean geometry for the placement of other atoms. Although SmilesDrawer was implemented and optimized for the limited use on SureChEMBL datasets, its performance carries over well to the Drugbank, ChEMBL, FDB-17 and GDB-17 datasets, and it even depicts complex molecules. SmilesDrawer should be generally useful to display molecules from SMILES in web applications.

Acknowledgement. This work was supported financially by the University of Berne, the Swiss National Science Foundation and the NCCR TransCure. We thank ChemAxon for providing access to Marvin JS.

Supporting information. Additional benchmarking of SmilesDrawer (Figures S1 and S2) and SMILES of all molecules shown in Figures 1 and S1 (Table S1). This information is available free of charge at http://pubs.acs.org.

Competing interests. The authors declare no competing financial interest.

Authors’ contributions. D. P. designed and developed both modules of SmilesDrawer and wrote the paper. J.-L. R. co-designed, supervised the project and wrote the paper.

References

1. Ihlenfeldt, W. D.; Bolton, E. E.; Bryant, S. H. The PubChem Chemical Structure Sketcher. J. Cheminform. 2009, 1, 20.
2. Bienfait, B.; Ertl, P. JSME: A Free Molecule Editor in JavaScript. J. Cheminform. 2013, 5, 24.
3. Heller, S. R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. InChI, the IUPAC International Chemical Identifier. J. Cheminform. 2015, 7, 23.
4. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
5. Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
6. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J. L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864-2875.
7. Awale, M.; Visini, R.; Probst, D.; Arus-Pous, J.; Reymond, J. L. Chemical Space: Big Data Challenge for Molecular Diversity. Chimia 2017, 71, 661-666.
8. Wishart, D. S.; Knox, C.; Guo, A. C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. DrugBank: A Comprehensive Resource for in Silico Drug Discovery and Exploration. Nucleic Acids Res. 2006, 34, D668-D672.
9. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40, D1100-D1107.
10. Awale, M.; van Deursen, R.; Reymond, J. L. MQN-Mapplet: Visualization of Chemical Space with Interactive Maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J. Chem. Inf. Model. 2013, 53, 509-518.


11. Awale, M.; Reymond, J. L. Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional Chemical Spaces. J. Chem. Inf. Model. 2015, 55, 1509-1516.
12. Awale, M.; Reymond, J. L. Web-Based 3D-Visualization of the DrugBank Chemical Space. J. Cheminform. 2016, 8, 25.
13. Papadatos, G.; Davies, M.; Dedman, N.; Chambers, J.; Gaulton, A.; Siddle, J.; Koks, R.; Irvine, S. A.; Pettersson, J.; Goncharoff, N.; Hersey, A.; Overington, J. P. SureChEMBL: A Large-Scale, Chemically Annotated Patent Document Database. Nucleic Acids Res. 2016, 44, D1220-D1228.
14. Awale, M.; Probst, D.; Reymond, J. L. WebMolCS: A Web-Based Interface for Visualizing Molecules in Three-Dimensional Chemical Spaces. J. Chem. Inf. Model. 2017, 57, 643-649.
15. Nikravesh, A.; Choffnes, D. R.; Katz-Bassett, E.; Mao, Z. M.; Welsh, M. Mobile Network Performance from User Devices: A Longitudinal, Multidimensional Analysis. In Passive and Active Measurement. PAM 2014. Lecture Notes in Computer Science, Vol. 8362; Faloutsos M., K. A., Ed.; Springer: Cham, 2014.
16. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235-242.
17. Murray-Rust, P.; Rzepa, H. S.; Wright, M. Development of Chemical Markup Language (CML) as a System for Handling Complex Chemical Content. New J. Chem. 2001, 25, 618-634.
18. Sander, T.; Freyss, J.; von Korff, M.; Rufener, C. DataWarrior: An Open-Source Program for Chemistry Aware Data Visualization and Analysis. J. Chem. Inf. Model. 2015, 55, 460-473.
19. Probst, D.; Reymond, J. L. FUn: A Framework for Interactive Visualizations of Large, High Dimensional Datasets on the Web. Bioinformatics 2017, doi: 10.1093/bioinformatics/btx760.
20. www.opensmiles.org (accessed December 12, 2017).
21. Ford, B. Parsing Expression Grammars. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL ’04; ACM Press: New York, USA, 2004; pp 111-112.
22. Parikh, R. J. On Context-Free Languages. J. ACM 1966, 13, 570-581.
23. Ford, B. Packrat Parsing. ACM SIGPLAN Not. 2002, 37, 36-47.
24. Mizushima, K.; Maeda, A.; Yamaguchi, Y. Packrat Parsers Can Handle Practical Grammars in Mostly Constant Space. In Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE; ACM: New York, 2010; pp 29-36.
25. Treichel, P. M. The VSEPR Model of Molecular Geometry (Gillespie, Ronald J.; Hargittai, Istvan). J. Chem. Educ. 1993, 70, A223.


26. Kamada, T.; Kawai, S. An Algorithm for Drawing General Undirected Graphs. Inf. Process. Lett. 1989, 31, 7-15.
27. Eades, P. A Heuristic for Graph Drawing. Congr. Numer. 1984, 42, 149-160.
28. Fruchterman, T. M. J.; Reingold, E. M. Graph Drawing by Force-Directed Placement. Softw. Pract. Exp. 1991, 21, 1129-1164.
29. Fraczek, T. Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram Generation of Complex Molecules and Ligand-Protein Interactions. J. Chem. Inf. Model. 2016, 56, 2320-2335.
30. Lee, C. J.; Kang, Y. M.; Cho, K. H.; No, K. T. A Robust Method for Searching the Smallest Set of Smallest Rings with a Path-Included Distance Matrix. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 17355-17358.
31. Teixeira, A. L.; Leal, J. P.; Falcao, A. O. Automated Identification and Classification of Stereochemistry: Chirality and Double Bond Stereoisomerism. arXiv preprint 2013, arXiv:1303.1724.
32. Miller, C. A.; Anthony, J.; Meyer, M. M.; Marth, G. Scribl: An HTML5 Canvas-Based Graphics Library for Visualizing Genomic Data over the Web. Bioinformatics 2013, 29, 381-383.
33. Taylor, S.; Noble, R. HTML5 PivotViewer: High-Throughput Visualization and Querying of Image Data on the Web. Bioinformatics 2014, 30, 2691-2692.
34. Vanderkam, D.; Aksoy, B. A.; Hodes, I.; Perrone, J.; Hammerbacher, J. pileup.js: A JavaScript Library for Interactive and in-Browser Visualization of Genomic Data. Bioinformatics 2016, 32, 2378-2379.
35. Delarue Bizzini, L.; Muntener, T.; Haussinger, D.; Neuburger, M.; Mayor, M. Synthesis of Trinorbornane. Chem. Commun. 2017, 53, 11399-11402.
36. Stach, M.; Siriwardena, T. N.; Kohler, T.; van Delden, C.; Darbre, T.; Reymond, J. L. Combining Topology and Sequence Design for the Discovery of Potent Antimicrobial Peptide Dendrimers against Multidrug-Resistant Pseudomonas aeruginosa. Angew. Chem., Int. Ed. Engl. 2014, 53, 12827-12831.
37. Bergmann, M.; Michaud, G.; Visini, R.; Jin, X.; Gillon, E.; Stocker, A.; Imberty, A.; Darbre, T.; Reymond, J. L. Multivalency Effects on Pseudomonas aeruginosa Biofilm Inhibition and Dispersal by Glycopeptide Dendrimers Targeting Lectin LecA. Org. Biomol. Chem. 2016, 14, 138-148.
38. Visini, R.; Awale, M.; Reymond, J.-L. Fragment Database FDB-17. J. Chem. Inf. Model. 2017, 57, 700-709.


Graphics for the Table of Contents:
