Kekule.js: An Open Source JavaScript Chemoinformatics Toolkit


Chen Jiang, Xi Jin, Ying Dong, and Ming Chen. J. Chem. Inf. Model., Just Accepted Manuscript. DOI: 10.1021/acs.jcim.6b00167. Publication Date (Web): 31 May 2016.


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling






Chen Jiang,* Xi Jin, Ying Dong and Ming Chen



† Department of Organic Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, China. ‡ Department of Foreign Languages, China Pharmaceutical University, Nanjing 210009, Jiangsu, China

Abstract

Kekule.js is an open-source, object-oriented JavaScript toolkit for chemoinformatics. It provides methods for many common tasks in molecular informatics, including chemical data I/O, 2D/3D rendering of chemical structures, stereo identification, ring perception, structure comparison and substructure search. Encapsulated widgets to display and edit chemical structures directly in a web context are also supplied. Developed with web standards, the toolkit is ideal for building chemoinformatics applications over the Internet. Moreover, it is highly platform-independent and can also be used in desktop or mobile environments. Some initial applications, such as plug-ins for inputting chemical structures on the web and uses in chemistry education, have been developed based on the toolkit.

ACS Paragon Plus Environment


1. Introduction

Dozens of proprietary and open-source chemoinformatics toolkits, which serve as the foundation for building chemistry software, have been developed over the past decades.1 For the sake of execution speed, these toolkits are usually written in compiled languages such as C/C++ and Java.2 Although they have proved quite successful in desktop applications, their popularity has been challenged by the increasing use of web and mobile platforms, on which code written in traditional compiled languages is difficult to run or port. Currently the only widely accepted programming language for web clients is JavaScript, which, together with other web standards such as HTML (HyperText Markup Language)3 and CSS (Cascading Style Sheets),4 forms the basis of web applications. On the mobile side, the situation is much more complicated, as different systems use different architectures and favor different programming languages. Fortunately, all modern mobile operating systems support web standards well, and some are even built upon them (such as Firefox OS5). As a result, a chemoinformatics toolkit developed with web technologies is ideal for cross-platform use. During the past few years, several such libraries and programs have emerged. Most of them are web viewers and editors focusing on molecule output and input, including free ones such as JSME,6 Ketcher7 and JSMol.8 Some libraries provide more functions. For instance, ChemDoodle Web Components9 provides some basic chemoinformatics algorithms in client-side JavaScript to deduce bonds and hydrogens, split disconnected molecular structures and detect or search rings, while more complex tasks such as molecule comparison can be performed by connecting to iChemLabs cloud services.


In this article, we describe a new open-source toolkit developed with web standards. Written in JavaScript, it focuses mainly on organic molecules, runs on the web client and implements more chemoinformatics algorithms and functions than the tools above, yet requires no backend server. It is released under the MIT license10 and the source code is published on GitHub.11 Documentation, tutorials and demonstrations of the toolkit can be found in the Supporting Information of this manuscript or at the Kekule.js website.12

2. Technologies

2.1 JavaScript

JavaScript is a dynamic programming language that originated in web browsers and, despite its name, has no direct relation to Java. For many years, JavaScript was regarded as a slow interpreted language and was used only as a tool to add simple interactivity to web pages. In recent years, however, just-in-time compilation and advanced optimizations have been introduced in several powerful JavaScript engines, greatly enhancing the execution speed of JavaScript code. Our toolkit benefits from this, and in our tests most chemoinformatics operations on molecules can be executed swiftly in a web browser. As a dynamic language, JavaScript also provides some practical features that our toolkit can take advantage of. For example, the object system in JavaScript is prototype-based, unlike class-based languages such as C++ or Java: objects do not derive from static classes but inherit from dynamic prototypes. A prototype is mutable at runtime and can be modified (e.g., by adding new properties or methods) like any other ordinary object, and the modification affects all instances that inherit from this prototype. This feature is used to reduce memory usage in the toolkit. For instance, atoms are implemented in our toolkit as objects inheriting from the same prototype. An atom has common properties (e.g., atomic number and mass number) as well as some additional properties required only by certain operations (e.g., coordinates for drawing the molecule and a label for canonicalization). Rather than defining all those properties at once, coordinates and canonicalization labels are added to the atom prototype only when they are actually needed (as shown in Figure 1). So if an application using the toolkit does not involve molecule drawing or canonicalization, no memory is consumed to store coordinates or labels. This approach is used in many parts of the toolkit and has proved helpful on devices with limited memory (e.g., some mobile phones).
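The prototype mechanism described above can be sketched in plain JavaScript. The `Atom` constructor and the `installDrawingSupport` function below are hypothetical simplifications, not the Kekule.js classes; they only illustrate how methods added to a shared prototype at runtime become available on atoms that already exist, while per-atom data such as a coordinate is allocated only when it is actually used.

```javascript
// Hypothetical, simplified Atom (not the Kekule.js class): each instance
// stores only the properties every atom needs.
function Atom(atomicNumber, massNumber) {
  this.atomicNumber = atomicNumber;
  this.massNumber = massNumber;
}

// Hypothetical "drawing" extension, run only when drawing is needed. It adds
// methods to the shared prototype; a coordinate is stored on an atom only
// after setCoord() is called, so atoms that are never drawn carry no
// coordinate data.
function installDrawingSupport(AtomClass) {
  AtomClass.prototype.setCoord = function (x, y) {
    this.coord = { x: x, y: y };
    return this;
  };
  AtomClass.prototype.getCoord = function () {
    return this.coord || { x: 0, y: 0 };
  };
}

const carbon = new Atom(6, 12);
const oxygen = new Atom(8, 16);
console.log(typeof carbon.setCoord); // "undefined": drawing support not loaded yet

installDrawingSupport(Atom);         // mutate the prototype at runtime
carbon.setCoord(1.5, 0);             // existing instances gain the new methods
console.log(Object.keys(carbon));    // [ 'atomicNumber', 'massNumber', 'coord' ]
console.log(Object.keys(oxygen));    // [ 'atomicNumber', 'massNumber' ]
```

Keeping the methods on the prototype means they exist once in memory no matter how many atoms are created; only atoms that are actually drawn pay for a `coord` object.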

Figure 1. Mutable prototypes are used in the toolkit to reduce memory consumption

2.2 Web Worker

Traditionally, JavaScript code is executed in the main thread of the web browser. A long-running calculation may block the browser UI (user interface) so that it stops responding to user input. The Web Worker standard,13 introduced with HTML5,14 solves this problem by providing a simple means for web content to run scripts in background threads. In our toolkit, web workers are used to perform time-consuming jobs (e.g., generating a 3D molecular structure from a topological one), while many other algorithms for normal-sized molecules, including substructure search and ring detection, were found in our tests to be quite fast on modern hardware and are therefore run directly in the main thread to avoid complexity.
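A minimal sketch of offloading a long computation to a web worker, assuming the task is a self-contained function whose input and result can be passed through `postMessage`. `runInBackground` is a hypothetical helper, not part of Kekule.js: it builds the worker script from the function's source text through a Blob URL and falls back to the current thread when `Worker` is unavailable (for example under Node.js).

```javascript
// Runs `task(input)` in a background Web Worker when the environment has one,
// otherwise on the current thread. `task` must be self-contained (no outer
// variables) and written as an arrow function, function expression or
// function declaration, because its source text is re-evaluated inside the
// worker. Input and result must be structured-cloneable.
function runInBackground(task, input) {
  const canUseWorker =
    typeof Worker !== 'undefined' &&
    typeof Blob !== 'undefined' &&
    typeof URL !== 'undefined' &&
    typeof URL.createObjectURL === 'function';
  if (!canUseWorker) {
    // Fallback (e.g. Node.js): still asynchronous, but not in a background thread.
    return Promise.resolve().then(() => task(input));
  }
  const source =
    'self.onmessage = function (e) {\n' +
    '  self.postMessage((' + task.toString() + ')(e.data));\n' +
    '};';
  const url = URL.createObjectURL(new Blob([source], { type: 'text/javascript' }));
  return new Promise((resolve, reject) => {
    const worker = new Worker(url);
    const cleanUp = () => {
      worker.terminate();
      URL.revokeObjectURL(url);
    };
    worker.onmessage = (e) => { cleanUp(); resolve(e.data); };
    worker.onerror = (e) => { cleanUp(); reject(new Error(e.message || 'Worker failed')); };
    worker.postMessage(input);
  });
}

// Toy stand-in for a time-consuming job such as 3D structure generation.
const slowSum = (n) => {
  let sum = 0;
  for (let i = 1; i <= n; i++) sum += i;
  return sum;
};
runInBackground(slowSum, 1e6).then((result) => console.log(result)); // 500000500000
```

The caller receives a promise either way, so UI code does not need to know whether the work actually ran in a worker.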

2.3 Emscripten

As plenty of libraries written in compiled languages already implement a vast range of chemoinformatics tasks, rewriting every piece of that code in JavaScript for web use would duplicate effort. Emscripten,15 a special compiler that compiles C/C++ code into highly optimizable JavaScript, provides an easy way to port existing libraries to the web. An experimental port of the OpenBabel16 library was carried out in our work. OpenBabel is originally written in C++ and is well known for its ability to convert between different chemical data formats. In our work, it is first compiled into JavaScript by Emscripten; then, with the help of some adapter classes, the compiled code is integrated into our toolkit as an extra module, which provides additional capabilities such as support for dozens of additional chemical file formats and molecular force field calculations. Moreover, in our tests the compiled JavaScript runs smoothly in both desktop and mobile environments, and in some web browsers it runs nearly as fast as native code (details are provided in the Supporting Information of this manuscript). However, the compiled JavaScript code of OpenBabel is about 10 MB in size, which may take a long time to transfer over the Internet, so we provide it as an optional module of the toolkit.
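Because the compiled OpenBabel code is large, one way to keep it optional is to fetch it only when a feature that needs it is first used. The sketch below shows a memoized-promise loader; `lazyModule` is a hypothetical helper, and the stub loader stands in for code that would download and initialize the Emscripten output, which the paper does not show.

```javascript
// Returns a getter that runs `loader` at most once, on first use, and shares
// the resulting promise between all callers. A failed load is forgotten so
// that a later call can retry.
function lazyModule(loader) {
  let pending = null;
  return function getModule() {
    if (!pending) {
      pending = Promise.resolve()
        .then(loader)
        .catch((err) => {
          pending = null; // allow a retry after a failed download
          throw err;
        });
    }
    return pending;
  };
}

// Stub loader standing in for code that would fetch and initialize the large
// compiled module (hypothetical; no real download happens here).
let loadCount = 0;
const getCompiledModule = lazyModule(() => {
  loadCount += 1;
  return { convert: (data) => data }; // placeholder API, not OpenBabel's
});

Promise.all([getCompiledModule(), getCompiledModule()]).then(([a, b]) => {
  console.log(loadCount, a === b); // 1 true
});
```

Concurrent callers share one in-flight request, so the 10 MB payload is never requested twice.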

3. Implementation and Features

In our toolkit, all classes, objects and functions are encapsulated in the namespace Kekule so that they do not interfere with other JavaScript libraries used in the same web application. The whole toolkit is divided into several relatively independent modules, and users or developers need to load only the essential ones in their application or web page by passing additional parameters in the HTML script tag. Such a tag can, for example, request only the algorithm and chemistry widget modules, which are then loaded together with their prerequisite modules, while unrelated modules are not transferred. This approach effectively reduces the amount of data transferred over the network, which is important in an Internet environment. Features and implementations of some important modules are briefly explained in this section.
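Loading requested modules "together with their prerequisites" amounts to computing a dependency closure in load order. The sketch below does this with a depth-first topological sort. The dependency table, the module names and `resolveModules` are illustrative assumptions; the actual Kekule.js module graph and loader are not described in this paper.

```javascript
// Hypothetical module dependency table (illustrative only; the real
// Kekule.js module graph may differ).
const MODULE_DEPS = {
  core: [],
  io: ['core'],
  algorithm: ['core'],
  render: ['core'],
  widget: [],
  chemWidget: ['io', 'render', 'widget'],
  openbabel: ['io'], // optional, large module
};

// Returns the requested modules plus all their prerequisites, each listed
// after the modules it depends on (depth-first topological sort).
function resolveModules(requested, deps) {
  const ordered = [];
  const state = new Map(); // module -> 'visiting' | 'done'
  function visit(name) {
    if (!Object.prototype.hasOwnProperty.call(deps, name)) {
      throw new Error('Unknown module: ' + name);
    }
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error('Circular dependency involving: ' + name);
    }
    state.set(name, 'visiting');
    for (const dep of deps[name]) visit(dep);
    state.set(name, 'done');
    ordered.push(name);
  }
  for (const name of requested) visit(name);
  return ordered;
}

const toLoad = resolveModules(['algorithm', 'chemWidget'], MODULE_DEPS);
console.log(toLoad.join(', '));
// core, algorithm, io, render, widget, chemWidget   (openbabel is never fetched)
```

Each module appears once, after everything it needs, so the scripts can be appended to the page in the returned order.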

3.1 Core Module

This module contains classes that represent basic chemical concepts such as element, atom, bond, molecule and reaction. Figure 2 shows a simplified UML diagram of the inheritance hierarchy among some of the fundamental classes.

Figure 2. Simplified UML diagram of core classes


In the above diagram, it is noteworthy that the relation between Atom and Bond, the two most important classes for forming molecules, differs from that in many other chemoinformatics libraries. In class Bond (and its ancestor class ChemStructureConnector), the property connectedObjs, which stores all objects the bond connects, is actually a dynamic array of ChemStructureObjs, the common ancestor of both Atom and Bond. This unusual design enables a bond to link an unlimited number of endpoints, and each endpoint can be either an atom or another bond. This ability is quite useful for representing molecules with multicenter bonds (e.g., ferrocene) or unusual bond-bond connections (e.g., Zeise's salt). To reduce memory consumption, the flyweight pattern17 is also used in these core classes. Instead of storing the atomic number and mass number in each Atom object, an Isotope instance is created and shared by all atoms of the same isotope. A similar pattern is used between the Bond and BondForm classes.
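The two ideas in this paragraph, bonds with an arbitrary list of endpoints and shared flyweight isotope objects, can be sketched as follows. The classes are hypothetical simplifications that borrow names from the text (`connectedObjs`, a common ancestor class here called `ChemStructureObject`); they are not the Kekule.js implementation.

```javascript
// Flyweight factory: one shared, immutable Isotope object per
// (atomicNumber, massNumber) pair instead of a copy in every atom.
const isotopeCache = new Map();
function getIsotope(atomicNumber, massNumber) {
  const key = atomicNumber + ':' + massNumber;
  if (!isotopeCache.has(key)) {
    isotopeCache.set(key, Object.freeze({ atomicNumber, massNumber }));
  }
  return isotopeCache.get(key);
}

// Common ancestor of atoms and bonds, so either can be a bond endpoint.
class ChemStructureObject {}

class Atom extends ChemStructureObject {
  constructor(atomicNumber, massNumber) {
    super();
    this.isotope = getIsotope(atomicNumber, massNumber); // shared reference
  }
}

// A bond keeps a dynamic array of endpoints, each an Atom or another Bond.
class Bond extends ChemStructureObject {
  constructor(...endpoints) {
    super();
    this.connectedObjs = [];
    endpoints.forEach((obj) => this.connectTo(obj));
  }
  connectTo(obj) {
    if (!(obj instanceof ChemStructureObject)) {
      throw new TypeError('A bond endpoint must be an atom or a bond');
    }
    this.connectedObjs.push(obj);
    return this;
  }
}

// Zeise's salt (simplified): the Pt atom is bonded to the C=C bond itself.
const c1 = new Atom(6, 12);
const c2 = new Atom(6, 12);
const pt = new Atom(78, 195);
const alkene = new Bond(c1, c2);
const ptToAlkene = new Bond(pt, alkene); // bond-to-bond connection

// One Fe-cyclopentadienyl interaction of ferrocene as a single multicenter bond.
const fe = new Atom(26, 56);
const ringCarbons = [1, 2, 3, 4, 5].map(() => new Atom(6, 12));
const multicenter = new Bond(fe, ...ringCarbons);

console.log(c1.isotope === c2.isotope);        // true: one shared Isotope object
console.log(multicenter.connectedObjs.length); // 6
```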

3.2 I/O Module

This module provides the ability to input and output chemical data and files. All data I/O classes inherit from either ChemDataReader or ChemDataWriter, and each data format is supported by a pair of separate classes, a reader and a writer, implementing these interfaces. The factory method pattern18 is used in this module to simplify adding support for new formats. Meanwhile, users do not need to know the details of the concrete I/O classes, since a set of facade methods is provided to load or save chemical data, as shown in the following code:

    // data: the text content of an MDL MOL file
    var molecule = Kekule.IO.loadMimeData(data, 'chemical/x-mdl-molfile');
    // or: var molecule = Kekule.IO.loadFormatData(data, 'mol');

    var newData = Kekule.IO.saveMimeData(molecule, 'chemical/x-cml');
    // or: var newData = Kekule.IO.saveFormatData(molecule, 'cml');
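The factory-method idea behind these facade calls can be sketched with a small registry keyed by format id and MIME type. Everything below (`FormatRegistry`, the simplified XYZ reader and writer) is a hypothetical illustration rather than the Kekule.js I/O classes; XYZ is used only because it is a compact real format (`chemical/x-xyz`) that fits in a few lines.

```javascript
// Hypothetical registry: format id -> factories for its reader and writer.
class FormatRegistry {
  constructor() {
    this.byId = new Map();   // id -> { mimeType, createReader, createWriter }
    this.byMime = new Map(); // MIME type -> id
  }
  register(id, mimeType, createReader, createWriter) {
    this.byId.set(id, { mimeType, createReader, createWriter });
    this.byMime.set(mimeType, id);
  }
  getFormat(id) {
    const format = this.byId.get(id);
    if (!format) throw new Error('Unsupported format: ' + id);
    return format;
  }
  // Facade methods: callers never touch the concrete reader/writer classes.
  loadFormatData(data, id) { return this.getFormat(id).createReader().read(data); }
  saveFormatData(mol, id) { return this.getFormat(id).createWriter().write(mol); }
  loadMimeData(data, mimeType) { return this.loadFormatData(data, this.byMime.get(mimeType)); }
}

// Simplified XYZ reader: atom count, comment line, then "symbol x y z" lines.
class XyzReader {
  read(text) {
    const lines = text.trim().split(/\r?\n/);
    const count = parseInt(lines[0], 10);
    const atoms = lines.slice(2, 2 + count).map((line) => {
      const [symbol, x, y, z] = line.trim().split(/\s+/);
      return { symbol, coord: [Number(x), Number(y), Number(z)] };
    });
    return { title: lines[1] || '', atoms };
  }
}

class XyzWriter {
  write(mol) {
    const body = mol.atoms.map((a) => [a.symbol, ...a.coord].join(' '));
    return [String(mol.atoms.length), mol.title, ...body].join('\n');
  }
}

// Supporting a new format only requires one register() call.
const registry = new FormatRegistry();
registry.register('xyz', 'chemical/x-xyz', () => new XyzReader(), () => new XyzWriter());
const mol = registry.loadMimeData('3\nwater\nO 0 0 0\nH 0.757 0.586 0\nH -0.757 0.586 0', 'chemical/x-xyz');
console.log(mol.atoms.length);                              // 3
console.log(registry.saveFormatData(mol, 'xyz').split('\n')[2]); // O 0 0 0
```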

Currently, MDL MOL/RXN/SDF,19 CML20 and SMILES21 (output only) are supported natively by the toolkit. A special format that stores chemical objects directly as JSON (JavaScript Object Notation)22 is also introduced. If the JavaScript port of the OpenBabel library is used, more formats become available, as mentioned before.

3.3 Algorithm Module

Some typical chemoinformatics algorithms, mainly based on chemical graph theory, are implemented in this module, including stereo atom and stereo bond identification,23 ring perception for finding all rings or the Smallest Set of Smallest Rings (SSSR),24 aromatic ring recognition by Hückel's rule,25 molecule canonicalization,26 structure comparison and substructure search.27 All these algorithms are written in JavaScript, and the calculations can be done purely in the client web browser without any help from the server side. As many different algorithms often exist for the same chemoinformatics task, the algorithm module is also designed to be flexible. For instance, molecule canonicalization in the toolkit is based on a variant of the Morgan algorithm by default, but users who want a different approach can create a new canonicalization executor class and register it to replace the default one.
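As an illustration of where Morgan-type canonicalization starts, the function below computes Morgan's extended connectivity values for a hydrogen-suppressed graph given as adjacency lists. This is the classic iteration from ref 26, not the toolkit's own variant, whose modifications are not detailed here; a full canonicalizer would add atom-type information and tie-breaking to turn these values into unique atom numbers.

```javascript
// Morgan extended connectivity (EC) values for a hydrogen-suppressed graph.
// adjacency[i] lists the indexes of the atoms bonded to atom i.
function morganExtendedConnectivity(adjacency) {
  let values = adjacency.map((neighbors) => neighbors.length); // start from degrees
  let classCount = new Set(values).size;
  for (;;) {
    // Each new value is the sum of the neighbors' previous values.
    const next = adjacency.map((neighbors) =>
      neighbors.reduce((sum, j) => sum + values[j], 0));
    const nextCount = new Set(next).size;
    // Stop once an iteration no longer splits the atoms into more classes,
    // keeping the last values that did.
    if (nextCount <= classCount) return values;
    values = next;
    classCount = nextCount;
  }
}

// n-Pentane, C0-C1-C2-C3-C4.
const pentane = [[1], [0, 2], [1, 3], [2, 4], [3]];
console.log(morganExtendedConnectivity(pentane)); // [ 2, 3, 4, 3, 2 ]
```

The loop always terminates because the number of distinct values must strictly grow to continue and can never exceed the number of atoms. The resulting classes ({C0, C4}, {C1, C3}, {C2}) reflect the molecule's symmetry independently of how the atoms were numbered.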

3.4 Render Module

The ability to display and manipulate intuitive images of molecules (and other chemical objects) is one of the most important features of chemoinformatics-related programs. The render module provides a set of classes and functions to render 2D or 3D drawings of chemical structures in a web context. The module is written in a flexible and extensible way. Figure 3 shows a simplified UML diagram of the 2D render system. The factory method pattern is also applied here: for each type of chemical object that needs a visual form, a specialized renderer class is created and registered. All renderer classes are derived from the same ancestor and override some key methods (e.g., draw). So when a new type of chemical object is added to the toolkit, only one new renderer class needs to be written, and none of the existing code needs to be modified. Composite renderers, which hold and use a set of child renderers, are also introduced for complex chemical objects. For example, Reaction2DRenderer delegates the concrete job of drawing the reactants and products of a reaction to its child Mol2DRenderers.
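The registration and composite ideas can be sketched as below. The class names echo those in the text, but the code is a hypothetical simplification, not the Kekule.js renderer hierarchy; the renderers record drawing commands in an array instead of drawing, so the sketch runs anywhere.

```javascript
// Common ancestor: concrete renderers override draw().
class BaseRenderer {
  draw(obj, output) {
    throw new Error('draw() must be overridden by a concrete renderer');
  }
}

const rendererRegistry = new Map(); // object class -> renderer class
function registerRenderer(objClass, rendererClass) {
  rendererRegistry.set(objClass, rendererClass);
}

// Finds the renderer for an object, walking up its prototype chain so that a
// subclass without its own renderer reuses its ancestor's.
function getRenderer(obj) {
  for (let proto = Object.getPrototypeOf(obj); proto; proto = Object.getPrototypeOf(proto)) {
    const RendererClass = rendererRegistry.get(proto.constructor);
    if (RendererClass) return new RendererClass();
  }
  throw new Error('No renderer registered for this object');
}

class Molecule { constructor(name) { this.name = name; } }
class Reaction {
  constructor(reactants, products) { this.reactants = reactants; this.products = products; }
}

// Leaf renderer: records a drawing command instead of really drawing.
class Mol2DRenderer extends BaseRenderer {
  draw(mol, output) { output.push('molecule ' + mol.name); }
}

// Composite renderer: delegates each molecule to its own child renderer.
class Reaction2DRenderer extends BaseRenderer {
  draw(rxn, output) {
    const drawSide = (mols) => mols.forEach((mol, i) => {
      if (i > 0) output.push('+');
      getRenderer(mol).draw(mol, output);
    });
    drawSide(rxn.reactants);
    output.push('->');
    drawSide(rxn.products);
  }
}

registerRenderer(Molecule, Mol2DRenderer);
registerRenderer(Reaction, Reaction2DRenderer);

const commands = [];
const rxn = new Reaction([new Molecule('ethene'), new Molecule('H2')], [new Molecule('ethane')]);
getRenderer(rxn).draw(rxn, commands);
console.log(commands.join(' ')); // molecule ethene + molecule H2 -> molecule ethane
```

Adding a new kind of chemical object only needs a new renderer class and one `registerRenderer` call; `getRenderer` and the existing renderers stay unchanged.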

Figure 3. Simplified UML diagram of the render module

The render system also needs to overcome the fragmentation of graphics technologies in the web environment. For example, there are several standards and technologies for manipulating 2D graphics in a web context. The dominant ones include VML (Vector Markup Language),28 SVG (Scalable Vector Graphics)29 and Canvas2D,30 and support for them varies among web browsers. Instead of calling these technologies directly in the renderer, the bridge pattern31 is used here to achieve platform independence, as shown in Figure 3. The renderer delegates the concrete drawing work to a drawing bridge, which is created automatically based on the current environment. In most modern web browsers, 2D drawing is based on Canvas2D for better performance. When Canvas is not available, SVG is chosen, and VML is used for some very old browsers. 3D rendering faces a problem similar to that of 2D. WebGL32 is the default technology for outputting 3D models at high speed; if WebGL is not available, the system falls back to Canvas2D or SVG. Two additional JavaScript libraries, Raphael.js33 and Three.js,34 are currently used in the render module for low-level drawing functions in SVG/VML and WebGL, respectively. They should also be included together with Kekule.js when 2D or 3D drawing is required.
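The bridge selection can be sketched as follows: each bridge exposes the same drawing interface, and a factory picks the first technology the environment supports, in order of preference. `CanvasBridge`, `SvgBridge`, `createBridge` and `detectEnvironment` are hypothetical names, not the Kekule.js classes, and the bridges here return instruction text instead of drawing.

```javascript
// Two interchangeable drawing bridges exposing the same interface. Real
// bridges would call the Canvas 2D or SVG DOM APIs; these only record the
// equivalent instruction text so the sketch runs anywhere.
class CanvasBridge {
  static isSupported(env) { return Boolean(env.canvas2d); }
  drawLine(out, x1, y1, x2, y2) {
    out.push(`ctx.moveTo(${x1}, ${y1}); ctx.lineTo(${x2}, ${y2}); ctx.stroke();`);
  }
}
class SvgBridge {
  static isSupported(env) { return Boolean(env.svg); }
  drawLine(out, x1, y1, x2, y2) {
    out.push(`<line x1="${x1}" y1="${y1}" x2="${x2}" y2="${y2}"/>`);
  }
}

// Picks the first supported technology, in order of preference.
function createBridge(env, candidates = [CanvasBridge, SvgBridge]) {
  const Bridge = candidates.find((B) => B.isSupported(env));
  if (!Bridge) throw new Error('No supported drawing technology');
  return new Bridge();
}

// Browser-side capability detection, e.g. createBridge(detectEnvironment(document)).
function detectEnvironment(doc) {
  const canvas = doc.createElement('canvas');
  return {
    canvas2d: Boolean(canvas.getContext && canvas.getContext('2d')),
    svg: Boolean(doc.createElementNS &&
      doc.createElementNS('http://www.w3.org/2000/svg', 'svg').createSVGRect),
  };
}

// The renderer depends only on the bridge interface, not on the technology.
function drawBond(bridge, out, bond) {
  bridge.drawLine(out, bond.from[0], bond.from[1], bond.to[0], bond.to[1]);
}

const out = [];
const bridge = createBridge({ canvas2d: false, svg: true }); // e.g. no Canvas support
drawBond(bridge, out, { from: [0, 0], to: [1.5, 0] });
console.log(out[0]); // <line x1="0" y1="0" x2="1.5" y2="0"/>
```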

3.5 Widget Module

To simplify the use of our toolkit, additional web widgets based on HTML, CSS and JavaScript are provided, which can be integrated directly into web applications. For example, a user can create a periodic table widget, set its properties and insert it into an HTML element with a few lines of JavaScript code:

    // assumes the page contains an element with id "parent"
    var widget = new Kekule.ChemWidget.PeriodicTable(document);
    widget.setEnableSelect(true)
      .setDisplayedComponents(['symbol', 'name', 'atomicNumber'])
      .appendToElem(document.getElementById('parent'));

Another way to add a widget is to use specific data- attributes on an ordinary HTML element: such markup binds the periodic table widget directly to a div element and, at the same time, sets some of the widget's properties through data- attributes.
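The chained calls in the periodic table example work because each setter returns the widget itself, and the data- attribute route amounts to mapping attribute names onto those setters. The sketch below illustrates both ideas with a hypothetical `PeriodicTableSketch` class and `applyDataAttributes` helper; neither is the Kekule.js implementation, and the attribute names are assumptions modelled on the setter names in the example.

```javascript
// Hypothetical stand-in for a widget class: every setter returns `this`,
// which is what makes chained calls possible.
class PeriodicTableSketch {
  constructor() {
    this.enableSelect = false;
    this.displayedComponents = ['symbol'];
  }
  setEnableSelect(value) {
    this.enableSelect = Boolean(value);
    return this;
  }
  setDisplayedComponents(components) {
    this.displayedComponents = components.slice();
    return this;
  }
}

// Applies data-* attribute values to matching setters. In a browser, pass
// element.dataset, which already turns data-enable-select into "enableSelect".
function applyDataAttributes(widget, dataset) {
  Object.keys(dataset).forEach((key) => {
    const setter = 'set' + key.charAt(0).toUpperCase() + key.slice(1);
    if (typeof widget[setter] !== 'function') return; // ignore unknown attributes
    let value = dataset[key];
    try {
      value = JSON.parse(value); // "true", "[...]", numbers
    } catch (e) {
      // not JSON: keep the plain string
    }
    widget[setter](value);
  });
  return widget;
}

const table = applyDataAttributes(new PeriodicTableSketch(), {
  enableSelect: 'true',
  displayedComponents: '["symbol","name","atomicNumber"]',
  unknownOption: 'ignored',
});
console.log(table.enableSelect, table.displayedComponents.length); // true 3
```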


The two most useful widgets in the chemistry widget set are the Viewer and the Composer, shown in Figures 4 and 5. The Viewer is designed to display 2D or 3D chemical structures in web pages. It can also respond to user interaction and perform basic manipulations, including zooming, rotation, changing the display type (skeletal or condensed for 2D molecules; wire, stick, ball-and-stick or space-filling for 3D molecules) and exporting structure data. The Composer widget is a 2D chemical structure editor capable of handling both molecular structures and other objects such as text blocks and reaction symbols. It also supports an unlimited undo/redo stack, clipboard operations and style settings for individual chemical objects, including (but not limited to) the color, stroke width, font and text size of atoms and bonds. In addition, the structure tree and object inspector embedded in the Composer provide an advanced way for users to inspect and customize almost every aspect of chemical objects.


Figure 4. 2D and 3D chem viewer widgets in a web browser

Figure 5. The chem composer widget, designed to edit 2D chemical structures in a web browser

Aside from the chemistry widgets, the toolkit ships with a series of general-purpose widgets such as buttons, drop-down boxes, text editors, tab groups and menus. They can be used to build general web applications in other areas.

3.6 Localization Module

All text constants used by the toolkit are separated from the code and defined in separate language files. At runtime, the toolkit can automatically detect the language of the web browser and load the matching localization resources. Currently two language sets, English and Chinese, are shipped with the toolkit.
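Choosing the matching localization resources typically means comparing the browser's preferred language tags (for example `navigator.languages` in a browser) with the available resource sets, falling back from a regional tag such as `zh-CN` to its base language. The `pickLocale` function and the resource table below are hypothetical; the paper does not describe the toolkit's actual detection logic.

```javascript
// Resource sets shipped with the page (illustrative strings only).
const RESOURCES = {
  en: { undo: 'Undo', redo: 'Redo' },
  zh: { undo: '撤销', redo: '重做' },
};

// Returns the first available resource key matching the preferred language
// tags: an exact match (e.g. "zh-cn") or, failing that, the base language
// ("zh"). Falls back to `fallback` if nothing matches.
function pickLocale(preferred, available, fallback = 'en') {
  for (const tag of preferred) {
    const lower = tag.toLowerCase();
    if (available.includes(lower)) return lower;
    const base = lower.split('-')[0];
    if (available.includes(base)) return base;
  }
  return fallback;
}

// In a browser, `preferred` could be navigator.languages || [navigator.language].
const locale = pickLocale(['zh-CN', 'en-US'], Object.keys(RESOURCES));
console.log(locale, RESOURCES[locale].undo); // zh 撤销
```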


3.7 Test Cases

To ensure the quality of our toolkit, a series of automated unit tests on the algorithm and I/O modules is used. Two well-known and widely used chemoinformatics libraries, CDK (the Chemistry Development Kit)35 and RDKit,36 serve as the sources and references for these tests. Each test of the basic molecule algorithms (including ring perception, stereo identification, aromatic ring recognition and substructure search) follows a similar process: it loads a test molecule from a file (many of which also come from the CDK or RDKit test sets), performs an operation on the molecule (e.g., finding all rings), and finally compares the results (e.g., the number of rings and the atoms and bonds in each ring) with those returned by CDK or RDKit, ensuring they are identical. The I/O functions are also tested automatically with a series of data files borrowed from the CDK and RDKit test collections, ensuring that the correct molecular structures are parsed from the sources. As mentioned before, a variant of the Morgan algorithm is used for molecule canonicalization, and it differs from the algorithms used in CDK and RDKit, so the canonicalization tests are not compared with other libraries; instead, they verify that the same results are returned for molecules whose atom and bond indexes have been randomized. All these unit tests have passed, and their source code can be found in the Supporting Information of this manuscript. Other modules such as the render and widget modules, which involve user interface and interaction, are tested with a set of manual tests.
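The randomized-index check used for canonicalization can be sketched as follows. To keep the example self-contained, a brute-force canonical form (the lexicographically smallest encoding over all atom orderings) stands in for the toolkit's Morgan-based canonicalizer; it is exponential and only suitable for tiny molecules. `canonicalForm`, `shuffleIndexes` and the `{ atoms, bonds }` molecule representation are assumptions made for this illustration.

```javascript
// Brute-force canonical form: the smallest encoding over all atom orderings.
// mol = { atoms: ['C', ...], bonds: [[atomA, atomB, order], ...] }
function canonicalForm(mol) {
  const n = mol.atoms.length;
  const perm = [];                       // perm[newIndex] = oldIndex
  const used = new Array(n).fill(false);
  let best = null;

  function encode() {
    const newIndexOf = new Array(n);
    perm.forEach((oldIdx, newIdx) => { newIndexOf[oldIdx] = newIdx; });
    const atoms = perm.map((oldIdx) => mol.atoms[oldIdx]).join(',');
    const bonds = mol.bonds
      .map(([a, b, order]) => {
        const lo = Math.min(newIndexOf[a], newIndexOf[b]);
        const hi = Math.max(newIndexOf[a], newIndexOf[b]);
        return lo + '-' + hi + ':' + order;
      })
      .sort()
      .join(';');
    return atoms + '|' + bonds;
  }

  function extend() {
    if (perm.length === n) {
      const code = encode();
      if (best === null || code < best) best = code;
      return;
    }
    for (let i = 0; i < n; i++) {
      if (used[i]) continue;
      used[i] = true;
      perm.push(i);
      extend();
      perm.pop();
      used[i] = false;
    }
  }

  extend();
  return best;
}

// Returns a copy of mol with atoms and bonds in random order (Fisher-Yates).
function shuffleIndexes(mol) {
  const order = mol.atoms.map((_, i) => i);
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  const newIndexOf = new Array(order.length);
  order.forEach((oldIdx, newIdx) => { newIndexOf[oldIdx] = newIdx; });
  const bonds = mol.bonds.map(([a, b, o]) => [newIndexOf[a], newIndexOf[b], o]);
  for (let i = bonds.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [bonds[i], bonds[j]] = [bonds[j], bonds[i]];
  }
  return { atoms: order.map((i) => mol.atoms[i]), bonds };
}

// Ethanol (heavy atoms only): C-C-O.
const ethanol = { atoms: ['C', 'C', 'O'], bonds: [[0, 1, 1], [1, 2, 1]] };
const reference = canonicalForm(ethanol);
let stable = true;
for (let trial = 0; trial < 20; trial++) {
  if (canonicalForm(shuffleIndexes(ethanol)) !== reference) stable = false;
}
console.log(stable); // true: the result does not depend on atom or bond indexes
```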

4. Applications


Nowadays, online HTML editors are indispensable components of forums, chat rooms, CMSs (Content Management Systems) and LMSs (Learning Management Systems), enabling the input of rich-format text and images. Unfortunately, almost none of the popular online editors are designed for chemistry. Usually chemists have to convert a chemical structure into a bitmap image and then insert it into the editor, losing all of its chemical information. To solve this problem, we have made a special plug-in for the widely used CKEditor.37 It uses the Kekule.js toolkit, especially the Viewer and Composer widgets, and provides an interface to insert 2D and 3D chemical structures directly into the editor or to edit existing structures with the full chemical data preserved. Similar plug-ins can be made for other HTML editors as well; however, all those plug-ins need to be installed on the server side by webmasters. Another approach to the problem is a Kekule.js-based web browser extension for desktop Firefox or Chrome, which chemists can install on their own computers. With it, users can insert and edit chemical structures in any editable area of a web page and publish them to the server without any modification on the server side. People who have installed the extension can get both the visual form and the raw chemical data from the published content, while others can at least still see the structure images. The extension can therefore be a handy complement to editor plug-ins. The interfaces of both solutions are shown in Figure 6.
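For published content to serve both kinds of readers, the markup has to carry an ordinary image plus the raw chemical data. The paper does not describe how the plug-in or extension stores that data; the sketch below shows one possible approach, with hypothetical `data-chem-mime`/`data-chem-data` attribute names and a hypothetical `buildChemImage` helper.

```javascript
// Escapes text for safe use inside a double-quoted HTML attribute value.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Builds an <img> that any browser shows as a picture, while a tool that knows
// the (hypothetical) data-chem-* attributes can read back the raw data, e.g.
// with img.getAttribute('data-chem-data'), which returns the unescaped text.
function buildChemImage(imageSrc, chemData, mimeType) {
  return '<img src="' + escapeAttr(imageSrc) + '"' +
    ' alt="chemical structure"' +
    ' data-chem-mime="' + escapeAttr(mimeType) + '"' +
    ' data-chem-data="' + escapeAttr(chemData) + '">';
}

const html = buildChemImage('structure.png', 'CC(=O)O', 'chemical/x-daylight-smiles');
console.log(html);
```

Readers without the extension see only the image; the extension can locate such elements and rebuild an editable structure from the attribute value.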


Figure 6. Plug-ins for an online editor and a web browser to input chemical structures directly

Furthermore, the toolkit has already been applied to organic chemistry education at our university. A series of web pages use the Viewer widget to demonstrate 3D models of organic molecules, helping students to understand molecular conformation and chirality in stereochemistry. An online organic reaction self-test system uses it to let students edit the molecular structures in reactions on a web page and to compare the answers submitted by students with predefined keys. As all jobs can be done on the client side, the system can even be hosted on static web servers.


5. Conclusion

We have presented a brief introduction to a new open-source chemoinformatics toolkit written mainly in JavaScript. The toolkit includes classes to represent core chemistry concepts, to read and write chemical data, to display and edit chemical structures on screen and to perform common chemoinformatics algorithms. Its design is flexible enough that its functionality can easily be extended. Developed with web standards, the toolkit is highly platform-independent and can be used to build applications in many settings: from websites to offline applications, from desktops to mobile devices, and even on the server side with the help of Node.js.38 The toolkit is released under the MIT license, which places very few restrictions and requirements on the reuse of code, so it can be used in both free and proprietary software. Some initial applications have demonstrated the value of the toolkit, and we expect more in the near future.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information includes:
- Demos of the toolkit
- Tutorial of the toolkit
- Unit test source files of the toolkit
- Detailed description of the unit tests, detailed execution-speed data for the native and Emscripten-compiled OpenBabel library, and a comparison of the Composer widget with other web molecule editors

This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]

REFERENCES

(1) Chen, W. L. Chemoinformatics: Past, Present, and Future. J. Chem. Inf. Model. 2006, 46, 2230-2255.
(2) Cheminformatics Toolkits. https://en.wikipedia.org/wiki/Cheminformatics_toolkits (accessed May 15, 2016).
(3) What is HTML. http://www.w3.org/html/ (accessed May 15, 2016).
(4) Cascading Style Sheets Home Page. http://www.w3.org/Style/CSS/ (accessed May 15, 2016).
(5) Firefox OS. https://www.mozilla.org/en-US/firefox/os/ (accessed May 15, 2016).
(6) Bienfait, B.; Ertl, P. JSME: a Free Molecule Editor in JavaScript. J. Cheminf. 2013, 5, 24.
(7) Ketcher Home Page. http://lifescience.opensource.epam.com/ketcher/index.html (accessed May 15, 2016).
(8) JSMol Home Page. http://sourceforge.net/projects/jsmol/ (accessed May 15, 2016).
(9) Burger, M. C. ChemDoodle Web Components: HTML5 Toolkit for Chemical Graphics, Interfaces and Informatics. J. Cheminf. 2015, 7, 35.
(10) The MIT License. http://opensource.org/licenses/mit-license.php (accessed May 15, 2016).
(11) Kekule.js Project on GitHub. https://github.com/partridgejiang/Kekule.js (accessed May 15, 2016).
(12) Kekule.js Website. http://partridgejiang.github.io/Kekule.js/ (accessed May 15, 2016).
(13) Web Worker Standard. https://html.spec.whatwg.org/multipage/workers.html (accessed May 15, 2016).
(14) HTML Standard. https://html.spec.whatwg.org/ (accessed May 15, 2016).
(15) Emscripten Home Page. http://kripken.github.io/emscripten-site/ (accessed May 15, 2016).
(16) O'Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An Open Chemical Toolbox. J. Cheminf. 2011, 3, 33.
(17) Gamma, E.; Helm, R.; Johnson, R.; Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software; Addison-Wesley, 1995; pp 205-206.
(18) Gamma, E.; Helm, R.; Johnson, R.; Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software; Addison-Wesley, 1995; pp 108-109.
(19) CTfile Formats. http://download.accelrys.com/freeware/ctfile-formats/ctfile-formats.zip (accessed May 15, 2016).
(20) Murray-Rust, P.; Rzepa, H. S. Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. J. Chem. Inf. Comput. Sci. 1999, 39, 928-942.
(21) Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
(22) Introducing JSON. http://www.json.org/ (accessed May 15, 2016).
(23) Dalby, A.; Nourse, J. G.; Hounshell, W. D.; Gushurst, A. K.; Grier, D. L.; Leland, B. A.; Laufer, J. Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci. 1992, 32, 244-255.
(24) Hanser, Th.; Jauffret, Ph.; Kaufmann, G. A New Algorithm for Exhaustive Ring Perception in a Molecular Graph. J. Chem. Inf. Comput. Sci. 1996, 36, 1146-1152.
(25) Roos-Kozel, B. L.; Jorgensen, W. L. Computer-Assisted Mechanistic Evaluation of Organic Reactions. 2. Perception of Rings, Aromaticity, and Tautomers. J. Chem. Inf. Comput. Sci. 1981, 21, 101-111.
(26) Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures: A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107-113.
(27) Xu, J. GMA: A Generic Match Algorithm for Structural Homomorphism, Isomorphism, and Maximal Common Substructure Match and Its Applications. J. Chem. Inf. Comput. Sci. 1996, 36, 25-34.
(28) Vector Markup Language. http://www.w3.org/TR/NOTE-VML (accessed May 15, 2016).
(29) Scalable Vector Graphics. http://www.w3.org/Graphics/SVG/ (accessed May 15, 2016).
(30) HTML Canvas 2D Context. http://www.w3.org/TR/2dcontext/ (accessed May 15, 2016).
(31) Gamma, E.; Helm, R.; Johnson, R.; Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software; Addison-Wesley, 1995; pp 158-160.
(32) WebGL Specification. https://www.khronos.org/registry/webgl/specs/1.0/ (accessed May 15, 2016).
(33) Raphaël: JavaScript Library. http://dmitrybaranovskiy.github.io/raphael/ (accessed May 15, 2016).
(34) Three.js Home Page. http://threejs.org/ (accessed May 15, 2016).
(35) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493-500.
(36) RDKit: Open-Source Cheminformatics Software. http://www.rdkit.org (accessed May 15, 2016).
(37) CKEditor Home Page. http://ckeditor.com/ (accessed May 15, 2016).
(38) Node.js Home Page. https://nodejs.org/ (accessed May 15, 2016).
