ESIgen: Electronic Supporting Information Generator for Computational Chemistry Publications
Jaime Rodríguez-Guerra Pedregal, Pablo Gómez-Orellana, and Jean-Didier Pierre Maréchal
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00714 • Publication Date (Web): 05 Mar 2018

Journal of Chemical Information and Modeling

ESIgen: Electronic Supporting Information Generator for Computational Chemistry Publications
Jaime Rodríguez-Guerra Pedregal,* Pablo Gómez-Orellana, Jean-Didier Maréchal*
Departament de Química, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain.

ABSTRACT

Electronic Supporting Information (ESI) occupies a fundamental position in the way scientists report their work. It lightens the core manuscript and makes concise communication easier for the authors. Computational chemistry, like all fields related to structural studies of molecules, tends to generate huge amounts of data that should be included in the ESI. ESI reports arising from computational chemistry work are often tens of pages long and include 3D depictions, coordinates, energies and other characteristics of the structures involved in the molecular process under study. While most experienced users end up building scripts that dig through the output files in search of the relevant data, this is not an option for users without programming experience or time. Here we present an automated ESI generator supported by both web-based and command-line interfaces. Although it currently focuses on the outputs of quantum mechanics calculations, we trust that the community will find this tool useful. Source code is freely available at https://github.com/insilichem/esigen. A public demo of the web app can be found at http://esi.insilichem.com.

Introduction

One of the major exercises scientists are confronted with is writing articles in a concise, direct and convincing manner. This is only possible if the core manuscript presents the most important concepts, results and discussions, while technical details and further analysis are placed in additional documents. Supporting Information (SI) and its electronic counterpart (ESI) represent a key asset in scientific communication. SI reports allow researchers to provide details on their studies so that reviewers and readers can assess the quality of the practices and find the information needed to reproduce or apply them in the context of their own investigations. It is therefore common to submit extensive SI documents alongside the main manuscript, a practice strongly supported by scientific journals, which provide specific contexts and guidelines for this process.

In computational chemistry, at least two challenges exist in the generation of SI documents. The first is the handling of massive amounts of data. For example, the mechanistic study of a chemical reaction likely includes 3D representations of the reactants, products and relevant intermediates for the paths under study, as well as text data such as energies, stoichiometry or transition state vectors. These data are normally scattered across complexly structured output files, so the second challenge consists in gathering them efficiently. Researchers with the necessary programming skills generally end up building scripts to generate their SI documents. These scripts go over the output files, search for the relevant data, and write it to an (un)formatted file that is finally pasted into a more advanced text processor. When developing new scripts, one could rely on available programs and libraries to handle the parsing (1–4) and then simply focus on the style of the output format. However, users without programming experience will avoid the technical complexities involved and become experts in copy-pasting exercises: a cumbersome handcrafting process prone to human error.

While several software projects could be helpful in this task, such as spreadsheet automation (5), desktop graphical interfaces for results analysis such as GaussView (6) and GaussSum (1), or custom database creation with GaussDal (7) or MyChem (8), they mainly operate as standalone data collectors. In the end, yet another system would be needed to automate the full pipeline, be it a custom script or a multipurpose workflow editor such as KNIME-CDK (9) or Taverna (10). The only resource currently available that allows the creation of automated ESI reports is ioChem-BD (11), a full-fledged web interface to store, organize and share chemical computation output files. While powerful, this platform must be hosted by the institution the user works for, which, if not academic, can incur additional costs. Alternatively, an account must be created at one of the running instances after submitting a request form. ioChem-BD offers far more than online ESI reports, and all these steps are surely worthwhile in the long run once the users have their projects categorized and uploaded. However, some users might not need all the organization features, which can get in the way if only some quick, customizable ESI reports are needed.

Here we present ESIgen, a Python-based tool with a minimal entry barrier that embeds existing parsers to provide well-formatted ESI files. As opposed to ioChem-BD, the purpose of ESIgen is to generate offline, downloadable reports that can be attached to publications during the submission process. The ESIgen output includes 3D depictions of the chemical structures as well as key text information such as coordinates and energies. ESIgen can be run on a web server as well as installed locally, in case the privacy of the files is a concern.

Methods

ESIgen is a Python 2.7/3.4+ project that generates automated, unsupervised supporting information reports from a variety of computational chemistry output files. It provides two interfaces: (1) a graphical interface consisting of a web server that can be deployed to public-facing machines (like the demo hosted at http://esi.insilichem.com) or run locally on the user's desktop with esigenweb; and (2) an executable called esigen, meant for command-line environments. ESIgen relies on several open-source projects. The cclib library (1) parses computational chemistry output files from popular software packages such as Gaussian (6), ORCA (12), TURBOMOLE (13) or GAMESS-UK (14). PyMOL (15) performs unsupervised rendering of static 3D images of the compound in command-line environments, while NGL Viewer (16) provides interactive 3D depictions of the optimized structures in the web interface (built with Flask (17)). ESIgen can be installed on Linux with a self-contained executable available at https://github.com/insilichem/esigen/releases or, for more experienced users, manually with conda or pip packages.
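As a rough illustration of the parsing step, the sketch below shows how a few fields might be read from a cclib data object. It is not ESIgen's own code: the helper names parse_output and summarize and the chosen field list are assumptions made for this example, while cclib.io.ccread and the attribute names charge, mult, natom and scfenergies belong to cclib's parsed-data interface. The stand-in object at the end uses made-up values so that the sketch runs without cclib or a real output file.

```python
from types import SimpleNamespace


def parse_output(path):
    """Parse one output file with cclib (third-party, imported lazily)."""
    import cclib  # assumption: cclib is installed in the environment
    return cclib.io.ccread(path)


def summarize(data, fields=("charge", "mult", "natom", "scfenergies")):
    """Collect only the requested attributes that the parser actually found."""
    return {name: getattr(data, name) for name in fields if hasattr(data, name)}


# Stand-in for a parsed file (made-up values, no SCF energies available).
demo = SimpleNamespace(charge=0, mult=1, natom=3)
print(summarize(demo))
```

With a real calculation, summarize(parse_output("some_file.log")) would be the equivalent call; which attributes are present depends on the program and the job type.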

Tool usage

In the graphical web interface, the user first uploads the desired output files to the server. A template is then chosen from a collection of presets to define which data will be included in the generated report. If none of the presets satisfies the user, a custom one can be written in the Jinja template language (17).

The parsed data then fill the template, generating the automated report. In the web interface, the result defaults to an HTML5 report with an interactive 3D preview of the optimized structure. This allows the user to select the best depiction of the compound and to draw distance, angle and dihedral measurements. Atoms of elements other than carbon, hydrogen, oxygen or nitrogen are labeled for easy visual recognition (Figure 1). After customizing all the previews, the user can generate a PDF version of the document, download all the processed files in a Zip archive for custom typesetting, expose all the data in JSON format for programmatic usage and storage, or even export it to a GitHub Gist (18), which can later be imported as a GitHub repository suitable for DOI citation with Zenodo (19).

For command-line environments, where a large number of files is typically processed at once, the esigen executable provides flags to configure the unsupervised generation of all the reports in a single step. The usage is straightforward: esigen -t TEMPLATE FILE1 [FILE2 …]. Note that, in this case, only static, high-resolution images of the compounds are provided, which may result in non-ideal orientations for some structures. Since the core logic of the package is designed to support several submissions at once, all the provided output files should ideally contain the same type of calculation. Otherwise, the tabulated data might contain missing fields, denoted as N/A (or any other desired placeholder) for the author to review.
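The idea of filling a template and marking missing fields with a placeholder can be sketched as follows. This is only an illustration under stated assumptions: it uses string.Template from the Python standard library as a stand-in for Jinja, and the template text, the Placeholder class, the render function and the record values are all hypothetical rather than taken from ESIgen.

```python
from string import Template

# Hypothetical one-line template; real ESIgen templates use Jinja syntax.
ROW = Template("$name | charge: $charge | multiplicity: $mult | free energy: $freeenergy")


class Placeholder(dict):
    """Mapping that returns a placeholder for any field that was not parsed."""

    def __init__(self, values, missing="N/A"):
        super().__init__(values)
        self.missing = missing

    def __missing__(self, key):
        return self.missing


def render(records, missing="N/A"):
    """Fill the template once per record, one line per file."""
    return "\n".join(ROW.substitute(Placeholder(r, missing)) for r in records)


# Made-up records: the second one lacks a free energy (e.g. no frequency job).
records = [
    {"name": "reactant", "charge": 0, "mult": 1, "freeenergy": -1.0},
    {"name": "ts", "charge": 0, "mult": 1},
]
print(render(records))
```

Keeping the placeholder logic in the mapping rather than in the template means the same template can be reused across files from different calculation types.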

Figure 1. ESIgen can be used via a web interface and from the command line. When using the web interface (a demo is available at http://esi.insilichem.com), the user only needs to upload the quantum chemistry calculation output files to the server and select the data to report. After processing the file, an interactive HTML5 preview of the 3D structure can be displayed alongside the requested data so the user can manually find the best orientation for a static depiction.

While the web interface is meant for final, publication-ready reports, the command-line version could be useful in other contexts as well, such as routine checks on ongoing studies where the researcher must analyze tens of files daily to accept or discard possible reaction paths. Proficient users of the command line normally resort to tools like grep to locate the relevant lines in each file and then manually fill a spreadsheet with the obtained values. In those cases, the workflow would be greatly improved by using ESIgen with a custom template that lists all pertinent quantities for the files involved, giving more readable results in less time.
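For such routine checks, the command-line call could be wrapped in a small script. The sketch below only assembles the invocation documented above (esigen -t TEMPLATE FILE1 [FILE2 …]); the function names build_command and run_batch, the "*.log" file pattern and the template name in the usage note are assumptions for illustration, and no other esigen flags are implied.

```python
import subprocess
from pathlib import Path


def build_command(template, files):
    """Return the argument list for: esigen -t TEMPLATE FILE1 [FILE2 ...]."""
    files = [str(f) for f in files]
    if not files:
        raise ValueError("at least one output file is required")
    return ["esigen", "-t", str(template), *files]


def run_batch(template, directory, pattern="*.log"):
    """Run esigen once over every matching file in a directory."""
    files = sorted(Path(directory).glob(pattern))
    # check=True raises CalledProcessError if esigen exits with an error.
    return subprocess.run(build_command(template, files), check=True)


print(build_command("custom.md", ["a.log", "b.log"]))
```

Calling run_batch("custom.md", "calculations/") would then process a whole folder in one step, provided esigen is installed and on the PATH.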

Conclusions

Although still under active development, ESIgen can generate automated ESI reports for computational chemistry calculations performed with the most common packages. The software only needs the log files of such calculations and a template specifying the fields that should be included in the report (like charge, multiplicity, Cartesian coordinates, imaginary frequencies or free energy). The resulting utilities can be used online as well as locally, which avoids any possible conflict of interest when dealing with confidential data. Because the tool is user-friendly and automated, a custom template could be applied routinely to calculations under study to rapidly check for imaginary frequencies, convergence or energy profiles. Feedback on the initial version of ESIgen from the computational chemistry community has been more than enthusiastic, and some manuscripts have already used it to generate part of their SI (20, 21). We are confident that, with both theoretical and experimental scientists running quantum mechanical calculations nowadays, ESIgen will be well received and will also improve the communication between both sides.

In the future, community-contributed templates could be listed in an online repository to address more usage scenarios. More export formats could be implemented, like direct generation of Docx or LaTeX documents, instead of having to convert from Markdown with external tools like Pandoc (22). New features could include the automated generation of reaction coordinate diagrams if multiple, related files are provided. Of course, any progress in cclib would immediately provide ESIgen and its users with wider input format compatibility and a larger number of available fields.
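The routine check for imaginary frequencies mentioned above can be sketched in a few lines. The example assumes the common convention that imaginary modes are reported as negative wavenumbers in the list of vibrational frequencies; the function names and the sample values are hypothetical and are not part of ESIgen.

```python
def imaginary_frequencies(vibfreqs, tolerance=0.0):
    """Return the modes reported as imaginary, i.e. negative wavenumbers (cm^-1)."""
    return [f for f in vibfreqs if f < -abs(tolerance)]


def classify(vibfreqs):
    """Label a stationary point by its number of imaginary modes."""
    n = len(imaginary_frequencies(vibfreqs))
    if n == 0:
        return "minimum"
    if n == 1:
        return "first-order saddle point (transition state candidate)"
    return "higher-order saddle point ({} imaginary modes)".format(n)


# Made-up frequency lists, for illustration only.
print(classify([100.0, 200.0]))
print(classify([-350.2, 120.0, 480.9]))
```

The tolerance argument lets the user ignore very small negative values, which some workflows treat as numerical noise rather than true imaginary modes.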

AUTHOR INFORMATION

Corresponding Authors
* [email protected], [email protected]

Author Contributions
The software was developed by Rodríguez-Guerra Pedregal and tested by all authors. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding Sources
Research group: Spanish MINECO (project CTQ2017-87889-P), Generalitat de Catalunya (project 2014SGR989). J.R.G.P.: Generalitat de Catalunya (grant 2017FI_B2_00168). P.G.O.: Spanish MINECO (grant FPI BES-2015-074190).

ACKNOWLEDGMENT

We thank Giuseppe Sciortino, Yane Vasquez, Gabriel Dos Passos, Martin Graf-Utzmann, Christopher Jeyajumar, Dennis Svatunek and Robert Q. Topper for helpful discussions on this work.

ABBREVIATIONS

SI, supporting information; ESI, electronic supporting information; HTML5, hypertext markup language 5.

REFERENCES

(1) O’Boyle, N. M.; Tenderholt, A. L.; Langner, K. M. cclib: A library for package-independent computational chemistry algorithms. J. Comput. Chem. 2008, 29, 839–845.

(2) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An open chemical toolbox. J. Cheminform. 2011, 3.

(3) Cock, P. J. A.; Antao, T.; Chang, J. T.; Chapman, B. A.; Cox, C. J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; de Hoon, M. J. L. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422–1423.

(4) Hanwell, M. D.; Curtis, D. E.; Lonie, D. C.; Vandermeersch, T.; Zurek, E.; Hutchison, G. R. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminform. 2012, 4, 17.

(5) Laloo, J. Z. A.; Laloo, N.; Rhyman, L.; Ramasami, P. ExcelAutomat: a tool for systematic processing of files as applied to quantum chemical calculations. J. Comput. Aided. Mol. Des. 2017, 31, 667–673.

(6) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Others. Gaussian, Inc.: Wallingford CT 2009.

(7) Alsberg, B. K.; Bjerke, H.; Navestad, G. M.; Åstrand, P. O. GaussDal: An open source database management system for quantum chemical computations. Comput. Phys. Commun. 2005, 171, 133–153.

(8) Pansanel, J. MyChem http://mychem.sourceforge.net (accessed Dec 8, 2017).

(9) Beisken, S.; Meinl, T.; Wiswedel, B.; de Figueiredo, L. F.; Berthold, M.; Steinbeck, C. KNIME-CDK: Workflow-driven cheminformatics. BMC Bioinformatics 2013, 14, 257.

(10) Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M. P.; Sufi, S.; Goble, C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 2013, 41, 557–561.

(11) Álvarez-Moreno, M.; De Graaf, C.; López, N.; Maseras, F.; Poblet, J. M.; Bo, C. Managing the computational chemistry big data problem: The ioChem-BD platform. J. Chem. Inf. Model. 2015, 55, 95–103.

(12) Neese, F. The ORCA program system. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 73–78.

(13) Ahlrichs, R.; Bär, M.; Häser, M.; Horn, H.; Kölmel, C. Electronic structure calculations on workstation computers: The program system turbomole. Chem. Phys. Lett. 1989, 162, 165–169.

(14) Guest, M. F.; Bush, I. J.; Van Dam, H. J. J.; Sherwood, P.; Thomas, J. M. H.; Van Lenthe, J. H.; Havenith, R. W. A.; Kendrick, J. The GAMESS-UK electronic structure package: Algorithms, developments and applications. Mol. Phys. 2005, 103, 719–747.

(15) DeLano, W. Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr. 2002, 700, 44–53.

(16) Rose, A. S.; Hildebrand, P. W. NGL Viewer: a web application for molecular visualization. Nucleic Acids Res. 2015, 43, 576–579.

(17) Ronacher, A. Flask (A Python Microframework) http://flask.pocoo.org (accessed Nov 29, 2017).

(18) GitHub. GitHub Gist https://gist.github.com/ (accessed Jan 25, 2018).

(19) CERN. Zenodo https://zenodo.org/ (accessed Jan 25, 2018).

(20) Sciortino, G.; Lihi, N.; Czine, T.; Maréchal, J.-D.; Lledós, A.; Garribba, E. Accurate prediction of vertical electronic transitions of Ni(II) coordination compounds via Time Dependent DFT. Int. J. Quantum Chem. 2018 (under revision).

(21) Lepori, C.; Gómez-Orellana, P.; Ouharzoune, A.; Guillot, R.; Lledós, A.; Ujaque, G.; Hannedouche, J. Well-Defined β-Diketiminatocobalt(II) Complexes For Alkene Cyclohydroamination of Primary Amines. J. Am. Chem. Soc. 2018 (under revision).

(22) MacFarlane, J. Pandoc https://github.com/jgm/pandoc (accessed Jan 16, 2018).
