Application Note pubs.acs.org/jcim

Cite This: J. Chem. Inf. Model. 2018, 58, 561−564

ESIgen: Electronic Supporting Information Generator for Computational Chemistry Publications

Jaime Rodríguez-Guerra Pedregal,* Pablo Gómez-Orellana, and Jean-Didier Maréchal*

Departament de Química, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain

ABSTRACT: Electronic supporting information (ESI) occupies a fundamental position in the way scientists report their work. It is a key element in lightening the writing of the core manuscript and makes concise communication easier for the authors. Computational chemistry, like all fields related to structural studies of molecules, tends to generate huge amounts of data that should be inserted in the ESI. ESI reports originating from computational chemistry works are generally tens of pages long and include 3D depictions, coordinates, energies, and other characteristics of the structures involved in the molecular process under study. While most experienced users end up building scripts that dig through the output files searching for the relevant data, this is not the case for users without programming experience or time. Here we present an automated ESI generator supported by both web-based and command-line interfaces. Although it focuses on quantum mechanics calculation outputs so far, we trust that the community will find this tool useful. Source code is freely available at https://github.com/insilichem/esigen. A web app public demo can be found at http://esi.insilichem.com.



INTRODUCTION

One of the major exercises with which scientists are confronted is to write articles in a concise, direct, and convincing manner. This exercise is only possible if the core manuscript presents the most important concepts, results, and discussions while technical details and further analysis are part of additional documents. Supporting information (SI) and its electronic counterpart (ESI) represent a key asset in scientific communication. SI reports allow researchers to provide details on their studies so that reviewers and readers can assess the quality of the practices and reach relevant information to reproduce or apply them in the context of their own investigation. It is therefore a common procedure to submit extensive SI documents along with the main manuscript, something that is highly supported by scientific journals, which provide specific contexts and guidelines for this process.

In computational chemistry, at least two challenges exist in the generation of SI documents. The first one is the handling of massive amounts of data. For example, the mechanistic study of a chemical reaction likely includes 3D representations for the reactants, products, and relevant intermediates for the paths under study, as well as text data like energies, stoichiometry, or transition state vectors, at least. These data are normally scattered in complexly structured output files, and the second challenge therefore consists in gathering them efficiently. Researchers with the necessary programming skills generally end up building scripts to generate their SI documents. Those go over the output files, search for the relevant data, and write it to an (un)formatted file that is finally pasted into a more advanced text processor. When developing new scripts, one could rely on available programs and libraries to handle the parsing1−4 and then simply focus on the style of the output format. However, users without programming experience will avoid the technical complexities involved and become experts in copy−pasting exercises: a cumbersome handcrafting process prone to human error.

While there are several software projects that could be helpful in this task, such as spreadsheet automation,5 desktop graphical interfaces for results analysis (GaussView,6 GaussSum1), or custom database creation (GaussDal,7 MyChem8), they mainly operate as standalone data collectors. In the end, yet another system would be needed to automate the full pipeline, be it a custom script or a multipurpose workflow editor such as KNIME-CDK9 or Taverna.10

The only resource currently available that allows the creation of automated ESI reports is ioChem-BD,11 a full-fledged web interface to store, organize, and share chemical computation output files. While powerful, this platform must be hosted by the institution the user works for, which, if not academic, can incur additional costs. Alternatively, an account must be created at one of the running instances after submitting a request form. ioChem-BD offers far more than online ESI reports, and all these steps are surely worth it in the long run once the users have their projects categorized and uploaded. However, some users might not need all the organization features, which can potentially get in the way if only some quick, customizable ESI reports are needed.

Here we present ESIgen, a Python-based tool with a minimal entry barrier that embeds existing parsers to provide well-formatted ESI files. As opposed to ioChem-BD, the purpose of ESIgen is to generate offline, downloadable reports that can be attached to the publications during the submission process. The ESIgen output includes 3D depictions of the chemical structures as well as key text information like coordinates, energies, and so on. ESIgen can be run on a web server as well as installed locally, in case the privacy of the files is a concern.

Figure 1. ESIgen can be used via a web interface and from the command line. When using the web interface (a demo is available at http://esi.insilichem.com), the user only needs to upload the quantum chemistry calculation output files to the server and select the data to report. After processing the file, an interactive HTML5 preview of the 3D structure can be displayed along with the requested data so the user can manually find the best orientation for a static depiction.

Received: December 13, 2017
Published: March 5, 2018
© 2018 American Chemical Society
DOI: 10.1021/acs.jcim.7b00714 J. Chem. Inf. Model. 2018, 58, 561−564



METHODS

ESIgen is a Python 2.7/3.4+ project that can generate automated, unsupervised supporting information reports from a variety of computational chemistry output files. To do that, it provides two interfaces: (1) a graphical interface that consists of a web server that can be deployed to public-facing machines (like the demo hosted at http://esi.insilichem.com) or locally on the user's desktop with esigenweb and (2) an executable called esigen meant to be used in command-line environments.

To achieve its functionality, this software relies on several open-source projects. The cclib library1 is used to parse computational chemistry output files coming from popular software packages such as Gaussian,6 ORCA,12 TurboMole,13 or GAMESS-UK.14 PyMol15 is used for unsupervised 3D static image rendering of the compound in command-line environments, while NGL Viewer16 is used in the web interface (built with Flask17) to provide interactive 3D depictions of the optimized structures. ESIgen can be installed on Linux with a self-contained executable available at https://github.com/insilichem/esigen/releases or, for more experienced users, manually with conda or pip packages.
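As an illustration of the parsing step, the following minimal sketch shows how a few report fields could be collected from a file parsed with cclib. It assumes cclib is installed and relies on its `ccread` function and on the parsed attributes `charge`, `mult`, `natom`, and `vibfreqs`; the helper names `parse_logfile` and `summarize` are hypothetical and are not taken from the ESIgen source code.

```python
from types import SimpleNamespace


def parse_logfile(path):
    """Parse an output file with cclib (imported lazily; requires cclib)."""
    from cclib.io import ccread  # third-party dependency, assumed installed
    return ccread(path)


def summarize(data):
    """Collect a few report fields from a cclib-like parsed object.

    Attributes missing from the parsed data are reported as None, and
    imaginary frequencies are taken to be the negative entries of vibfreqs.
    """
    freqs = list(getattr(data, "vibfreqs", []))
    return {
        "charge": getattr(data, "charge", None),
        "multiplicity": getattr(data, "mult", None),
        "natoms": getattr(data, "natom", None),
        "imaginary_freqs": [f for f in freqs if f < 0],
    }


# Stand-in for a parsed file so the sketch runs without cclib or a logfile.
fake = SimpleNamespace(charge=0, mult=1, natom=3,
                       vibfreqs=[-512.3, 1650.2, 3700.8])
summary = summarize(fake)
```

With a real calculation, `summarize(parse_logfile("job.log"))` would be the equivalent call, where `job.log` is a placeholder file name.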



CONCLUSIONS

Although still under active development, ESIgen is able to generate automated ESI reports for computational chemistry calculations performed with most common packages. The software only needs the logfiles of such calculations and a template specifying the fields that should be included in the report (like charge, multiplicity, Cartesian coordinates, imaginary frequencies, or free energy). The resulting utilities can be used online as well as locally, thereby avoiding any possible conflicts of interest when dealing with confidential data. Because the tool is user-friendly and automated, a custom template could be applied routinely to review calculations under study and rapidly check for imaginary frequencies, convergence, or energy profiles. Feedback on the initial version of ESIgen in the computational chemistry community has been more than enthusiastic, and some manuscripts have already used it to generate part of their SI.20,21 We are confident that, with both theoretical and experimentalist scientists running quantum mechanical calculations nowadays, ESIgen will be very welcome and will also improve the communication between both sides.

In the future, community-contributed templates could be listed in an online repository to address more usage scenarios. More export formats could be implemented, like direct generation of docx or LaTeX documents, instead of having to convert from Markdown with external tools like Pandoc.22 New features could include the automated generation of reaction coordinate diagrams if multiple, related files are provided. Of course, any progress in cclib would immediately provide ESIgen and its users with wider input format compatibility and a higher number of fields available.

TOOL USAGE

In the graphical web interface, the user needs to upload the desired output files to the server. Then, a template must be chosen from a collection of presets to define which kind of data will be included in the generated report. If none of the provided choices satisfies the user, a custom one can also be tailored using the Jinja template language.17 The parsed data will then fill the template, thus generating the automated report. In the web interface, the output defaults to an HTML5 report with a 3D interactive preview of the optimized structure. This allows the user to select the best depiction of the compound and draw distance, angle, and dihedral measurements. Atoms whose elements are not carbon, hydrogen, oxygen, or nitrogen are appropriately labeled for easy visual recognition (Figure 1). After customizing all the previews, the user can generate a PDF version of the document, download all the processed files in a Zip archive for custom typesetting, expose all the data in JSON format for programmatic usage and storage, or even export it to a GitHub Gist,18 which can be later imported as a GitHub repository suitable for DOI citation with Zenodo.19

For command-line environments, meant to process a large number of files at once, the esigen executable provides flags to configure the unsupervised generation of all the reports in a single step. The usage is straightforward: esigen -t TEMPLATE FILE1 [FILE2 ...]. It must be noted that, in this case, only static, high-resolution images of the compounds will be provided, resulting in potentially nonideal orientations for some structures. Since the core logic of the package is designed to support several submissions at once, ideally all the provided output files should contain the same type of calculation. Otherwise, the tabulated data might contain missing fields, denoted as N/A (or any other desired placeholder) for the author to review.
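The placeholder behavior for missing fields can be illustrated with a short sketch. ESIgen itself fills Jinja templates; the example below deliberately swaps in Python's standard `str.format_map` so that it runs without extra dependencies, and the template text, field names, and helper names are hypothetical.

```python
class MissingAsPlaceholder(dict):
    """dict that returns a placeholder for any field absent from the data."""

    def __init__(self, data, placeholder="N/A"):
        super().__init__(data)
        self.placeholder = placeholder

    def __missing__(self, key):
        # Called by format_map when the template asks for an unknown field.
        return self.placeholder


# Hypothetical template and field names, for illustration only.
TEMPLATE = "{name}: charge {charge}, multiplicity {mult}, free energy {free_energy}"


def render(template, fields, placeholder="N/A"):
    """Fill the template, substituting the placeholder for missing fields."""
    return template.format_map(MissingAsPlaceholder(fields, placeholder))


# A frequency job with a free energy, and a single-point job without one.
opt_freq = {"name": "TS1", "charge": 0, "mult": 1, "free_energy": -230.123456}
single_point = {"name": "INT2", "charge": 1, "mult": 2}

lines = [render(TEMPLATE, job) for job in (opt_freq, single_point)]
```

Here the second entry renders with `free energy N/A`, which mirrors how mixing calculation types in one submission leaves gaps for the author to review.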
While the web interface is meant for final, publication-ready reports, the command-line version could be useful in other contexts as well, such as routine checks on ongoing studies where the researcher must analyze tens of files daily to accept or discard possible reaction paths. Proficient users of the command line normally resort to tools like grep to locate the relevant lines in the file and then manually fill a spreadsheet with the obtained values. In those cases, the workflow would be greatly improved by using ESIgen with a custom template to list all pertinent quantities for the files involved, resulting in more readable results and less time spent.
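A routine check of this kind could be scripted along the following lines. This is only a sketch: it assumes the esigen executable is on the PATH and uses the `esigen -t TEMPLATE FILE1 [FILE2 ...]` syntax given above, while the function names, the `*.log` pattern, and the template name `checks.md` are hypothetical.

```python
import glob
import os
import subprocess


def build_esigen_command(template, logfiles):
    """Build the argument list for: esigen -t TEMPLATE FILE1 [FILE2 ...]."""
    if not logfiles:
        raise ValueError("at least one output file is required")
    return ["esigen", "-t", template] + list(logfiles)


def run_daily_check(directory, template, pattern="*.log"):
    """Run esigen once on every matching file in a directory.

    Requires the esigen executable to be installed and on the PATH.
    """
    logfiles = sorted(glob.glob(os.path.join(directory, pattern)))
    return subprocess.run(build_esigen_command(template, logfiles), check=True)


# "checks.md" is a hypothetical custom template name.
command = build_esigen_command("checks.md", ["ts1.log", "int2.log"])
```

Keeping the command construction separate from the call that runs it makes the script easy to inspect before launching it on a large batch of files.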

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00714.

Snapshot of the GitHub repository (www.github.com/insilichem/esigen) as of Mar 6th, 2018, which includes both the source code of the software and additional documentation on installation, advanced usage, and development guidelines. (ZIP)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected] (J.R.-G.P.).
*E-mail: [email protected] (J.-D.M.).

ORCID

Jaime Rodríguez-Guerra Pedregal: 0000-0001-8974-1566

Author Contributions

The software was developed by J.R.-G.P. and tested by all authors.

Funding

Research group: Spanish MINECO (project CTQ2017-87889-P), Generalitat de Catalunya (project 2014SGR989). J.R.-G.P.: Generalitat de Catalunya and European Social Fund (grant 2017FI_B2_00168). P.G.-O.: Spanish MINECO (grant FPI BES-2015-074190).

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

We thank Giuseppe Sciortino, Yane Vasquez, Gabriel Dos Passos, Martin Graf-Utzmann, Christopher Jeyajumar, Dennis Svatunek, and Robert Q. Topper for helpful discussions on this work.

ABBREVIATIONS

SI, supporting information; ESI, electronic supporting information; HTML5, hypertext markup language 5; JSON, JavaScript Object Notation

REFERENCES

(1) O’Boyle, N. M.; Tenderholt, A. L.; Langner, K. M. cclib: A library for package-independent computational chemistry algorithms. J. Comput. Chem. 2008, 29, 839−845.
(2) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An Open chemical toolbox. J. Cheminf. 2011, 3, 33.
(3) Cock, P. J. A.; Antao, T.; Chang, J. T.; Chapman, B. A.; Cox, C. J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; de Hoon, M. J. L. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422−1423.
(4) Hanwell, M. D.; Curtis, D. E.; Lonie, D. C.; Vandermeersch, T.; Zurek, E.; Hutchison, G. R. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminf. 2012, 4, 17.
(5) Laloo, J. Z. A.; Laloo, N.; Rhyman, L.; Ramasami, P. ExcelAutomat: a tool for systematic processing of files as applied to quantum chemical calculations. J. Comput.-Aided Mol. Des. 2017, 31, 667−673.
(6) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; et al. GaussView; Gaussian, Inc.: Wallingford CT, 2009.
(7) Alsberg, B. K.; Bjerke, H.; Navestad, G. M.; Åstrand, P. O. GaussDal: An open source database management system for quantum chemical computations. Comput. Phys. Commun. 2005, 171, 133−153.
(8) Pansanel, J. MyChem. http://mychem.sourceforge.net (accessed Dec 8, 2017).
(9) Beisken, S.; Meinl, T.; Wiswedel, B.; de Figueiredo, L. F.; Berthold, M.; Steinbeck, C. KNIME-CDK: Workflow-driven cheminformatics. BMC Bioinf. 2013, 14, 257.
(10) Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M. P.; Sufi, S.; Goble, C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 2013, 41, 557−561.
(11) Álvarez-Moreno, M.; De Graaf, C.; López, N.; Maseras, F.; Poblet, J. M.; Bo, C. Managing the computational chemistry big data problem: The ioChem-BD platform. J. Chem. Inf. Model. 2015, 55, 95−103.
(12) Neese, F. The ORCA program system. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 73−78.
(13) Ahlrichs, R.; Bär, M.; Häser, M.; Horn, H.; Kölmel, C. Electronic structure calculations on workstation computers: The program system turbomole. Chem. Phys. Lett. 1989, 162, 165−169.
(14) Guest, M. F.; Bush, I. J.; Van Dam, H. J. J.; Sherwood, P.; Thomas, J. M. H.; Van Lenthe, J. H.; Havenith, R. W. A.; Kendrick, J. The GAMESS-UK electronic structure package: Algorithms, developments and applications. Mol. Phys. 2005, 103, 719−747.
(15) DeLano, W. Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr. 2002, 700, 44−53.
(16) Rose, A. S.; Hildebrand, P. W. NGL Viewer: a web application for molecular visualization. Nucleic Acids Res. 2015, 43, 576−579.
(17) Ronacher, A. Flask (A Python Microframework). http://flask.pocoo.org (accessed Nov 29, 2017).
(18) GitHub. GitHub Gist. https://gist.github.com/ (accessed Jan 25, 2018).
(19) CERN. Zenodo. https://zenodo.org/ (accessed Jan 25, 2018).
(20) Sciortino, G.; Lihi, N.; Czine, T.; Maréchal, J.-D.; Lledós, A.; Garribba, E. Accurate prediction of vertical electronic transitions of Ni(II) coordination compounds via Time Dependent DFT. Int. J. Quantum Chem. 2018, in press.
(21) Lepori, C.; Gómez-Orellana, P.; Ouharzoune, A.; Guillot, R.; Lledós, A.; Ujaque, G.; Hannedouche, J. Well-Defined β-Diketiminatocobalt(II) Complexes For Alkene Cyclohydroamination of Primary Amines. J. Am. Chem. Soc. 2018, in press.
(22) MacFarlane, J. Pandoc. https://github.com/jgm/pandoc (accessed Jan 16, 2018).