QSoas: A Versatile Software for Data Analysis - ACS Publications


Technical Note

QSoas: a versatile software for data analysis
Vincent Fourmond
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b00224 • Publication Date (Web): 20 Apr 2016




Analytical Chemistry

QSoas: a versatile software for data analysis

Vincent Fourmond

Aix-Marseille Université, CNRS, BIP UMR 7281, 31 chemin J. Aiguier, F-13402 Marseille cedex 20, France

E-mail: [email protected]

Abstract

Undoubtedly, the most natural way to confirm a model is to quantitatively verify its predictions. However, this is not done systematically, and one of the reasons is the lack of appropriate tools for analyzing data: existing tools either do not implement the required models or lack the flexibility needed to perform data analysis in a reasonable time. We present QSoas, an open-source, cross-platform data analysis program written to overcome these problems. In addition to standard data analysis procedures and full automation using scripts, QSoas features a very powerful data fitting interface with support for arbitrary functions, differential equation and kinetic system integration, and flexible global fits.

Perhaps the most natural way to confirm a model is to check that it quantitatively reproduces data. However, this validation often cannot be performed easily or systematically. A model may not be validated with data because it does not yield quantitative predictions, because it is too complex, or because it is too computationally intensive to be tested. Too often, though, quantitative data analysis is prevented by a lack of tools to implement the models, or by a lack of flexibility of the existing tools. Herein, we present QSoas, a general-purpose free software for data analysis and fitting, designed to overcome these problems. QSoas is the successor of the open-source program SOAS (ref 1). Like its predecessor, it was written to analyze electrochemical data, especially those obtained using protein film voltammetry (ref 2).
Hence, it features all the tools necessary to analyze voltammograms and chronoamperograms: data filtering (FFT and B-splines), baseline subtraction or division (ref 3), spike removal, averaging, integration, peak detection, arbitrary algebraic transformations, and facilities for browsing through large sets of data files. However, QSoas was written to be more general than SOAS, and it is also useful for the analysis of kinetic traces, spectra, and, more generally, all y = f(x) signals, as illustrated below.

QSoas is command-based, like Gnuplot (http://www.gnuplot.info) or R (http://www.r-project.org): to load data files, view them, and manipulate them, one enters commands at a prompt similar to the Unix command line (with history editing and contextual automatic completion with the tab key). As many commands do not require user interaction, repetitive tasks can be automated: a full data analysis pipeline (e.g. data loading, baseline subtraction, spike removal and peak detection) can be scripted and applied to hundreds of datasets without manual intervention. Unlike with Gnuplot or R, use of the command line is not compulsory, since commands can also be run from drop-down menus. Furthermore, many commands can be fine-tuned via optional parameters. A complete manual is available online (http://www.qsoas.org/manual.html), along with a tutorial that covers the most useful features of QSoas (http://www.qsoas.org/tutorial.html), including those described below.
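To make the idea of a scripted pipeline concrete, here is a minimal sketch in plain Python (not QSoas commands, whose syntax is documented in the manual). It applies baseline subtraction, spike removal and peak detection to several synthetic traces in a loop. All function names, thresholds and the synthetic data are assumptions chosen for illustration; the simple algorithms used here are stand-ins and are not claimed to be those implemented in QSoas.

```python
import math


def subtract_linear_baseline(xs, ys, n_edge=5):
    """Subtract the straight line joining the averaged first and last n_edge points."""
    x0 = sum(xs[:n_edge]) / n_edge
    y0 = sum(ys[:n_edge]) / n_edge
    x1 = sum(xs[-n_edge:]) / n_edge
    y1 = sum(ys[-n_edge:]) / n_edge
    slope = (y1 - y0) / (x1 - x0)
    return [y - (y0 + slope * (x - x0)) for x, y in zip(xs, ys)]


def remove_spikes(ys, threshold):
    """Replace single points that stick out from BOTH neighbours by more than
    threshold (in y units, same direction) with the mean of the neighbours."""
    out = list(ys)
    for i in range(1, len(ys) - 1):
        left = ys[i] - ys[i - 1]
        right = ys[i] - ys[i + 1]
        if left * right > 0 and min(abs(left), abs(right)) > threshold:
            out[i] = 0.5 * (ys[i - 1] + ys[i + 1])
    return out


def find_peaks(xs, ys, min_height):
    """Return (x, y) for every local maximum that reaches min_height."""
    return [
        (xs[i], ys[i])
        for i in range(1, len(ys) - 1)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1] and ys[i] >= min_height
    ]


def process(xs, ys):
    """The whole pipeline, applied to one dataset without user interaction."""
    ys = subtract_linear_baseline(xs, ys)
    ys = remove_spikes(ys, threshold=1.0)
    return find_peaks(xs, ys, min_height=2.0)


def synthetic_trace(center):
    """Hypothetical test signal: Gaussian peak on a sloping baseline plus one spike."""
    xs = [float(i) for i in range(101)]
    ys = [1.0 + 0.02 * x + 10.0 * math.exp(-0.5 * ((x - center) / 8.0) ** 2) for x in xs]
    ys[20] += 6.0  # artificial single-point spike
    return xs, ys


# Apply the same pipeline to several datasets in a loop.
for center in (40.0, 50.0, 60.0):
    print(center, process(*synthetic_trace(center)))
```

The point of the sketch is only the structure: each step is a function that needs no interaction, so the same sequence can be run unchanged over any number of datasets.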

The strongest point of QSoas is the facility it provides for nonlinear regression (curve fitting). The fit interface is intuitive (Figure 1), with a display of the residuals and of the sub-components of the fit (when applicable, as for the fit shown in Figure 1). QSoas can fit arbitrary, user-defined functions to data, and it can integrate ordinary differential equations and kinetic schemes. It also provides a number of built-in functions. Unlike other fitting programs, QSoas is designed with the fact in mind that, in nonlinear curve fitting, the choice of initial parameters determines whether or not the fitting algorithm converges towards a relevant solution. Therefore, QSoas makes it easy to visually inspect the quality of the fit, restart it with other parameters, fix or free parameters, and save and reuse parameter values. Moreover, it also provides initial parameter detection for built-in functions. Together, these features help to quickly obtain a meaningful set of parameters.

QSoas can also fit a function to several datasets simultaneously, with some parameters common to all and others specific to only some (creating a global parameter is just a matter of ticking a checkbox). This way, one can use the knowledge that some fit parameters should be identical across a number of experimental curves to constrain the parameters and determine them more reliably. As an example, we took advantage of this possibility to deconvolve a series of Raman spectra of mixtures of species with different concentrations (Figure 1). The characteristics of the bands (position and width) are common to all spectra, but their respective intensities are different in each spectrum.

Fitting a function to several datasets simultaneously can also be used to analyze z = f(x; y) data, provided they can be modelled as a collection of z = f(x) datasets sharing some characteristics. This is, for instance, the case for the spectrum of a reaction mixture evolving with time (followed using stopped-flow or Joliot spectrophotometers, or other fast kinetic techniques), which can be modelled as a series of time evolutions of the absorbance at different wavelengths. The kinetic parameters that determine the evolution of the concentration of the species are common to all wavelengths, but the absorbances of each species depend on wavelength. Using QSoas, it is easy to load a series of spectra, transform them into a series of time evolutions of the absorbance at several wavelengths, and fit the appropriate model to them.
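The principle of a global fit, with one kinetic parameter shared by all traces and amplitudes specific to each trace, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the QSoas fit engine: the model is a single exponential per trace, the data are synthetic and noise-free, and the shared time constant is found by a coarse scan followed by a golden-section search, with the per-trace amplitude and offset obtained by linear regression at each trial value (a variable-projection shortcut). All names and numerical values are hypothetical.

```python
import math


def fit_local_linear(ts, ys, tau):
    """For a fixed shared tau, the best per-trace amplitude a and offset c of
    y = a * exp(-t / tau) + c follow from ordinary linear regression.
    Returns (a, c, sum of squared residuals)."""
    es = [math.exp(-t / tau) for t in ts]
    n = len(ts)
    e_mean = sum(es) / n
    y_mean = sum(ys) / n
    see = sum((e - e_mean) ** 2 for e in es)
    sey = sum((e - e_mean) * (y - y_mean) for e, y in zip(es, ys))
    a = sey / see
    c = y_mean - a * e_mean
    ssr = sum((y - (a * e + c)) ** 2 for e, y in zip(es, ys))
    return a, c, ssr


def global_ssr(ts, traces, tau):
    """Total squared residual over ALL traces for one common tau."""
    return sum(fit_local_linear(ts, ys, tau)[2] for ys in traces)


def global_fit(ts, traces, tau_min=1e-4, tau_max=1e-1, n_grid=61, iterations=60):
    """Find the common tau: coarse scan in log(tau), then golden-section search."""
    lo, hi = math.log(tau_min), math.log(tau_max)
    grid = [lo + k * (hi - lo) / (n_grid - 1) for k in range(n_grid)]
    best = min(range(n_grid), key=lambda k: global_ssr(ts, traces, math.exp(grid[k])))
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if global_ssr(ts, traces, math.exp(m1)) < global_ssr(ts, traces, math.exp(m2)):
            hi = m2  # minimum lies in [lo, m2]
        else:
            lo = m1  # minimum lies in [m1, hi]
    tau = math.exp(0.5 * (lo + hi))
    local = [fit_local_linear(ts, ys, tau)[:2] for ys in traces]
    return tau, local


# Synthetic example: three "wavelengths" sharing one time constant (values are made up).
TAU_TRUE = 2.6e-3
AMPLITUDES = [5.0, -3.0, 1.5]
OFFSETS = [0.0, 1.0, -2.0]
ts = [i * 1e-4 for i in range(200)]
traces = [[a * math.exp(-t / TAU_TRUE) + c for t in ts] for a, c in zip(AMPLITUDES, OFFSETS)]

tau_fit, local_fit = global_fit(ts, traces)
print(tau_fit, local_fit)
```

Because every trace contributes to the same residual sum, the shared time constant is constrained by all datasets at once, which is the reason global parameters are determined more reliably than in independent fits.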
For an easier evaluation of the quality of the fit, it is possible to display the fit parameters as a function of wavelength, for instance to see the difference spectra of the components (as in Figure 2C), and also to inspect the fit not only as a series of time traces at different wavelengths, but also as a series of spectra at different times. These two features are not found in other data fitting programs. Figure 2 shows this approach applied to the time evolution of the spectra of Rhodopseudomonas viridis bacterial photosynthetic reaction centers after exposure to a flash of light, measured using a Joliot spectrophotometer (ref 4). As shown in the online tutorial, the full data processing, from loading the files to displaying the spectra of the components, takes only minutes.

Importantly, it is possible to fit a model to a large number of time traces this way, since QSoas embeds a highly optimized fit engine for which the time taken for an iteration scales only linearly with the number of datasets. For typical algorithms, it scales as the cube of the total number of parameters, i.e. as the cube of the number of datasets. Being able to include more traces means better resolution for the spectra and better-determined common parameters.

The idea of fitting several curves at the same time is not new; the data presented in Figure 2 were initially fit using the emblematic program mexfit (ref 5). However, the programs performing this kind of data analysis are often highly technique- or discipline-specific and not user-friendly. If the parameters obtained through the first fit of the model to the data are not satisfactory, the iterative process by which one tweaks the initial parameters and re-runs the fit in the hope of finding the best set of parameters can be tedious. Even with industry-standard fitting software like IgorPro or Origin, restarting the fit with a new set of initial parameters to converge towards a meaningful solution takes time. By contrast, the interface of QSoas makes it fast.
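The scaling statement made earlier in this passage (linear versus cubic in the number of datasets) can be illustrated with a simple parameter count. The notation below ($N$ datasets, $p_g$ global parameters, $p_l$ local parameters per dataset) is introduced here for illustration only. The text does not describe how the QSoas fit engine obtains linear scaling, so the block-structured estimate is just one standard way such scaling can arise, given as an assumption.

```latex
\begin{align*}
P &= p_g + N\,p_l
  && \text{(total number of fit parameters)} \\
T_{\text{dense}} &\sim \mathcal{O}\!\left(P^3\right) = \mathcal{O}\!\left(N^3 p_l^3\right)
  && \text{(dense linear solve per iteration, large } N\text{)} \\
T_{\text{block}} &\sim \mathcal{O}\!\left(N\,\bigl(p_l^3 + p_l^2\,p_g + p_l\,p_g^2\bigr) + p_g^3\right) = \mathcal{O}(N)
  && \text{(local blocks eliminated one dataset at a time)}
\end{align*}
```

The second estimate relies on the fact that the local parameters of one dataset do not interact directly with those of another: they are coupled only through the global parameters, so each dataset adds a fixed amount of work.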
Flexibility also distinguishes QSoas from other data-fitting programs. Models evolve in the researchers' minds with their understanding of the underlying processes, and QSoas supports this evolution in several ways. First, it is common that parameters that were initially thought to be independent turn out in fact to be correlated. Maybe the ratio of two parameters is constant, or perhaps five parameters turn out to be a function of only three. QSoas supports reparametrizing the fit by providing facilities to define parameters as functions of others. Second, in the case of molecular systems for instance, non-idealities are often attributed to minute changes in the microscopic environments of the molecules, resulting in a distribution of microscopic parameters (for example, thermodynamic properties). Using QSoas, one can fit a model with a distribution of parameters to the experimental data. Finally, it is also possible to combine several fitting functions through an arbitrary formula, or to simultaneously fit a model and its derivative to the data and its derivative. We took advantage of the latter possibility in our study of the dependence of the activity of hydrogenases (the enzymes that produce or consume H2) on driving force. We showed that fitting the raw signal alone does not yield well-determined parameters, but fitting the signal together with its derivative does, as the derivative shows more pronounced features than the raw signal (ref 6).

QSoas is written in C++ using Qt for the interface, the GNU Scientific Library (http://www.gnu.org/software/gsl/) for most of the data processing, and Ruby as an interpreted language (for mathematical formulas for instance). It embeds ODRPACK (ref 7) as one of many fit engines. It is cross-platform and can be compiled for Linux, MacOS and Windows. QSoas is open source and released under the GNU General Public License; this means that anyone can download the source code, compile it, verify it, modify it, and redistribute it (http://www.qsoas.org/downloads.html). For those who want to be spared the hassle of compilation, installers for Windows and MacOS can also be purchased.

Acknowledgments

The author wishes to thank Frauke Baymann for giving the data from ref 4, Anne Jones and Guillaume Gerbaud for their critical reading of the manuscript, Pierre Ceccaldi, Carole Baffert, Jessica Hadj-Saïd, Christophe Orain, Meriem Merrouch and Matteo Sensi for testing QSoas and suggesting improvements, and Christophe Léger for both suggestions on QSoas and critical reading of the manuscript. The author acknowledges funding from the A*Midex foundation of Aix-Marseille University (project MicrobioE, grant number ANR-11-IDEX-0001-02). The author is a member of the French Bioinorganic Chemistry group (http://frenchbic.cnrs.fr).

Conflict of interest

The author declares financial interest in the sale of the pre-built binaries.

References

(1) Fourmond, V.; Hoke, K.; Heering, H. A.; Baffert, C.; Leroux, F.; Bertrand, P.; Léger, C. Bioelectrochemistry 2009, 76, 141–147.
(2) Léger, C.; Bertrand, P. Chem. Rev. 2008, 108, 2379–2438.
(3) Fourmond, V.; Lautier, T.; Baffert, C.; Leroux, F.; Liebgott, P.-P.; Dementin, S.; Rousset, M.; Arnoux, P.; Pignol, D.; Meynial-Salles, I.; Soucaille, P.; Bertrand, P.; Léger, C. Anal. Chem. 2009, 81, 2962–2968.
(4) Baymann, F.; Rappaport, F. Biochemistry 1998, 37, 15320–15326.
(5) Müller, K.-H.; Plesser, T. Eur. Biophys. J. 1991, 19, 231–240.
(6) Fourmond, V.; Baffert, C.; Sybirna, K.; Lautier, T.; Abou Hamdan, A.; Dementin, S.; Soucaille, P.; Meynial-Salles, I.; Bottin, H.; Léger, C. J. Am. Chem. Soc. 2013, 135, 3926–3938.
(7) Boggs, P. T.; Donaldson, J. R.; Byrd, R. H.; Schnabel, R. B. ACM Trans. Math. Softw. 1989, 15, 348–364.


Figure 1: Screenshot of the fit window at the end of simultaneous fitting of two spectra. The top panel shows the data with the fit of a sum of Lorentzian peaks superimposed on each individual component of the spectrum. Parameter values are shown at the bottom (those circled in red are common to both spectra, for which the “G”, for “global”, checkbox is ticked). The yellow to green hue reflects the size of the 95% confidence interval for the parameter (the closer to green, the greater the confidence). For the sake of legibility, we chose to display only one of the datasets. It is possible to navigate between the datasets, and to display more than one dataset simultaneously.


Figure 2: Simultaneous fit of photosynthetic kinetic data, from ref 4, as demonstrated in the tutorial. Panel A shows spectra from whole cells of Rhodopseudomonas viridis recorded at different times after illumination (from 40 µs, in red, to 30 ms, in blue). Panel B shows the absorbance as a function of time for selected wavelengths (solid lines; the wavelengths are indicated with the same colors in panel A), along with the result of fitting a bi-exponential relaxation simultaneously to all kinetic traces, with common values for the time constants (circles). Panel C shows the spectra of the two components and the t = ∞ spectrum (as extrapolated from the fit).
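For reference, the bi-exponential relaxation named in this caption can be written as follows. The symbols are notation introduced here, not taken from the original: $\tau_1$ and $\tau_2$ are the time constants common to all wavelengths, $A_1(\lambda)$ and $A_2(\lambda)$ are the wavelength-dependent amplitudes of the two components, and $A_\infty(\lambda)$ is the final spectrum extrapolated from the fit. The sign convention for the amplitudes is an assumption.

```latex
\Delta A(\lambda, t) = A_\infty(\lambda)
  + A_1(\lambda)\, e^{-t/\tau_1}
  + A_2(\lambda)\, e^{-t/\tau_2}
```

With this form, plotting $A_1$, $A_2$ and $A_\infty$ against $\lambda$ gives the kind of component spectra described for panel C.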


Table of contents graphics:


[Graphic not reproduced; surviving labels: fit, components, data, residuals.]

[Figure 2 graphic not reproduced. Panel A: ∆A (A.U.) versus λ (nm), 540 to 580 nm. Panel B: ∆A (A.U.) versus t (s), from 10^−4 to 10^−2 s. Panel C: ∆A (A.U.) versus λ (nm), 540 to 580 nm, with legend entries "Final", "τ = 0.3 ms" and "τ = 2.6 ms".]
