Technical Note
sbml-diff: A tool for visually comparing SBML models in synthetic biology. James Scott-Brown and Antonis Papachristodoulou. ACS Synth. Biol., Just Accepted Manuscript. DOI: 10.1021/acssynbio.6b00273. Publication Date (Web): 30 Dec 2016.
ACS Synthetic Biology
James Scott-Brown and Antonis Papachristodoulou
Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK
Abstract
We present sbml-diff, a tool that reads a model of a biochemical reaction network in SBML format and produces a range of diagrams showing different levels of detail. Each diagram type can be used to visualise a single model, or to visually compare two or more models. The default view depicts species as ellipses, reactions as rectangles, rules as parallelograms, and events as diamonds. A cartoon view replaces the rectangles used for some reactions with symbols chosen according to the associated Systems Biology Ontology term. An abstract view represents species as ellipses, and draws edges between them to indicate whether a species increases or decreases the production or degradation of another species. sbml-diff is freely licensed under the 3-clause BSD license and can be downloaded from https://github.com/jamesscottbrown/sbml-diff. It can be used as a Python package called from other software, as a free-standing command-line application, or online using the form at http://sysos.eng.ox.ac.uk/tebio/upload.
Keywords: SBML, synthetic biology, visualization, comparison, version control
ACS Paragon Plus Environment
In order to predict how synthetic biological circuits will function, or to determine how parameters should be tuned, it is necessary to use mathematical models. Ideally, these models would be expressed in a standardised data format that provides interoperability between software packages, so that models can be constructed and simulated using different software, the results of simulating the same model using different simulation packages can be compared, and researchers can use their preferred tools to reproduce or extend previous work that used different tools. One such standardised format is the Systems Biology Markup Language (SBML) (1), a free and open interchange format for computer models of biological processes based on the Extensible Markup Language (XML). An SBML model consists of species that participate in reactions. Additionally, an event may reset the values of parameters and concentrations when a particular trigger expression is satisfied. A rule may specify that the value (in the case of an AssignmentRule) or rate of change (in the case of a RateRule) of a parameter or concentration is given by a particular mathematical expression, or that a particular function of parameters and/or concentrations must always equal zero (in the case of an AlgebraicRule).
The ability to compare SBML models is important, both to compare models of different systems and to compare different versions of a model of the same system. Many other engineering fields, particularly software engineering, rely heavily on version control systems to keep track of the designs produced at each stage of an iterative design cycle. This is often accompanied by the use of file differencing (diff) tools to directly compare different versions and identify what changes were made. However, directly comparing two models in SBML format as text is unsatisfactory: diff can be used, but it is difficult to spot the salient features in its output. Also, many textual changes are not significant (e.g. changes in white-space or the ordering of elements), and if the id of a species is changed, this change will appear in many places (e.g. the list of reactants, the list of products, and the kineticLaw of every reaction involving that species).
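To make the problem concrete, the following Python sketch (standard library only) compares two hypothetical SBML fragments that declare the same species but differ only in element order and indentation. A plain line-based diff reports every indented line as changed, whereas comparing the parsed set of species ids reports no difference. The fragments and the helper names `changed_lines` and `species_ids` are illustrative assumptions, not part of sbml-diff.

```python
from __future__ import annotations

import difflib
import xml.etree.ElementTree as ET

# Two hypothetical SBML fragments declaring the same two species; they differ
# only in the order of the <species> elements and in indentation.
MODEL_A = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="m">
    <listOfSpecies>
      <species id="LacI" compartment="cell"/>
      <species id="TetR" compartment="cell"/>
    </listOfSpecies>
  </model>
</sbml>"""

MODEL_B = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
<model id="m">
<listOfSpecies>
<species id="TetR" compartment="cell"/>
<species id="LacI" compartment="cell"/>
</listOfSpecies>
</model>
</sbml>"""


def changed_lines(a: str, b: str) -> list[str]:
    """Lines that a plain line-based diff reports as added or removed."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def species_ids(sbml_text: str) -> set[str]:
    """Species ids in the model, independent of whitespace and element order."""
    root = ET.fromstring(sbml_text)
    # ".//{*}species" matches <species> in any namespace (Python 3.8+).
    return {s.get("id") for s in root.findall(".//{*}species")}


print(len(changed_lines(MODEL_A, MODEL_B)))          # non-zero: the text diff reports changes
print(species_ids(MODEL_A) == species_ids(MODEL_B))  # True: the same species are declared
```

A structural comparison of this kind is the starting point for any model-aware diff; renamed ids, by contrast, need an explicit matching step, since no parser can tell that two different ids denote the same entity without extra information.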
A visual comparison is able to put a change into context, showing not only what has changed, but also how this relates to the other, unchanged, components. Whilst it is possible to compare a set of models by simply juxtaposing independently created visualisations of each, this requires the user to ‘spot the differences’ between diagrams in which equivalent nodes may be in different spatial positions. In this paper, we present sbml-diff, a tool that reads SBML models of biochemical networks and produces a range of diagrams showing different levels of detail. Because sbml-diff shows a comparison in a single diagram, it can quickly highlight changes made between versions of a single model, making it easy to identify when a particular change was made, or to check that no unintended changes were made at the same time as an intentional edit. It can also be used to check that a model has not been accidentally altered by importing it into and re-exporting it from another tool, or by converting it back and forth between file formats. Importantly, sbml-diff can also be used to compare models corresponding to different synthetic circuit designs, and to show visually how they differ. The most developed standard for biological designs is the Synthetic Biology Open Language (SBOL) (2), which represents designs as being composed of components with sequences and sequence annotations. Tools exist that use SBOL designs to automatically generate corresponding SBML models (3), which could then be compared using sbml-diff.
sbml-diff can also produce a diagram of a single model. Such a diagram is particularly useful for understanding the structure of a model produced by another researcher. Visualisation can also reveal structural features that are likely to correspond to errors. For example, any parameters that are set by a rule but not used elsewhere in the model become obvious; these may indicate that the wrong parameter is used or set somewhere in the model, or that the model has been changed and the parameter is no longer needed.
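The same structural check can also be made programmatically. The sketch below is an assumption-laden illustration, not sbml-diff's code: it parses a hypothetical SBML string with Python's `xml.etree.ElementTree`, collects parameters that are the `variable` of an AssignmentRule or RateRule, and reports those whose id never appears in any MathML `<ci>` element (kinetic laws, rules or events). A RateRule whose math refers to its own variable counts as a use in this simplified version.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical model: k_used and k_orphan are both set by rules,
# but only k_used appears in a kinetic law.
MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="m">
    <listOfParameters>
      <parameter id="k_used" constant="false"/>
      <parameter id="k_orphan" constant="false"/>
      <parameter id="k_deg" constant="true"/>
    </listOfParameters>
    <listOfRules>
      <assignmentRule variable="k_used">
        <math xmlns="http://www.w3.org/1998/Math/MathML"><ci> k_deg </ci></math>
      </assignmentRule>
      <assignmentRule variable="k_orphan">
        <math xmlns="http://www.w3.org/1998/Math/MathML"><ci> k_deg </ci></math>
      </assignmentRule>
    </listOfRules>
    <listOfReactions>
      <reaction id="degradation" reversible="false">
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML"><ci> k_used </ci></math>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def unused_rule_parameters(sbml_text: str) -> set[str]:
    """Parameters set by an AssignmentRule or RateRule but never referenced in any MathML."""
    root = ET.fromstring(sbml_text)
    parameter_ids = {p.get("id") for p in root.findall(".//{*}parameter")}
    rule_targets = {
        rule.get("variable")
        for tag in ("assignmentRule", "rateRule")
        for rule in root.findall(f".//{{*}}{tag}")
    }
    # Every identifier used in any math expression: kinetic laws, rules, events.
    referenced = {ci.text.strip() for ci in root.findall(".//{*}ci") if ci.text}
    return (rule_targets & parameter_ids) - referenced


print(unused_rule_parameters(MODEL))  # {'k_orphan'}
```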
Existing tools
Several existing tools can visually represent a single SBML model as a bipartite reaction graph. These include large GUI packages with many other features (e.g. CellDesigner (4), iBioSim (5)), for which the visualisation component is difficult to script or incorporate into other tools, and a few free-standing tools that can generate DOT output (e.g. the Systems Biology Format Converter's SBML2DOT Java module (6), or the Python library VisualiseSBML (7)). Several of these ignore rules and events, or otherwise do not visualise all elements of an SBML model. Cy3sbml (8) can import an SBML file into Cytoscape (9), creating a network in which each component is represented by a node. However, whilst a user could manipulate this network in Cytoscape to produce a diagram like those produced by sbml-diff, cy3sbml cannot produce such diagrams automatically. BiVeS (10) is a tool for comparing SBML models, intended to track changes in a model over time. It produces output in a variety of formats, but its visualisation abilities are currently relatively limited: work is underway on a companion tool that produces diagrams in the Systems Biology Graphical Notation Process Description (SBGN PD) language (11). This would visualise the underlying biochemistry rather than the structure of the mathematical model, and so would complement sbml-diff rather than compete with it.
Implementation
sbml-diff reads models in SBML format and produces output in DOT format, which can be converted into an image by GraphViz (12) or by other compatible software. It can be used as a Python package, as a free-standing command-line tool, or through a form on our website. By default, elements in two models are treated as the same entity if they have the same id attribute. Optionally, two elements with different id attributes can be treated as the same entity if they have the same set of MIRIAM (13) annotations with the qualifier is: in this case, the id of one is changed to match the other, and all reactions, rules and events are updated accordingly. Colouring indicates whether each node and edge is common to all models (grey), present in some but not all models (black), or present in a single model (the colour specific to that model); the default colours were chosen to be distinguishable by colour-blind users. A dashed node border indicates that a component is shared between models, but with differences in its attributes: a rectangle with a dashed border indicates a reaction for which not all models have the same kineticLaw, and an ellipse with a dashed double border indicates a species for which not all models have the same isBoundary attribute.
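The matching and colouring rules above can be sketched in a few lines of Python. This is a hypothetical illustration, not sbml-diff's implementation: it assumes each model has already been reduced to a mapping from element id to the set of MIRIAM `is` URIs (extracting these from the RDF annotation is omitted), and the colour values and URIs are placeholders, not sbml-diff's actual palette or real database identifiers.

```python
from __future__ import annotations

# Placeholder colours, one per model; these are NOT sbml-diff's actual palette.
MODEL_COLOURS = ["#1b9e77", "#d95f02", "#7570b3"]


def element_colour(element_id: str, models: list[set[str]]) -> str:
    """Colour an element by the number of models that contain it.

    models[i] is the set of element ids present in model i.
    """
    present = [i for i, ids in enumerate(models) if element_id in ids]
    if not present:
        raise ValueError(f"{element_id!r} is not in any model")
    if len(present) == len(models):
        return "grey"  # common to all models
    if len(present) == 1:
        return MODEL_COLOURS[present[0] % len(MODEL_COLOURS)]  # unique to one model
    return "black"  # in some, but not all, models


def match_by_annotation(
    annotations_a: dict[str, frozenset[str]],
    annotations_b: dict[str, frozenset[str]],
) -> dict[str, str]:
    """Propose renames for ids in model B whose set of 'is' URIs equals that of an id in model A."""
    renames = {}
    for id_b, uris_b in annotations_b.items():
        if id_b in annotations_a or not uris_b:
            continue  # already matched by id, or nothing to match on
        for id_a, uris_a in annotations_a.items():
            # Skip targets that already exist in B, to avoid creating an id clash.
            if uris_a == uris_b and id_a not in annotations_b:
                renames[id_b] = id_a
                break
    return renames


# Hypothetical models: id -> set of MIRIAM "is" URIs (placeholder URIs).
model_a = {"LacI": frozenset({"urn:example:laci"}), "TetR": frozenset(), "cI": frozenset()}
model_b = {"lacI_protein": frozenset({"urn:example:laci"}), "TetR": frozenset()}

renames = match_by_annotation(model_a, model_b)  # {'lacI_protein': 'LacI'}
ids_b = {renames.get(i, i) for i in model_b}     # B's ids after renaming: LacI and TetR
models = [set(model_a), ids_b]
for element in sorted(set(model_a) | ids_b):
    print(element, element_colour(element, models))
```

Here LacI and TetR are drawn grey because both models contain them after renaming, while cI takes the first model's colour. With two models there is no "some but not all" case; black only appears once three or more models are compared.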
[Figure 1 legend (image not reproduced). Default view: compartment, species, reaction, rule, event, reactant, product, modifier (activator), modifier (repressor), species triggering event, species set by event, species set by rule, parameter set by event, parameter set by rule, species increasing value, species decreasing value. Abstract view: increased production, decreased production, increased degradation, decreased degradation.]
Figure 1: Meaning of each graphical element in the default (top) and abstract (bottom) views. The meaning of the lines differs between the two diagram types, but this should not cause confusion: in the default diagram, lines join species to non-species nodes, whereas in the abstract diagram they join species to species.
[Figure 2 (image not reproduced): three diagrams comparing the toggle and repressilator models, each labelled "Files: toggle, repressilator". Node labels include the cell compartment; the cI, LacI and TetR mRNAs and proteins; and the transcription, translation and degradation reactions for each gene, marked (IR).]
Figure 2: Three examples of the comparison between two models, representing the repressilator (14) and the toggle-switch (15).
Default view
In the default view, there is a direct mapping between the structure of an SBML model and the visual elements in the diagram. This view is intended to represent the structure of the mathematical model encoded in SBML, rather than the structure of the underlying reaction network, which may be modelled in several different ways. For example, a reaction may be modelled in SBML as a reaction with the attribute fast set to true, or by an AssignmentRule. We therefore map components directly onto symbols, using the mapping shown in Figure 1. The key SBML components are compartments (drawn as dashed rectangles), species (ellipses, drawn with a double border if isBoundary is true), parameters (labels), rules (parallelograms), reactions (rectangles), and events (diamonds). For clarity, we depict only those parameters whose values are set by rules or events; however, the effect of every parameter can be seen if reaction nodes are labelled with their rate law. Function definitions are not shown, as we substitute them into any expression that uses them. Units and initialAssignment components are not shown, as these concern the values of parameters rather than the structure of the model. Currently only rules of type AssignmentRule or RateRule (not AlgebraicRule) are depicted.
Components are joined by arrows. A solid line indicates that a species' concentration (or a parameter's value) is affected by an event, rule or reaction; a dashed line indicates that a species affects an event, rule or reaction in some way. Normal arrowheads indicate activation, and T-shaped arrowheads indicate repression (the arrowhead direction is determined numerically; see Supplementary Information). An option allows the user to adjust the level of detail by choosing whether reaction nodes are labelled with the corresponding reaction name, kineticLaw, both, or neither. If a reaction has the attribute fast set to true, this is indicated by an ‘F’; if a reaction has the attribute reversible set to false, this is indicated by ‘IR’ (irreversible). When models are compared, these two markers are individually coloured using the same rule as other visual elements.
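As a rough illustration of this mapping, the Python sketch below emits DOT for a single reaction: species as ellipses, the reaction as a rectangle labelled with ‘IR’ when irreversible, solid edges to and from reactants and products, and dashed edges from modifiers. To choose between a normal and a T-shaped arrowhead it estimates the sign of the rate law's derivative with a forward finite difference. That is a swapped-in technique assumed for illustration; the procedure sbml-diff actually uses is described in its Supplementary Information and may differ. Edge directions, the function names, and the rate law are hypothetical.

```python
from __future__ import annotations

from typing import Callable

ARROWHEAD = {1: "normal", -1: "tee", 0: "none"}


def effect_sign(rate: Callable[[dict[str, float]], float],
                state: dict[str, float], species: str, h: float = 1e-6) -> int:
    """Sign of d(rate)/d(species) at `state`, from a forward finite difference."""
    perturbed = dict(state)
    perturbed[species] += h
    delta = rate(perturbed) - rate(state)
    return (delta > 0) - (delta < 0)


def reaction_to_dot(rid, reactants, products, modifiers, rate, state, reversible=False):
    """DOT for one reaction in a default-view style (edge directions are assumptions)."""
    label = rid if reversible else f"{rid} (IR)"
    nodes = [f'  "{s}" [shape=ellipse];' for s in sorted({*reactants, *products, *modifiers})]
    body = [f'  "{rid}" [shape=rectangle, label="{label}"];']
    body += [f'  "{s}" -> "{rid}";' for s in reactants]  # solid: reaction changes the species
    body += [f'  "{rid}" -> "{s}";' for s in products]   # solid: reaction changes the species
    for s in modifiers:                                    # dashed: species affects the reaction
        head = ARROWHEAD[effect_sign(rate, state, s)]
        body.append(f'  "{s}" -> "{rid}" [style=dashed, arrowhead={head}];')
    return "digraph model {\n" + "\n".join(nodes + body) + "\n}"


def tetr_transcription_rate(c: dict[str, float]) -> float:
    """Hypothetical rate law: LacI represses TetR transcription (Hill coefficient 2)."""
    return 10.0 / (1.0 + c["LacI_protein"] ** 2)


print(reaction_to_dot("transcription_TetR", [], ["TetR_mRNA"], ["LacI_protein"],
                      tetr_transcription_rate, {"LacI_protein": 1.0}))
```

The printed graph can be rendered with GraphViz (for example `dot -Tsvg`). A single-point derivative can mislead for rate laws that are non-monotonic in a species, so a more careful approach would sample several states.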
Abstract view
The abstract view does not represent reactions as nodes; instead, it draws edges directly between species to indicate interactions between them. For each reaction, an edge is added from each species appearing in the kineticLaw to each reactant and product. Edges corresponding to a species increasing the rate of its own degradation are hidden, as these would typically be present for all species and create clutter. These interactions are categorised into four types, distinguished visually by two styles of arrowhead (arrow or T-shaped, indicating the sign of the interaction) and two styles of line (solid or dashed, indicating whether the interaction affects the rate of production or degradation). sbml-diff provides the option to elide a list of species (e.g. intermediate mRNAs): these species are not drawn, and if an elided species increases the production of a second species, then arrows ending at the elided species are redirected to that second species (i.e. if A → B → C and B is elided, then B is not drawn and an arrow is instead drawn directly from A to C).
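A minimal Python sketch of these rules, under stated assumptions: each reaction is given as a hypothetical pre-parsed `Reaction` record (not a libSBML or sbml-diff class) whose `effects` field already holds the sign of each kineticLaw species' effect on the rate. Edges to products are "production" edges and edges to reactants are "degradation" edges, self-degradation edges are dropped, and elision redirects arrows as in the A → B → C example. The DOT styling mirrors the text (solid for production, dashed for degradation; arrow or T-shaped head for the sign).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Reaction:
    """Hypothetical pre-parsed reaction (not an sbml-diff or libSBML class)."""
    reactants: list[str]
    products: list[str]
    # Sign (+1 or -1) of each kineticLaw species' effect on the reaction rate.
    effects: dict[str, int] = field(default_factory=dict)


def abstract_edges(reactions: list[Reaction]) -> set[tuple[str, str, str, int]]:
    """Edges (source, target, kind, sign) from kineticLaw species to reactants/products."""
    edges = set()
    for r in reactions:
        for src, sign in r.effects.items():
            for tgt in r.products:
                edges.add((src, tgt, "production", sign))
            for tgt in r.reactants:
                if src == tgt and sign > 0:
                    continue  # hide "species increases its own degradation"
                edges.add((src, tgt, "degradation", sign))
    return edges


def elide(edges, elided):
    """Drop elided species, redirecting arrows that end at them to the species whose production they increase."""
    edges = set(edges)
    for b in elided:
        targets = {t for s, t, kind, sign in edges if s == b and kind == "production" and sign > 0}
        for s, _, kind, sign in [e for e in edges if e[1] == b]:
            for new_t in targets:
                edges.add((s, new_t, kind, sign))
        edges = {e for e in edges if b not in (e[0], e[1])}
    return edges


LINE = {"production": "solid", "degradation": "dashed"}
HEAD = {1: "normal", -1: "tee"}


def to_dot(edges) -> str:
    body = [f'  "{s}" -> "{t}" [style={LINE[k]}, arrowhead={HEAD[g]}];'
            for s, t, k, g in sorted(edges)]
    return "digraph abstract {\n" + "\n".join(body) + "\n}"


reactions = [
    Reaction([], ["TetR_mRNA"], {"LacI": -1}),  # transcription of TetR, repressed by LacI
    Reaction([], ["TetR"], {"TetR_mRNA": +1}),  # translation of TetR
    Reaction(["TetR"], [], {"TetR": +1}),       # degradation of TetR (edge hidden)
]
print(to_dot(elide(abstract_edges(reactions), ["TetR_mRNA"])))
```

With the mRNA elided, the output reduces to a single solid edge from LacI to TetR with a T-shaped head, i.e. "LacI decreases the production of TetR".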
Cartoon view
The cartoon view resembles the default view, but replaces some rectangular reaction nodes with symbols indicating the nature of the reaction. Specifically, any reaction annotated with the Systems Biology Ontology (16) term for transcription is replaced by a compound symbol containing a promoter symbol and a coding-sequence symbol for each product. Reactions annotated as translation are automatically elided, unless the intermediate mRNA participates in a reaction that is not annotated as translation or degradation.
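The selection logic can be sketched as follows. This is a hypothetical illustration, not sbml-diff's code: each reaction is reduced to its `sboTerm`, its inputs (reactants and modifiers) and its products, and every input of a translation reaction is treated as the mRNA. The three SBO identifiers are assumptions for this sketch and should be checked against the Systems Biology Ontology before use.

```python
from __future__ import annotations

# SBO identifiers assumed for this sketch; verify against the Systems Biology Ontology.
SBO_TRANSCRIPTION = "SBO:0000183"
SBO_TRANSLATION = "SBO:0000184"
SBO_DEGRADATION = "SBO:0000179"


def cartoon_plan(reactions: dict[str, tuple[str | None, list[str], list[str]]]):
    """Return (reactions drawn as promoter/CDS glyphs, mRNA species to elide).

    reactions maps a reaction id to (sboTerm, inputs, products), where
    inputs are the reaction's reactants and modifiers.
    """
    cartoon = {rid for rid, (sbo, _, _) in reactions.items() if sbo == SBO_TRANSCRIPTION}
    elided = set()
    for sbo, inputs, _ in reactions.values():
        if sbo != SBO_TRANSLATION:
            continue
        for mrna in inputs:
            # Keep the mRNA visible if any other kind of reaction uses it.
            used_elsewhere = any(
                mrna in other_inputs and other_sbo not in (SBO_TRANSLATION, SBO_DEGRADATION)
                for other_sbo, other_inputs, _ in reactions.values()
            )
            if not used_elsewhere:
                elided.add(mrna)
    return cartoon, elided


reactions = {
    "transcription_TetR": (SBO_TRANSCRIPTION, ["LacI"], ["TetR_mRNA"]),
    "translation_TetR": (SBO_TRANSLATION, ["TetR_mRNA"], ["TetR"]),
    "degradation_TetR_mRNA": (SBO_DEGRADATION, ["TetR_mRNA"], []),
}
print(cartoon_plan(reactions))  # ({'transcription_TetR'}, {'TetR_mRNA'})
```

If a hypothetical unannotated reaction (say, sequestration of TetR_mRNA by a small RNA) also consumed the mRNA, `used_elsewhere` would become true and the mRNA would remain visible.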
Conclusion
sbml-diff is freely licensed under the 3-clause BSD license, and can be downloaded from https://github.com/jamesscottbrown/sbml-diff and used as a free-standing command-line application, or online using the form at http://sysos.eng.ox.ac.uk/tebio/upload. It can also be used as a Python package, allowing it to be incorporated into larger software packages, such as tools for editing and curating collections of models, or into automated tests that check that other tools do not contain bugs causing unintended changes to SBML files.
Supporting Information
The Supplementary Text contains additional details of how arrow directions are determined. The file SBML models.zip contains the models used to produce Figure 2.
Acknowledgement
J.S.-B. acknowledges funding through the EPSRC & BBSRC Centre for Doctoral Training in Synthetic Biology (EP/L016494/1) and from DSTL. A.P. acknowledges support from EPSRC project EP/M002454/1.
References
1. Hucka, M. et al. (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531.
2. Roehner, N. et al. (2016) Sharing Structure and Function in Biological Design with SBOL 2.0. ACS Synth. Biol. 5, 498–506.
3. Roehner, N., Zhang, Z., Nguyen, T., and Myers, C. J. (2015) Generating Systems Biology Markup Language Models from the Synthetic Biology Open Language. ACS Synth. Biol. 4, 873–879.
4. Funahashi, A., Morohashi, M., Kitano, H., and Tanimura, N. (2003) CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO 1, 159–162.
5. Myers, C. J., Barker, N., Jones, K., Kuwahara, H., Madsen, C., and Nguyen, N.-P. D. (2009) iBioSim. Bioinformatics 25, 2848–2849.
6. Rodriguez, N., Pettit, J.-B., Dalle Pezze, P., Li, L., Henry, A., van Iersel, P. M., Jalowicki, G., Kutmon, M., Natarajan, K. N., Tolnay, D., Stefan, I. M., Evelo, C. T., and Le Novère, N. (2016) The systems biology format converter. BMC Bioinf. 17, 1–7.
7. Gillespie, C. S., Wilkinson, D. J., Proctor, C. J., Shanley, D. P., Boys, R. J., and Kirkwood, T. B. L. (2006) Tools for the SBML Community. Bioinformatics 22, 628–629.
8. Konig, M., Drager, A., and Holzhutter, H.-G. (2012) CySBML: a Cytoscape plugin for SBML. Bioinformatics 28, 2402–2403.
9. Shannon, P. (2003) Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 13, 2498–2504.
10. Scharm, M., W. D., and Wolkenhauer, O. (2016) An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics 32.
11. Moodie, S., Le Novère, N., Demir, E., Mi, H., and Villéger, A. (2015) Systems Biology Graphical Notation: Process Description language Level 1 Version 1.3. Journal of Integrative Bioinformatics 12, 263.
12. Gansner, E. R., and North, S. C. (2000) An open graph visualization system and its applications to software engineering. Software Practice and Experience 30, 1203–1233.
13. Laibe, C., and Le Novère, N. (2007) MIRIAM Resources: tools to generate and resolve robust cross-references in Systems Biology. BMC Syst. Biol. 1, 58.
14. Elowitz, M. B., and Leibler, S. (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403, 335.
15. Gardner, T. S., Cantor, C. R., and Collins, J. J. (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339.
16. Le Novère, N. (2006) Model storage, exchange and integration. BMC Neurosci. 7, S11.