SBOLDesigner 2: An Intuitive Tool for Structural Genetic Design - ACS

Apr 25, 2017 - This paper describes SBOLDesigner 2, a genetic design automation (GDA) tool that natively supports both the SBOL data model (Version 2)...
0 downloads 10 Views 879KB Size
Subscriber access provided by UB + Fachbibliothek Chemie | (FU-Bibliothekssystem)

Article

SBOLDesigner 2: An Intuitive Tool for Structural Genetic Design Michael Zhang, James Alastair McLaughlin, Anil Wipat, and Chris J. Myers ACS Synth. Biol., Just Accepted Manuscript • Publication Date (Web): 25 Apr 2017 Downloaded from http://pubs.acs.org on April 26, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

SBOLDesigner 2: An Intuitive Tool for Structural Genetic Design Michael Zhang,∗,‡ James Alastair McLaughlin,§ Anil Wipat,§ and Chris J. Myers‡ ‡Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, USA §School of Computing Science, Newcastle University, Newcastle upon Tyne, UK E-mail: [email protected]

Running header SBOLDesigner 2

1

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract As the Synthetic Biology Open Language (SBOL) data and visual standards gain acceptance for describing genetic designs in a detailed and reproducible way, there is an increasing need for an intuitive sequence editor tool that biologists can use that supports these standards. This paper describes SBOLDesigner 2, a genetic design automation (GDA) tool that natively supports both the SBOL data model (Version 2) and SBOL Visual (Version 1). This software is enabled to fetch and store parts and designs from SBOL repositories, such as SynBioHub. It can also import and export data about parts and designs in FASTA, GenBank, and SBOL 1 data format. Finally, it possesses a simple and intuitive user interface. This paper describes the design process using SBOLDesigner 2, highlighting new features over the earlier prototype versions. SBOLDesigner 2 is released freely and open source under the Apache 2.0 license.

Keywords synthetic biology open language, SBOL visual, genetic circuits, genetic design automation

Introduction Synthetic biology is an engineering discipline where biological components are assembled to form devices or systems with more complex functions. Engineering fundamentals such as standards, abstraction, and decoupling are required to give biologists the ability to design biological circuits in an easy, automated workflow (1 ). SBOLDesigner 2 is a simple, intuitive software tool for creating and manipulating genetic circuits and their sequences using version 2 of the Synthetic Biology Open Language (SBOL) data standard (2 ) natively, while presenting the designs to the user with symbols from version 1 of the SBOL Visual standard (3 ). Building on an earlier prototype (4 , 5 ) that supports SBOL data version 1 (6 ), this paper highlights numerous enhancements provided by its support of SBOL 2. Further-

2

ACS Paragon Plus Environment

Page 2 of 27

Page 3 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

more, the methods section of this paper provides a description of the details and challenges encountered during the SBOL 2 update. The SBOL data standard provides a digital format that allows biologists to share genetic designs stored in a principled scheme. SBOL 1 focused solely on the design of DNA components and the annotation of their sequences (6 ). The latest version, SBOL 2, enables the description of general biological components, along with their interactions, and hierarchical composition into modules. In particular, SBOL 2 describes designs structurally using ComponentDefinitions, Sequences, and SequenceAnnotations that are composed together into biological designs described functionally using ModuleDefinitions, Interactions, and Models (2 ). The SBOLDesigner tool focuses on the structural design of DNA parts. Although data content used by SBOLDesigner is similar to that in SBOL 1, many changes, as described in the Methods section, are necessary to support the restructured data model of SBOL 2 and its additional features. SBOL visual 1.0 (SBOLv 1) is a specification of standard visual schematic glyphs used to represent various commonly used genetic components (3 ). The standards are linked using the Sequence Ontology (SO) (7 ) as shown in Figure 1. Namely, a DNA ComponentDefinition constructed in the data standard is visualized by looking up the corresponding SBOLv 1.0 symbol specified by its role. As a result, SBOL 2 and SBOLv 1 taken together facilitate communication between experimental biologists, computational biologists, genetic engineers, and their computational tools. A key benefit of SBOL is that it enables reproducible exchange of genetic designs published in the literature by providing a clear and detailed storage format. For this reason, in 2016, ACS Synthetic Biology set a precedent by adopting the SBOL data and visual standards as the official method for depicting and digitally storing genetic constructs (8 ). The use of SBOL in publications and the deposition of this data into public repositories tremendously aids reproducibility in this field. However, in order for biologists to generate these designs in SBOL, they need a workflow that enables a straightforward way for generating

3

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1: SO:0000167 is the Sequence Ontology term for a promoter. SBOLDesigner is able to associate this well-defined ID to both the SBOL Visual glyph for promoter, as well as a template promoter ComponentDefinition. Additionally, the Sequence Ontology is organized as a tree, so any descendants of promoter such as constitutive promoter or inducible promoter, are also associated with the correct glyph and ComponentDefinition.

4

ACS Paragon Plus Environment

Page 4 of 27

Page 5 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

these constructs (9 ). This paper describes SBOLDesigner, a key piece of this workflow and the first sequence editor that integrates support of the SBOL 2 data standard with SBOLv 1 symbols. A workflow using SBOLDesigner to create and capture genetic designs is depicted in Figure 2. In particular, SBOLDesigner can obtain DNA sequences and other important metadata from the SynBioHub parts repository (10 ). These components can then be composed and edited within SBOLDesigner to create a complete structural design of a genetic circuit. These new composite designs can then be uploaded to the part repository to enable sharing and reuse. In order to add functional information about a genetic design, SBOLDesigner has been integrated into the modeling and simulation genetic design automation (GDA) tool, iBioSim (11 ). The iBioSim software can be used to construct and analyze functional models using the Systems Biology Markup Language (SBML) (12 ). These functional models once annotated using genetic designs produced by SBOLDesigner (13 ) can be converted into an SBOL 2 document including functional information about the product of these genetic circuits and their interactions (14 ). Once again, the complete genetic circuit with its functional information can be archived in a part repository for sharing. Throughout this process, researchers can collaborate and pass around files from institution to institution, located anywhere in the world. The SBOL standard provides the means to enable a lossless communication of data between these software tools and repositories.

Results and Discussion SBOLDesigner 2 has a simple user interface that allows biologists to visualize and edit the details of their creation. It supports hierarchical design and offers generic, user-defined parts to ease fabrication from partial sequences to complete genetic constructs. Additionally, the user can flip the orientation of parts and view or edit their names and descriptions. Throughout the design process, SBOLv 1 (3 ) symbols, a system of schematic glyphs, provide

5

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2: A workflow for synthetic biology design using SBOLDesigner. In this workflow, SBOL data is exchanged between the SynBioHub part repository, the sequence editor, SBOLDesigner, and the modeling and simulation software, iBioSim. Both softwares have direct computational access to the parts in the SynBioHub repository. These parts include the iGEM Parts Registry, the BacillOndex dataset for Bacillus subtilis (15 ), and a collection of Escherichia coli coding sequences and proteins.

6

ACS Paragon Plus Environment

Page 6 of 27

Page 7 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

standardized visualizations of individual parts. Figure 3 shows the basic user interface of SBOLDesigner 2 with a generic genetic circuit laid out on the canvas. Each of these parts are specified visually through SBOLv glyphs, specified by their roles. At this point, an abstract design is being represented since explicit parts are not assigned and sequences are not specified. In SBOL 2, a part is comprised of a ComponentDefinition and its Sequence, but the details of the SBOL data model need not be important to a biologist using the tool. The internal representation of how various features of SBOLDesigner 2 relate to the SBOL data model is abstracted away, and all a user needs to worry about is the high-level design of their actual circuit. A key feature of SBOLDesigner is that the power of the SBOL data standard is working in the background, but it never needs to be explicitly shown to the user. SBOLDesigner’s part editor is shown in Figure 4. The role of the part can be promoter, ribosome binding site, coding sequence, terminator, etc. If there is not a glyph that fits the part being represented, a generic user-defined part can be used instead. In addition to specifying a basic role, a more specific refinement role from the SO can be chosen. For example, a basic role of promoter can be enhanced with the refinement role of inducible promoter. SBOLDesigner 2 also supports importing part data and sequences from external SBOL, GenBank, and FASTA files, promoting the reuse of parts described by these earlier data standards. Genetic Toggle Switch Example: To explore some of the features of SBOLDesigner 2, let us consider the design of a genetic toggle switch (16 ). This example highlights features such as importing of parts from registries, hierarchical design, versioning, and connections with circuit GDA tools. Figure 5 depicts the registries tab in the preferences. This list shows a list of repositories from which the user can obtain parts. By default, every installation of SBOLDesigner 2 comes with the ability to fetch parts from a small library of built-in parts, the current working document, and the reference instance of SynBioHub. The built-in parts library contains a handful of pre-selected parts that demonstrate the ability to import from

7

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3: SBOLDesigner 2’s user interface with a generic genetic circuit, comprised of a promoter, ribosome binding site, coding sequence, and terminator, laid out on the canvas. Buttons above the canvas are used to create a new document, open a document, or save and export the current document as an SBOL file. Specifically, the open document action allows the user to open a GenBank, SBOL 1, or SBOL 2 file. Also, above the canvas are buttons for showing or hiding scars in the design and editing, deleting, and flipping the orientation of a selected part. Buttons below the canvas are used to drop template parts onto the canvas.

8

ACS Paragon Plus Environment

Page 8 of 27

Page 9 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

Figure 4: SBOLDesigner 2’s part editor depicting BBa R0010, which is the pLacI promoter. The part editor enables the user to specify the part’s role, display ID, name, description, version, and sequence. The DNA sequence is shown in a box below, as well as buttons for importing part data and sequences from other GenBank, FASTA, SBOL 1.1, and SBOL 2.0 files. The editing of annotations is also supported.

9

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

repositories. The working document displays all the parts in the SBOL design document that are currently loaded. Finally, an instance of the SynBioHub repository allows the fetching of parts from SynBioHub repositories (or other repositories adhering to a simple API) located in servers located anywhere on the Internet.

Figure 5: The registries tab in preferences. The default installation of SBOLDesigner 2 comes with a library of built-in parts, the working document, and the reference SynBioHub instance. Additional instances of SynBioHub can be easily linked to by providing the URL they are located at. For example, there is a public instance located at http://synbiohub.org. Also, files saved locally on disk can be added as repositories by clicking on the add button and using the file browser to navigate to the desired file. The genetic toggle switch is comprised of the LacI and TetR inverters, which are themselves comprised of promoters, ribosome binding sites, coding sequences, and terminators. The details for the specific parts can either be edited manually or fetched from a registry, such as a SynBioHub instance. The reference instance of SynBioHub currently includes parts from the iGEM Parts Registry, the BacillOndex dataset for Bacillus subtilis (15 ), and a collection of Escherichia coli coding sequences and proteins. The import part dialog is shown in Figure 6. This menu lets you filter through a certain registry’s parts by their type, role, name, and display ID. For example, the TetR repressible promoter can be imported 10

ACS Paragon Plus Environment

Page 10 of 27

Page 11 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

by selecting SynBioHub as the registry to import from, changing the part type to DNA, changing the part role to promoter, and typing BBa R0010 into the text box. This queries the repository and brings up all parts with matching properties. User created parts can also be shared by uploading them onto a SynBioHub instance.

Figure 6: The dialog used to import parts from repositories. The registry can either be a SynBioHub instance, a local file of parts, the working document, or the built-in parts. Parts can then be filtered by type, role, display Id, and name. Also, parts can be organized and viewed in labeled collections and sub-collections, such as the iGEM Parts Registry, the bacillus subtilis collection, or the E. coli Knowledge Base. Another integral feature of SBOLDesigner is the ability to design constructs hierarchically. Hierarchy is a useful organizational abstraction when designing complicated systems, and takes advantage of modularity and encapsulation to simplify more complicated genetic circuits. Figure 7 shows the complete genetic toggle switch represented hierarchically. The top level design is composed of two inverters, one on each strand. Each inverter is composed 11

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

of a promoter, one or more gene constructs, and a terminator. Finally, each gene construct is composed of a ribosome binding site and coding sequence. SBOL 2 also provides version control of genetic designs. If a modification is made to a particular coding sequence, the changes could be saved into a new version instead of overwriting the original design. For example, a change to the LacI Gene would cause a new version of the TetR Inverter and top-level GeneticToggleSwitch to be created, as shown in Figure 8. Since SBOLDesigner 2 now supports reading of SBOL files with multiple designs, either version of the GeneticToggleSwitch can be opened. This gives the user the ability to view a design’s changes over time. When the design is complete and ready to export, SBOLDesigner 2 stitches together all of the specified Sequences to form a composite root Sequence that is attached to the root part. If desired, this root sequence can then be sent to a DNA synthesis service for fabrication. However, if the design is not yet complete, or if it needs to be shared via another format, SBOLDesigner 2’s export feature allows the user to export the design to a variety of other data standards such as GenBank, FASTA, and SBOL 1. The complete structural design can be simulated and modeled using a circuit GDA tool such as iBioSim (11 ). To make this process more streamlined, SBOLDesigner has been embedded as a plugin directly into iBioSim. Figure 9 shows SBOLDesigner, iBioSim’s default SBOL part constructor, running in the iBioSim workspace along with an image of an SBML model constructed by iBioSim for this part. At this point, iBioSim can be utilized to check the functional behavior through simulation and other types of analysis (11 ). In this SBML model, the structural SBOL design created using SBOLDesigner is associated with the SBML model using the annotation scheme described in (13 ), and it can be translated from SBML into SBOL 2 using the converter described in (14 ).

12

ACS Paragon Plus Environment

Page 12 of 27

Page 13 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

Figure 7: The complete hierarchically defined genetic toggle switch. The top-level design is composed of the LacI and TetR inverters, which are shown on the top right. The TetR inverter is then defined on the bottom right, and it is composed of a promoter, a gene construct, and a terminator. The LacI Gene, which is a part of the TetR inverter, is defined by the ribosome binding site and coding sequence shown on the left.

13

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 8: The dialog for selecting which design in an SBOL file to open. There are two versions of the GeneticToggleSwitch. Version 1 contains the original LacI Gene, while version 2 contains a new LacI Gene. The history of changes to the design are kept and can be viewed just like any other design. When a design is modified, the user can choose to either overwrite the original design or save the new design as a new version. If the second option is chosen, SBOLDesigner only increments the versions of the parts that have changed.

14

ACS Paragon Plus Environment

Page 14 of 27

Page 15 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

Figure 9: SBOLDesigner as a plugin to the circuit GDA tool iBioSim. Using SBOLDesigner, the sequence for a new part can be constructed utilizing parts found in a SynBioHub repository. After designing the sequence for a new part, it is stored within an SBOL file within the iBioSim project workspace. At this point, an SBML model of the part can be constructed using iBioSim’s model editor. This model can then be annotated with a URI reference to the part to provide a link between the functional and structural description of the part. iBioSim can then be employed to validate the parts functionality through simulation and other types of analysis. Finally, the completed design can be uploaded to a SynBioHub repository.

15

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Discussion SBOLDesigner 2 completes a workflow for users of GDA tools. It combines a simple user interface with the power of the SBOL standard and serves as a launchpad for more detailed designs involving simulations and experiments. Many of the new features in SBOLDesigner 2 are needed to create complete designs, such as the genetic toggle switch described earlier. Some other new features that are not described include the ability to open SBOL files that include multiple root ComponentDefinitions, store a workspace of designs in a single SBOL file, and have write protection enforcement through name-space control using uniform resource identifiers (URIs). Specifically, users are only able to create and edit parts that match their personal unique name-space. This is necessary to prevent designers from changing parts that have a canonical representation in the public domain, which is an important feature that SBOLDesigner 1 lacked. Also, parts that users upload to repositories have a new URI minted that belongs to the repository in order to ensure an unchanging and dereferenceable part in the future. Besides a major overhaul to the internal SBOL representation, a major part of SBOLDesigner 2 is the many new features that are not found in the previous version. SBOLDesigner 2 is the first tool released to support fetching and uploading to SynBioHub repositories. A reference instance of SynBioHub located at http://synbiohub.org currently contains SBOL 2 parts from the iGEM Registry, the BacillOndex dataset for Bacillus subtilis (15 ), and a collection of Escherichia coli coding sequences and proteins. This meta-repository represents SBOL’s vision of the future of genetic part repositories, and will continue to grow as SBOL becomes more prolific. Also, any local SBOL 2 file saved on disk can also be linked within the tool to provide a custom user-defined repository. Furthermore, SBOLDesigner 2 now supports importing and exporting GenBank and FASTA files in addition to legacy support for SBOL 1. Versioning of separate parts, as well as overall versioning of designs has been added, with algorithmic auto-versioning of changed files when saving. SBOL 2 also includes an extensive and robust user annotation capability. Our annotation editor leverages 16

ACS Paragon Plus Environment

Page 16 of 27

Page 17 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

this to enable the user to provide any data not currently supported by SBOL within the SBOL file with a custom namespace and annotation. The overall structure of designs within documents, much like files within folders, has also been redone. This new structure has been tested with SBOL documents containing hundreds of unique parts nested hierarchically, and there have been no perceivable slowdowns or other problems so scalability should not be an issue for the immediate future. Along with many other features, SBOLDesigner 2 has added a lot of utility and useful integrations for the benefit of biologists interested in utilizing software to support genetic design. In the future, we plan to integrate SBOLDesigner with more data repositories and GDA tools. For example, the JBEI-ICE repository (17 ) already supports SBOL 2, so it should be fairly straightforward to connect to SBOLDesigner. Other registries that utilize the GenBank or FASTA formats could also be connected, making use of the conversion utilities available within libSBOLj (18 ). We are also currently working on updating the SBOLDesigner plugin support for the commercial sequence editor tool, Geneious (5 ). While other sequence based design tools such as DNAplotlib provide SBOL Visual style visualizations and Autodesk’s Genetic Constructor supports the ability to manipulate modular and hierarchical parts, none of them support the detailed editing power of the underlying SBOL like SBOLDesigner. Using SBOL 2 as its internal data model enables SBOLDesigner to achieve a level of reproducibility and reuse among SBOL compliant tooling and services and that is currently unmatched by other sequence editor and visualization tools. Additionally, in the future, we plan to extend the user interface for expert users to enable them to view and edit more of the properties for their part designs. In order to simplify the interface, several elements of even the structural subset of the SBOL 2 data model cannot currently be viewed or edited within SBOLDesigner. For example, SBOLDesigner presents the user with a part editor that encapsulates three SBOL objects, a ComponentDefinition, a Component, and a Sequence. An expert user may wish to have the ability to edit these elements separately to provide, for example, unique names or descriptions for each element.

17

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Finally, SBOL 2 provides the ability to express basic qualitative functional relationships in the form of Interactions. We are considering adding a better connection between the structural and functional layers perhaps leveraging our iBioSim plugin interface. While the SBOL 2 data model is quite robust, there are some limitations that the community is currently discussing, and when consensus is reached, we plan to update SBOLDesigner accordingly. For example, SBOLDesigner could quite easily be extended to support RNA and protein parts once the community agrees on visual glyphs for the Components associated with these parts. Also, SBOLDesigner’s implementation of the SBOL data model assumes that all sequences can be represented sequentially. While gaps can be represented using scars or generic parts, overlapping parts that are not hierarchical are challenging to visualize and cannot currently be created or edited. In this case, there is a proposal currently be evaluated to add a Location field to Components. This proposal may separate these concerns as Components may be required to be contiguous while SequenceAnnotations would continue to be allowed to specify overlapping features. There was also a related problem regarding specification of circular DNA Sequences, but an SO role has been chosen to address the issue. With sequence tools like SBOLDesigner 2, circuit tools like iBioSim, and parts repositories like SynBioHub, biologists have now a complete SBOL compliant workflow for prototyping and automating the design of genetic circuits. The use of SBOL allows these tools and repositories to seamlessly exchange data to create complete genetic designs. These workflows undoubtedly will stimulate the advancement of synthetic biology and achieve its promise of becoming a truly model-based design engineering discipline.

Methods This paper describes a new version of SBOLDesigner that has been updated from SBOL 1 to SBOL 2. While the user interface remains similar, the transition from SBOL 1 to

18

ACS Paragon Plus Environment

Page 18 of 27

Page 19 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

SBOL 2 required re-implementing most of the software’s back-end to use the data model provided by the libSBOLj 2.0 Java library (18 ). Figure 10 provides a comparison of the SBOL 1 data model to the structural portion of the SBOL 2 data model. The key difference is that SBOL 2 separates SequenceAnnotations from SBOL 1 into Components, SequenceAnnotations, and SequenceConstraints, which requires a fundamental change in the representation of parts in SBOLDesigner. There are also many API changes between libSBOLj 1.0 and libSBOLj 2.0. Part of the design philosophy of libSBOLj 1.0 is the utilization of a factory for creating objects from the data model. Conversely, all libSBOLj 2.0 data model objects inherit from the Identified class, and all object creation is handled from a centralized SBOLDocument object. Also, the API is more complicated due to the introduction of more specific data model objects and the increase in the number of classes. For example, SequenceAnnotations now contain Locations which can be specific Ranges or more general GenericLocations. SequenceConstraints allow for more general ordering, and Components define explicit instantiations of ComponentDefinitions. These changes taken together necessitated a complete re-implementation of SBOLDesigner’s back-end. The main canvas shown in Figure 3 represents a ComponentDefinition that brings together information on the design’s Sequence, its Components, and their organization. Therefore, every design in SBOLDesigner is inherently hierarchical. All the parts that are added to a specific design are contained transitively within a root ComponentDefinition, which is defined as a part that is not included as a sub-part within any other ComponentDefinition. Each SBOL file is allowed to include multiple root ComponentDefinitions. Therefore, when an SBOL file is opened in SBOLDesigner, a single root ComponentDefinition must be selected for editing. Non-root ComponentDefinitions can also be selected for editing. However, it is important to note that while non-root ComponentDefinitions appear hierarchically identical to normal root ComponentDefinitions from the perspective of the SBOLDesigner hierarchy viewer and canvas, they are fundamentally different since there exists another ComponentDefinition in the SBOL file that contains a Component that ref-

19

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 10: A color coded diagram showing the mapping of classes between SBOL 1 and SBOL 2. While the left-hand side includes the entire SBOL 1 data model, the righthand side is a simplified view that includes only the structural portion of the SBOL 2 data model. SBOLDesigner currently only supports this structural portion. In particular, DNA ComponentDefinitions specify their structure using DNA Sequences encoded using IUPAC. These DNA ComponentDefinitions can include sub-structure specified using Components that are ordered by their SequenceConstraints and positioned by the Locations within the SequenceAnnotations. Finally, these DNA ComponentDefinitions can be organized into Collections. The key difference between SBOL 1 and SBOL 2 is that SequenceAnnotations are now split into SequenceAnnotations, Components, and SequenceConstraints. Also, DnaComponents and DnaSequences are now more generic ComponentDefinitions and Sequences enabling them to specify parts of other types, such as RNAs, proteins, small molecules, complexes, etc.

20

ACS Paragon Plus Environment

Page 20 of 27

Page 21 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

erences this transitive ComponentDefiniton. Also, if the original non-root is overwritten, changes are reflected in all designs in the SBOL file that reference this ComponentDefinition. If, on the other hand, a new version is created, then the original references continue to use the original design of this ComponentDefinition. In this case, the new version of this ComponentDefinition would itself become a root ComponentDefinition until it is also included in another design. The fields in the part editor map directly to a ComponentDefinition and its Sequence. For example, the role, displayId, name, description, and version are all properties of the ComponentDefinition, and the sequence text box maps directly to the elements property of a separate Sequence object. A ComponentDefinition and its Sequence are individual top-level objects in the data model, and can both exist independently without the other, but to simplify the interface, SBOLDesigner treats them as a single unit. Below the canvas in SBOLDesigner is a row of genetic elements that can be added to the design. When placed on the canvas, each element represents a Component. These Components are organized by SequenceAnnotations and SequenceConstraints, all of which are children of the canvas ComponentDefinition. This means that they can only exist within the parent ComponentDefinition. Components are completely new to SBOL 2, and represent hierarchy in the design. Each Component refers to another ComponentDefinition, and represents a specific instantiation of that referred ComponentDefinition. While Components are not top-level object in the data model, the ComponentDefinitions they refer to are. Therefore, clicking on the ”focus in” button expands a Component to expose its ComponentDefinition, generating a nested canvas. This new canvas also represents a ComponentDefinition, which allows the user to create hierarchically defined designs. However, even though this nested ComponentDefinition is a top-level object in the data model, it is not considered a root ComponentDefinition since there exists another ComponentDefinition that contains a Component that references this ComponentDefinition. As a result of Components, a design can have any number of layers of ComponentDefinitions, as long as there is not a cycle.

21

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 27

In other words, ComponentDefinitions cannot refer to a Component that then directly or indirectly refers to a ComponentDefinition that is a parent of the Component that refers to the original ComponentDefinition. SequenceAnnotations specify the precise Location and orientation (position and direction) of a Component and SequenceConstraints encode information on how Components are ordered relative to each other. SequenceContraints are only present when there is more than one part on the canvas, since relative ordering only matters when there are multiple parts. However, each Component always has a SequenceAnnotation that refers to it. Also, it is important to note that SequenceAnnotations and SequenceConstraints are annotations and constraints on Components, not Sequences. This distinction means there could be SequenceAnnotations and SequenceConstraints on Components whose ComponentDefinitions do not have a defined Sequence. By default, SequenceAnnotations contain a Location of type GenericLocation if the ComponentDefinition pointed to by its Component does not have a defined Sequence. However, if the Component refers to a ComponentDefinition with a defined Sequence, then the SequenceAnnotation contains a Location of type Range. This Range specifies the start and end index of exactly where this Component’s ComponentDefinition’s Sequence belongs in the parent ComponentDefinition’s Sequence. An important assumption used by SBOLDesigner is that parts are abutted as shown in the canvas. This means the start position of a Component is one base along from the end position of the Component before it. The first Component has a start position value of 1. With this assumption, the user does not need to worry about the construction of Sequences and their SequenceAnnotations, since this is all handled behind the scenes and is automatically generated by SBOLDesigner.

Acknowledgement We thank Evren Sirin (Complexible), Michal Galdzicki (Arzeda), Bryan Bartley (U. of Wash-

22

ACS Paragon Plus Environment

Page 23 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

ington), and John Gennari (U. of Washington) for their work on the original version of SBOLDesigner. This work is funded by the National Science Foundation under grants CCF1218095 and DBI-1356041. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. A.W. has been supported by the Engineering and Physical Sciences Research Council under grant EP/J02175X/1 and EP/N031962/1. J.A.M. is supported by Newcastle University and FUJIFILM Diosynth Biotechnologies.

Author Information • Present Address University of Utah Department of Electrical and Computer Engineering, University of Utah, 50 S. Central Campus Drive, Rm. 2110, Salt Lake City, UT, 84112, USA • Author Contributions M.Z. and C.M. worked on SBOLDesigner 2 and wrote the initial paper draft. J.A.M., A.W., and C.M. built SynBioHub. All authors contributed to the subsequent revisions of this paper. • Notes The authors declare no competing financial interest.

Supporting Information Available The latest release of SBOLDesigner 2, as well as source code, tutorials, and related files, can be found online at http://www.async.ece.utah.edu/SBOLDesigner. Additional examples using this workflow can be found at http://sbolstandard.org/software/workflows. This material is available free of charge via the Internet at http://pubs.acs.org/.

23

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References 1. Endy, D. (2005) Foundations for engineering biology. Nature 438, 449–453. 2. Beal, J. et al. (2016) Synthetic Biology Open Language (SBOL) Version 2.1.0. Journal of Int. Bioinfo. 3. Quinn, J. et al. (2015) SBOL Visual: A Graphical Language for Genetic Designs. PLOS Biology 4. Galdzicki, M., Bartley, B. A., Sleight, S. C., Sirin, E., and Gennari, J. H. (2013) Version Control for Synthetic Biology. 2013 International Workshop on Bio-Design Automation 19–20. 5. Olsen, C., Qaadri, K., Shearman, H., and Miller, H. (2014) Synthetic Biology Open Language Designer. 2014 International Workshop on Bio-Design Automation 60–61. 6. Galdzicki, M. et al. (2014) The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology. Nature Biotechnology 32, 545–550. 7. Eilbeck, K., Lewis, S., Mungall, C., Yandell, M., Stein, L., Durbin, R., and Ashburner, M. (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome biology 6, 1. 8. Hillson, N., Plahar, H., Beal, J., and Prithviraj, R. (2016) Improving Synthetic Biology Communication: Recommended Practices for Visual Depiction and Digital Submission of Genetic Designs. ACS synthetic biology 5, 449–451. 9. Roehner, N. et al. (2016) Sharing Structure and Function in Biological Design with SBOL 2.0. ACS synthetic biology 5, 498–506.

24

ACS Paragon Plus Environment

Page 24 of 27

Page 25 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

10. Madsen, C., McLaughlin, J. A., Mısırlı, G., Pocock, M., Flanagan, K., Hallinan, J., and Wipat, A. (2016) The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs. ACS synthetic biology 11. Madsen, C., Myers, C., Patterson, T., Roehner, N., Stevens, J., and Winstead, C. (2012) Design and test of genetic circuits using iBioSim. IEEE Design and Test of Computers 29, 32–39. 12. Hucka, M. et al. (2003) The Systems Biology Markup Language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531. 13. Roehner, N., and Myers, C. (2013) A methodology to annotate systems biology markup language models with the synthetic biology open language. ACS synthetic biology 3, 57–66. 14. Nguyen, T., Roehner, N., Zundel, Z., and Myers, C. J. (2016) A Converter from the Systems Biology Markup Language to the Synthetic Biology Open Language. ACS synthetic biology 5, 479–486. 15. Misirli, G., Wipat, A., Mullen, J., James, K., Pocock, M., Smith, W., Allenby, N., and Hallinan, J. S. (2013) BacillOndex: An integrated data resource for systems and synthetic biology. Journal of Integrative Bioinformatics (JIB) 10, 103–116. 16. Gardner, T. S., Cantor, C. R., and Collins, J. J. (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342. 17. Ham, T. S., Dmytriv, Z., Plahar, H., Chen, J., Hillson, N. J., and Keasling, J. D. (2012) Design, implementation and practice of JBEI-ICE: an open source biological part registry platform and tools. Nucleic acids research gks531. 18. Zhang, Z., Nguyen, T., Roehner, N., Misirli, G., Pocock, M., Oberortner, E., Sami-

25

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

neni, M., Zundel, Z., Beal, J., Clancy, K., Wipat, A., and Myers, C. (2015) libSBOLj 2.0: A Java library to support SBOL 2.0. IEEE Life Sciences Letters 1, 34–37.

26

ACS Paragon Plus Environment

Page 26 of 27

Page 27 of 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

Graphical TOC Entry

27

ACS Paragon Plus Environment