Genetic Constructor: An Online DNA Design Platform

Technical Note

Maxwell Bates, Joe Lachoff, Duncan Meech, Valentin Zulkower, Anais Moisy, Yisha Luo, Hille Tekotte, Cornelia Johanna Franziska Scheitz, Rupal Khilari, Florencio Mazzoldi, Deepak Chandran, and Eli S Groban. ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.7b00236 • Publication Date (Web): 11 Oct 2017

ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

ACS Synthetic Biology

Title: Genetic Constructor: An online DNA design platform
Authors: Maxwell Bates¹, Joe Lachoff¹, Duncan Meech¹, Valentin Zulkower², Anaïs Moisy², Yisha Luo², Hille Tekotte², Cornelia Johanna Franziska Scheitz¹, Rupal Khilari¹, Florencio Mazzoldi¹, Deepak Chandran³, Eli Groban¹*

¹ Autodesk Life Sciences, San Francisco, California 94111, United States
² Edinburgh Genome Foundry, School of Biological Sciences, University of Edinburgh, Edinburgh, UK EH9 3BF
³ Radiant Genomics, Emeryville, California 94608, United States

Abstract
Genetic Constructor is a cloud-based computer-aided design (CAD) application developed to support synthetic biologists from design intent through DNA fabrication and experimental iteration. The platform allows users to design, manage, and navigate complex DNA constructs and libraries using a new visual language that focuses on functional parts abstracted from sequence. Features such as combinatorial libraries and automated primer design let users separate design from construction by focusing on functional intent, while design constraints aid the iterative refinement of designs. A plugin architecture enables scientists and programmers to contribute extensions that leverage existing software and connect to DNA foundries. The software runs as a platform-agnostic web application, is free for academics, and is available as an open-source community edition. Genetic Constructor seeks to democratize DNA design and manufacture, as well as access to tools and services from the synthetic biology community.

Keywords: synthetic biology, biological design, cloud foundry, CAD, open source, software

Introduction
Synthetic biologists unite the principles of engineering with traditional molecular and cell biology1 and have developed a powerful array of tools to inform and enable the design of pathways2–5. The growing capacity and fidelity of DNA synthesis platforms and cloud foundries allow for increasingly intricate designs in biological engineering. Such projects may span large stretches of DNA sequence, introduce several complex, novel constructs, and test entire libraries simultaneously rather than single sequences. However, it can be difficult to compose existing tools to engineer cellular function, and using several unconnected tools over the course of the design process is inefficient and hinders reproducibility and record keeping.
Existing sequence editing tools, such as ApE, Benchling, Vector NTI, Geneious, and Genome Compiler6–10, handle nucleotide-level design and optimization gracefully, but their emphasis on linear, nucleotide-level representations becomes unwieldy in projects that require abstraction of biological complexity. Especially when outlining a complex project, a focus on sequence hampers attention to function, the centralization of design and experimental intent, and the encapsulation and reuse of parts. Conversely, many design specification tools and languages, like the Genotype Specification Language (GSL)3 and Eugene2, present text-driven interfaces with limited visual feedback, which some researchers find difficult to use. Tools focused on part composition, like GenoCAD4 and Teselagen5, use restrictive grammars to formalize manufacture-driven principles. Cello supports combinatorial design but focuses on biological circuits11. j5 and Device Editor support functional design paradigms, but across separate tools, and they do not deeply support extension or integration with ordering12. Genetic Constructor aims to ease the design process in synthetic biology by allowing a biologist to navigate a common workflow smoothly, from concept to manufactured DNA, in a centralized and extensible platform. Namely, the software disentangles design from the restrictions of manufacturing while still integrating with assembly foundries and leveraging new DNA assembly methods from the community13. Constructor frees researchers to focus on functional design by beginning with free-form drafting, and it eases the transition to DNA construction by delegating some workflows, such as codon optimization, to algorithms and by allowing others, such as automated assembly primer design, to be expressed in functional terms.

The Genetic Constructor Application
Genetic Constructor builds on community standards such as SBOL14, advancing design and fabrication paradigms that scale to devising and building large numbers of complex assemblies. User experience design and ongoing user research guided development of the application interface, in which users advance through a process of outlining a design intent, adding constraints, specifying sequence, manufacturing DNA, and iterating progressively. All work history is saved, so users can access prior versions of their work, and projects can be published publicly or shared privately among users.
A “sketching” feature provides a digital medium for initial drafting, allowing genetic designers to outline functional representations using glyphs adapted from SBOL Visual15, alongside other metadata (Figure 1.1). Sketches are hierarchical, composed of non-overlapping parts called “blocks,” and can be progressively annotated and defined with sequence to ultimately yield DNA specifications. Combinatorial design, visually captured using “list blocks” (Figure 1.2), and design constraints help manage the resulting increase in complexity. For example, users may define positional or sequence constraints, or prevent subsets of changes. Many of these rules can be applied to parts independently of sequence and composed to define reusable templates. Constraints can be honed as experimental results guide a design’s logic. DNA sequences can be added manually or imported from local inventories, public repositories such as NCBI or iGEM, genome analysis tools16, or inventories exposed by foundries (Figure 1.3). Files in common formats such as GenBank can be imported from these sources and are automatically converted into hierarchical constructs. Alternatively, sequences may be derived from algorithms integrated through Genetic Constructor’s plugin architecture (Figure 2.1). These algorithms can simplify the design concerns of synthesis and assembly, which are shaped by evolving manufacturing processes, so that drafting functional intent is dissociated from the generation of fragments for assembly.

Extensions
An integral plugin framework allows Genetic Constructor to package existing specialized software and bespoke industrial tools. This architecture aims to support an ecosystem of connected software, sparing users from laboriously shuttling data between applications. Extensions are supported on both the web client and the server and may take the form of visual plugins (e.g., a plasmid viewer), algorithms (e.g., codon optimization), connections to manufacturing (foundries, assembly methods), or adapters to other tools and services (structure prediction, custom data pipelines). One such extension, developed in conjunction with Amyris Inc., links the text-based GSL3 to Genetic Constructor’s design canvas in an integrated development environment (IDE) (Figure 2.1), providing visual feedback from code through compilation (Figure 2.2).

Integration with Foundries
Genetic Constructor simplifies access to industry-scale DNA synthesis and assembly, broadening the availability of these services to scientists in industry and academia, as illustrated by a connection to the Edinburgh Genome Foundry (Figures 1.4 and 2.1). Genetic Constructor is designed to integrate with foundries and DNA synthesis providers: it sends a bill of materials that refers to parts rather than to a single sequence, and it provides API hooks to interact at key points in the progression from design to product. As a proof of principle, Genetic Constructor was used to generate a combinatorial set of 8 constructs from a genetic template (Figure S1) using the modular assembly kit EMMA17, and these constructs were ordered from the Edinburgh Genome Foundry. Upon ordering, the designs are sent as a list of part identifiers to the Foundry's ordering interface, where they are validated before the order enters the Foundry's production pipeline.
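The path from a combinatorial template to an order made of part identifiers can be pictured with a short sketch. It is not Genetic Constructor's actual data model or the Edinburgh Genome Foundry's order format: the `Block` class, the `build_order` and `demo_template` functions, the JSON field names, and every part identifier below are hypothetical. The sketch assumes that a leaf block carries one part identifier, that a list block contributes exactly one of its options to each construct, and that a parent block concatenates its children in order. Expanding such a template enumerates the combinatorial space, an optional filter restricts the order to a subset of it, and each construct is serialized to JSON as a list of part identifiers.

```python
import itertools
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Block:
    """Hypothetical design block: a leaf part, a list block, or a parent of ordered children."""
    name: str
    part_id: Optional[str] = None                          # leaf: one part identifier
    options: List[str] = field(default_factory=list)       # list block: alternative part identifiers
    children: List["Block"] = field(default_factory=list)  # parent: sub-blocks in order

    def variants(self) -> List[List[str]]:
        """Return every ordered list of part identifiers this block can expand to."""
        if self.children:
            # A parent picks one variant from each child and concatenates them in order.
            per_child = [child.variants() for child in self.children]
            return [list(itertools.chain.from_iterable(combo))
                    for combo in itertools.product(*per_child)]
        if self.options:
            # A list block contributes exactly one of its options per construct.
            return [[option] for option in self.options]
        if self.part_id is not None:
            return [[self.part_id]]
        raise ValueError(f"block {self.name!r} has no part assigned yet")


def build_order(template: Block, order_id: str,
                keep: Optional[Callable[[List[str]], bool]] = None) -> str:
    """Expand a template, optionally keep a subset, and serialize it as JSON part lists."""
    variants = template.variants()
    if keep is not None:
        # Restrict the order to a subset of the combinatorial space.
        variants = [parts for parts in variants if keep(parts)]
    constructs = [{"construct": f"{template.name}-{i + 1}", "parts": parts}
                  for i, parts in enumerate(variants)]
    return json.dumps({"order": order_id, "constructs": constructs}, indent=2)


def demo_template() -> Block:
    """Hypothetical 2 x 2 x 2 template with a nested transcription unit."""
    return Block("demo", children=[
        Block("unit", children=[
            Block("promoter", options=["promoter_A", "promoter_B"]),
            Block("cds", options=["cds_A", "cds_B"]),
        ]),
        Block("terminator", options=["terminator_A", "terminator_B"]),
        Block("backbone", part_id="backbone_X"),
    ])


if __name__ == "__main__":
    print(build_order(demo_template(), "demo-order"))
```

With this hypothetical template, `build_order(demo_template(), "demo-order")` yields 8 constructs (2 × 2 × 2), the first being `promoter_A, cds_A, terminator_A, backbone_X`; passing `keep=lambda parts: "cds_B" not in parts` leaves 4. The 2 × 2 × 2 shape was chosen only to match the size of the 8-construct demonstration; the actual template used with EMMA is the one shown in Figure S1.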
The designs were assembled using a highly automated robotics chain and in-house decision-making software, resulting in a seamless design-to-manufacture process with minimal human intervention (see Supporting Information for a detailed description of the workflow).

Architecture / Technical
Genetic Constructor is a platform-independent web application written in JavaScript and Python. An open-source community edition, which provides the core functionality but lacks certain features such as primer design and molecular visualization, is available on GitHub18. Docker19 containerization is used for deployment, so the software is easy to acquire, install, and scale. Genetic Constructor is hosted online by Autodesk20, or it may be downloaded and run on a local machine for development and extension. Extensions take the form of npm21 packages and can extend the functionality of both the web client and the server. REST APIs, including APIs provided by extensions, allow external access to application data and functionality (Figure 2.1). Data models are serialized to JSON and heavily influenced by (though independent of) SBOL 2.014 (Figure S4). Authentication and user management are handled by Autodesk separately from the application and can easily be substituted or extended in local builds.

Future
Future work will include deeper and broader integration of external services and software, thereby closing the design, build, test, and learn cycle. We plan to add features for sequence annotation, optimization, and fabrication preparation. Concentrating on the transition from design to manufacture will require greater algorithmic control over designs. Deeper integration with foundries requires more granularity, capturing steps such as performing automated design checks, receiving real-time feedback, algorithmically refining sequence, and providing the user with immediate order confirmation. Finally, incorporating learning from experimental data will allow the software to better inform subsequent design refinements.

References

(1) Brophy, J. A. N., and Voigt, C. A. (2014) Principles of genetic circuit design. Nat. Methods 11, 508–520.
(2) Bilitchenko, L., Liu, A., Cheung, S., Weeding, E., Xia, B., Leguia, M., Anderson, J. C., and Densmore, D. (2011) Eugene: A domain specific language for specifying and constraining synthetic biological parts, devices, and systems. PLoS One 6, e18882.
(3) Wilson, E. H., Sagawa, S., Weis, J. W., Schubert, M. G., Bissell, M., Hawthorne, B., Reeves, C. D., Dean, J., and Platt, D. (2016) Genotype Specification Language. ACS Synth. Biol. doi:10.1021/acssynbio.5b00194.
(4) Czar, M. J., Cai, Y., and Peccoud, J. (2009) Writing DNA with GenoCAD. Nucleic Acids Res. 37, W40–W47.
(5) Teselagen; 2017; https://www.teselagen.com/.
(6) ApE; 2017; http://biologylabs.utah.edu/jorgensen/wayned/ape/.
(7) Benchling; 2017; https://benchling.com/.
(8) Lu, G., and Moriyama, E. N. (2004) Vector NTI, a balanced all-in-one sequence analysis suite. Brief. Bioinform. 5, 378–388.
(9) Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., and Drummond, A. (2012) Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649.
(10) Genome Compiler; 2017; http://www.genomecompiler.com/.
(11) Nielsen, A. A., Der, B. S., Shin, J., Vaidyanathan, P., Paralanov, V., Strychalski, E. A., ..., and Voigt, C. A. (2016) Genetic circuit design automation. Science 352, aac7341.
(12) Hillson, N. J., Rosengarten, R. D., and Keasling, J. D. (2011) j5 DNA Assembly Design Automation Software. ACS Synth. Biol. doi:10.1021/sb2000116.
(13) Chao, R., Yuan, Y., and Zhao, H. (2015) Recent advances in DNA assembly technologies. FEMS Yeast Res. 15.
(14) Bartley, B., Beal, J., Clancy, K., Misirli, G., Roehner, N., Oberortner, E., Pocock, M., Bissell, M., Madsen, C., Nguyen, T., Zhang, Z., Gennari, J. H., Myers, C., Wipat, A., and Sauro, H. (2015) Synthetic Biology Open Language (SBOL) Version 2.0.0. J. Integr. Bioinform. 12, 272.
(15) Quinn, J. Y., Cox, R. S., Adler, A., Beal, J., Bhatia, S., Cai, Y., Chen, J., Clancy, K., Galdzicki, M., Hillson, N. J., Le Novere, N., Maheshwari, A. J., McLaughlin, J. A., Myers, C. J., Umesh, P., Pocock, M., Rodriguez, C., Soldatova, L., Stan, G. B. V., Swainston, N., Wipat, A., and Sauro, H. M. (2015) SBOL Visual: A Graphical Language for Genetic Designs. PLoS Biol. 13.
(16) Weber, T., Blin, K., Duddela, S., Krug, D., Kim, H. U., Bruccoleri, R., Lee, S. Y., Fischbach, M. A., Müller, R., Wohlleben, W., Breitling, R., Takano, E., and Medema, M. H. (2015) antiSMASH 3.0: a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 43, W237–W243.
(17) Martella, A., Matjusaitis, M., Auxillos, J., Pollard, S. M., and Cai, Y. (2017) EMMA: An Extensible Mammalian Modular Assembly Toolkit for the Rapid Design and Production of Diverse Expression Vectors. ACS Synth. Biol. doi:10.1021/acssynbio.7b00016.
(18) Genetic Constructor GitHub Repository; 2017; https://github.com/Autodesk/geneticconstructor-ce.
(19) Docker; 2017; https://www.docker.com/.
(20) Genetic Constructor: Design and manufacture living things; 2017; http://www.geneticconstructor.com.
(21) npm; 2017; https://www.npmjs.com/.

Abbreviations
Application programming interface (API); Computer-aided design (CAD); Extensible Mammalian Modular Assembly (EMMA) Toolkit; Genotype Specification Language (GSL); Integrated development environment (IDE); Synthetic Biology Open Language (SBOL)

Acknowledgements
Our thanks to Autodesk Life Sciences for funding this work and to Amyris for their collaboration on GSL. The EGF team is supported by the Research Councils UK Synthetic Biology for Growth Programme (BBSRC grants BB/M025659/1, BB/M025640/1, and BB/M00029X/1 to YC). The EGF template library is based on research by Andrea Marcela (EGF). Darren Platt (Amyris), the author of the GSL language, assisted in its integration.

Author Information
Corresponding Author
* Eli Groban: eli.groban@autodesk.com

Author Contributions

Genetic Constructor is the product of an active, daily working collaboration between two organizations: the Autodesk Life Sciences team in San Francisco (ADSK) and the Edinburgh Genome Foundry at the University of Edinburgh (EGF). Specific roles and contributions are detailed below. Software architecture was led by M.B. (ADSK) with contributions by F.M. and D.M. (ADSK). Software development: M.B. (ADSK) wrote the main app; D.M. (ADSK) wrote the design canvas; V.Z. (EGF) wrote the EGF Foundry order API and all EGF software presented; F.M. (ADSK) wrote the GenBank conversion scripts. Plugin development: I.L. (EGF) wrote the EGF sequence viewer plugin; R.K. (ADSK) wrote the GSL editor plugin; D.M. (ADSK) wrote the ADSK sequence viewer plugin. User experience design and specification were led by J.L. (ADSK) with contributions by A.M. (EGF). User research was conducted by A.M. (EGF) with contributions by J.L. (ADSK). Robotic operations at EGF for the proof-of-principle assembly of the EMMA constructs were supervised by H.T. (EGF). Software project management was led by F.M. (ADSK); software project planning was led by E.G. (ADSK), C.S. (ADSK), and D.C. (formerly of ADSK). Overall project vision, support, and guidance were provided by E.G. (ADSK).

Notes
The authors declare the following competing financial interest(s): The corresponding author and several current or former Autodesk employees who are co-authors own Autodesk stock.

Figures

Figure 1: Design process. Users begin by (1) sketching a construct using “blocks,” then (2) define template rules and specify lists of parts in combinatorial designs using “list blocks,” and (3) use hierarchy to add specific genes and sequences, yielding complete specifications that (4) can be ordered from DNA foundries, potentially limited to subsets of the combinatorial space. Designs are refined during experimental iteration.

Figure 2: Extensions. (1) Schema of the Genetic Constructor architecture depicting examples of design (specifically GSL) and build plugins. The core application provides the base for many classes of plugins. The GSL application (left) is a design plugin that adds a command-line editor connected to a compiler. Foundry integrations (right), such as EGF, can create custom inventories, provide design templates, and execute design checks for live ordering. (2) Application interface relevant to the GSL extension. At the top is the visually rich canvas for creating and exploring designs; at the bottom is an IDE to define and execute GSL.

Supporting Information Available: Details of the EMMA toolkit demonstration. This material is available free of charge via the Internet at http://pubs.acs.org
