
Letter

A Visual Language for Protein Design Robert Sidney Cox III, James Alastair McLaughlin, Raik Gruenberg, Jake Beal, Anil Wipat, and Herbert Sauro ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.6b00286 • Publication Date (Web): 07 Feb 2017 Downloaded from http://pubs.acs.org on February 8, 2017


ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Synthetic Biology

A Visual Language for Protein Design

Robert Sidney Cox III,∗,‡ James Alastair McLaughlin,∗,§ Raik Grünberg,∗,∥ Jacob Beal,∗,⊥ Anil Wipat,∗,§ and Herbert M Sauro∗,¶

‡Material Science Institute, University of Oregon, USA
§School of Computing Science, Newcastle University, UK
∥Computational Bioscience Research Center, King Abdullah University of Science and Technology, KSA
⊥Raytheon BBN Technologies, USA
¶Department of Bioengineering, Seattle, USA

E-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Running header A Visual Language for Protein Design


ACS Paragon Plus Environment


Abstract

As protein engineering becomes more sophisticated, practitioners increasingly need to share diagrams for communicating protein designs. To this end, we present a draft visual language, Protein Language, that describes the high-level architecture of an engineered protein with a few easy-to-draw glyphs, intended to be compatible with other biological diagram languages such as SBOL and SBGN. Protein Language consists of glyphs for representing important features (e.g., globular domains, recognition and localization sequences, sites of covalent modification, cleavage and catalysis), rules for composing these glyphs to represent complex architectures, and rules constraining the scaling and styling of diagrams. To support Protein Language, we have implemented an extensible web-based software diagram tool, Protein Designer, that uses Protein Language in a “drag and drop” interface for visualization and computer-aided design of engineered proteins, as well as conversion of annotated protein sequences to Protein Language diagrams and figure export. Protein Designer can be accessed at http://biocad.ncl.ac.uk/protein-designer/.

Keywords Synthetic biology, visualization, Synthetic Biology Open Language, genetic circuits, protein engineering

Introduction

Protein engineering is one of the oldest disciplines of molecular biotechnology, with a rich history of engineering by mutation and fusion of genes coding for functional protein sequences. As more sophisticated and model-driven methods have become available, practitioners need to communicate increasingly complex designs. In other disciplines, such as electrical engineering (1, 2) or architecture and mechanical engineering (3, 4), standard visual symbols and diagram languages allow engineers to comprehend designs more easily, avoid mistakes, and build software tools. No standard visual language has previously existed, however, for depicting design features within individual engineered proteins. We address this gap by presenting a draft visual language for protein design, Protein Language.

Protein Language is specifically intended to aid protein design, not to describe all existing knowledge of protein biology. This approach is in keeping with visual languages in other engineering disciplines: for example, electronics diagrams do not aim to capture the full range of electromagnetic phenomena, and architectural diagrams do not aim to describe the full physics of built structures. Accordingly, we have created glyphs focused on a subset of design elements intended to cover many of the most common changes that protein engineers make to manipulate protein function, expression and production. Protein Language has been developed with the aim of compatibility with other standards in biological engineering, including the Systems Biology Graphical Notation (SBGN) (5) and the Synthetic Biology Open Language Visual (SBOLv) (6, 7). Protein Language thus draws on design standards from other fields to produce a distinct and clear visual style, while remaining largely compatible with related efforts. It provides users with a wide range of expressive capabilities, which can improve the communication of protein designs through rapidly drawn, easy-to-interpret, high-quality technical diagrams. To support use and adoption of Protein Language, we have also implemented a web-based software tool, Protein Designer, that provides an accessible interface for using these symbols to construct diagrams. We plan for Protein Language and its symbols to be adjusted and further refined through the experience of practitioners and an open community standardization process.


Results

Protein Language and Glyph Set

At present, there are twelve glyphs defined for Protein Language: four region glyphs and eight site glyphs. These glyphs were chosen to be compatible with existing literature where possible, supplemented by a number of novel symbols intended to be clear, easy to draw, and easy to distinguish. All twelve glyphs are shown in Figure 1 and described in detail in Appendix A. The glyphs are intended to serve as general categories for design rather than formal ontological definitions. For example, a region containing several transmembrane domains could be represented as several membrane glyphs, as a single structured region glyph, or omitted altogether with the omitted protein region glyph, depending on what a practitioner wishes to communicate regarding that sequence. Together, the twelve glyphs can depict a wide range of protein designs.

A Protein Language diagram is built around a straight line, a common literature representation of an amino acid chain. Other significant features of the protein are represented by “region” glyphs ordered along this backbone and by “site” glyphs ordered along a region. The backbone line represents an arbitrary protein region with unspecified structural properties; unstructured and linker regions are normally shown simply as backbone line. A rectangle with rounded edges describes a structured protein region, such as a protein domain, consistent with typical conventions from the literature on protein domains (e.g., (8, 9)). The width of the rectangle may be scaled to indicate relative region size. Membrane regions are shown with a zig-zag line, inspired by several literature illustrations (10, 11). This membrane glyph can be used on either the backbone or the structured region glyph. We also include a dotted line to describe a region that is present in the protein but omitted from the diagram. These four region glyph types describe variably sized protein regions, are consistent with previous literature descriptions, and allow the user to highlight the basic structure of globular domains, disordered regions, and membrane regions.
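The arrangement of region glyphs along a backbone can be illustrated with a small data model. The following is a minimal sketch in Python, not Protein Designer's implementation: the names `RegionGlyph`, `Region` and `validate_regions`, and the convention of describing each region as a half-open interval of residue positions, are our own hypothetical choices. The sketch keeps regions in N- to C-terminal order and rejects overlapping regions, in line with the drawing rule in Appendix B that regions may not overlap.

```python
from dataclasses import dataclass
from enum import Enum


class RegionGlyph(Enum):
    """The four Protein Language region glyphs and how they are drawn."""
    UNSPECIFIED = "backbone line"
    STRUCTURED = "rounded rectangle"
    MEMBRANE = "zig-zag line"
    OMITTED = "dashed line"


@dataclass(frozen=True)
class Region:
    glyph: RegionGlyph
    start: int  # first residue covered (inclusive); hypothetical convention
    end: int    # first residue after the region (exclusive)
    label: str = ""


def validate_regions(regions):
    """Return regions in N- to C-terminal order; reject empty or overlapping ones."""
    ordered = sorted(regions, key=lambda r: r.start)
    for region in ordered:
        if region.end <= region.start:
            raise ValueError(f"empty or inverted region: {region.label!r}")
    for left, right in zip(ordered, ordered[1:]):
        if right.start < left.end:
            raise ValueError(f"regions overlap: {left.label!r} and {right.label!r}")
    return ordered


# Usage: regions may be supplied in any order; they come back N- to C-terminal.
design = validate_regions([
    Region(RegionGlyph.STRUCTURED, 120, 360, "domain B"),
    Region(RegionGlyph.UNSPECIFIED, 100, 120, "linker"),
    Region(RegionGlyph.STRUCTURED, 1, 100, "domain A"),
])
print([r.label for r in design])  # ['domain A', 'linker', 'domain B']
```

Residue intervals are only one possible representation; a drawing tool might equally store canvas coordinates, since region widths in a diagram indicate relative rather than exact sizes.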


Smaller significant features of a protein’s structure and function are represented by eight site glyphs, typically representing features from one to thirty amino acids in size. The catalytic glyph represents an enzyme active site or binding pocket. The binding glyph represents protein binding to various ligands, including proteins, DNA, and small molecules. The cleavage glyph covers proteolytic sites, and the similar degradation glyph covers recognition sites for processive protein degradation machinery and systems such as ubiquitination. Protein modifications by covalent attachment of small molecules are represented by the covalent glyph, covering post-translational modifications such as phosphorylation, a focus of intense research in the proteomic literature (e.g., (10, 12)). Two localization glyphs allow for the description of C-terminal, N-terminal, or internal sequences for protein transport, allowing protein designs to specify cellular location. Finally, the biochemical tag glyph covers sites for protein purification, crystallization, and other chemical handles. The eight site glyphs thus describe enzyme active sites and locations where a protein is post-translationally modified, cleaved, degraded, bound, transported, or biochemically manipulated.
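Because site glyphs are ordered along a region, a diagram needs to know which region each site belongs to. The sketch below is a hypothetical illustration of that bookkeeping, not part of Protein Designer: the `SiteGlyph` enumeration simply lists the eight site glyphs, while the `Site` class, the `assign_sites` helper, and the convention of giving each site a single residue position are assumptions made for this example. Regions are given as non-overlapping, half-open residue intervals.

```python
from bisect import bisect_right
from dataclasses import dataclass
from enum import Enum


class SiteGlyph(Enum):
    """The eight Protein Language site glyphs."""
    CATALYTIC = "catalytic site"
    BINDING = "non-covalent binding site"
    CLEAVAGE = "protease cleavage site"
    DEGRADATION = "degradation signal"
    COVALENT = "covalent modification"
    LOCALIZATION_CLEAVED = "localization signal, cleaved"
    LOCALIZATION_RETAINED = "localization signal, retained"
    TAG = "biochemical tag"


@dataclass(frozen=True)
class Site:
    glyph: SiteGlyph
    position: int  # residue on which the glyph is centred (hypothetical convention)
    label: str = ""


def assign_sites(regions, sites):
    """Group sites under the region that contains them, N- to C-terminal.

    `regions` is a list of (start, end, name) tuples describing half-open,
    non-overlapping residue intervals.
    """
    regions = sorted(regions)
    starts = [start for start, _, _ in regions]
    grouped = {name: [] for _, _, name in regions}
    for site in sorted(sites, key=lambda s: s.position):
        i = bisect_right(starts, site.position) - 1  # last region starting at or before the site
        if i < 0 or site.position >= regions[i][1]:
            raise ValueError(f"site {site.label!r} lies outside every region")
        grouped[regions[i][2]].append(site)
    return grouped


layout = assign_sites(
    [(1, 100, "domain A"), (100, 120, "linker"), (120, 360, "domain B")],
    [Site(SiteGlyph.CLEAVAGE, 110, "c"), Site(SiteGlyph.TAG, 2, "h"),
     Site(SiteGlyph.CATALYTIC, 200, "a"), Site(SiteGlyph.COVALENT, 150, "p")],
)
print({name: [s.label for s in found] for name, found in layout.items()})
# {'domain A': ['h'], 'linker': ['c'], 'domain B': ['p', 'a']}
```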

Protein Designer

Protein Designer is a web-based software tool for creating and manipulating Protein Language diagrams, available at http://biocad.ncl.ac.uk/protein-designer/. A screenshot is shown in Figure 2. (At present, Protein Designer requires the Google Chrome or Chromium desktop browser.) The user can create a protein backbone (unspecified region) by right-clicking on the blank canvas. An unlimited number of resizable backbone lines is supported. The sidebar allows the user to select a glyph from the glyph set, which can then be placed on the canvas or attached to a protein backbone. The structured protein region glyph, in turn, has its own backbone attachment points for adding site glyphs to the top or bottom. Once completed, designs can be exported as Scalable Vector Graphics (SVG (13)) images using the button in the top right. The SVG can also be converted to PDF via the browser’s print dialog, and either form can be imported into compatible illustration or presentation software. Protein Designer’s simple interface allows fast layout of designs using Protein Language.

Protein Designer uses a modular system of drawing rules to render SVG. New glyphs can be defined as geometrical rules using the SVG path-drawing commands moveto, lineto, and closepath ((13), Section 8.3). Users therefore have the option of contributing new glyphs to the language as SVG geometry definitions, which can be incorporated into the Protein Designer code. The architecture of Protein Designer allows new glyphs and sets of glyphs to be added easily which, we hope, will facilitate the development of a standard visual protein language.

Example A: Protease sensor

Figure 3 shows a Protein Language diagram representing a protease-based sensor presented in (14). This protein device consists of two regions encoding fluorescent proteins of different colors, with a disordered region between them. Inside the disordered region is a protein cleavage site. This sensor exhibits fluorescence resonance energy transfer (FRET) between the two fluorescent protein domains, which is abolished when the protein is cleaved. The FRET signal is enhanced through non-covalent binding: an intramolecular “helper interaction.” Other features include synthetic linker sequences and a biochemical purification tag.

Example B: Light-inducible protein membrane localization

Figure 4 shows a Protein Language diagram representing light-inducible protein membrane localization presented in (15). This engineered system consists of two separate protein backbones that can be brought together via a light-induced conformational change that reversibly controls protein binding. Two fluorescent reporter domains (mCitrine and mCherry) are used to image the localization of each protein to the plasma membrane, where one of the proteins is anchored by a membrane region. The system can be used as a general, reversible system for regulated recruitment to the plasma membrane in eukaryotes.
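Protein Designer defines glyphs as SVG path geometry built from the moveto, lineto and closepath commands. The short Python sketch below illustrates that idea only; it does not reproduce Protein Designer’s own rule format. The helper names `path_data` and `diamond_glyph`, the diamond outline, and its coordinates are hypothetical. A glyph outline is written as a list of absolute `M`, `L` and `Z` commands, serialized into the `d` attribute of an SVG `<path>` element, and wrapped in a minimal SVG document using the standard library’s ElementTree.

```python
from xml.etree import ElementTree as ET


def path_data(commands):
    """Serialize (command, x, y) tuples into an SVG path 'd' string.

    Only absolute moveto ('M'), lineto ('L') and closepath ('Z') are
    supported; 'Z' is written as a bare one-element tuple.
    """
    parts = []
    for cmd, *coords in commands:
        if cmd not in ("M", "L", "Z"):
            raise ValueError(f"unsupported path command: {cmd}")
        if cmd == "Z":
            parts.append("Z")
        else:
            x, y = coords
            parts.append(f"{cmd} {x:g} {y:g}")
    return " ".join(parts)


def diamond_glyph(cx, cy, half):
    """Hypothetical diamond-shaped site glyph centred on (cx, cy)."""
    return [("M", cx, cy - half), ("L", cx + half, cy),
            ("L", cx, cy + half), ("L", cx - half, cy), ("Z",)]


print(path_data(diamond_glyph(20, 20, 10)))  # M 20 10 L 30 20 L 20 30 L 10 20 Z

svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg", width="40", height="40")
ET.SubElement(svg, "path", d=path_data(diamond_glyph(20, 20, 10)),
              fill="white", stroke="black")
print(ET.tostring(svg, encoding="unicode"))
```

Restricting glyph definitions to these three commands keeps contributed glyphs to simple polygonal outlines, which are easy to review and to scale.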


Example C: Inducible artificial transcription factor

Figure 5 shows a Protein Language diagram representing an inducible artificial transcription factor presented in (16). The estrogen receptor region is used to add an inducible response to an artificial transcription factor. This design incorporates three structured protein regions: a DNA binding domain, the estrogen receptor, and a eukaryotic activation domain. Each domain’s function is described by site glyphs for binding and localization.
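As a worked illustration, the architecture of Example C can be written down as an ordered, N- to C-terminal list of structured regions, each carrying its site glyphs and the single-character labels given in the Figure 5 caption. This is a hypothetical sketch: the tuple encoding and the `describe` helper are our own, assigning ‘d’ and ‘a’ to the binding glyph and ‘N’ and ‘X’ to the retained localization glyph is our reading of the statement that the domains are described by binding and localization site glyphs, and residue lengths are omitted because the text does not give them.

```python
# Hypothetical encoding of Example C (Figure 5): structured regions in
# N- to C-terminal order, each with (site glyph, single-character label) pairs.
ARTIFICIAL_TF = [
    ("zinc finger DNA-binding domain", [("binding", "d")]),
    ("estrogen receptor", [("localization, retained", "N"),
                           ("localization, retained", "X")]),
    ("VP16 activation domain", [("binding", "a")]),
]


def describe(design):
    """Render a one-line, N- to C-terminal text summary of a design.

    '--' stands for the backbone line joining consecutive regions.
    """
    chunks = []
    for region, sites in design:
        labels = ",".join(label for _, label in sites)
        chunks.append(f"[{region}: {labels}]")
    return "N-" + "--".join(chunks) + "-C"


print(describe(ARTIFICIAL_TF))
# N-[zinc finger DNA-binding domain: d]--[estrogen receptor: N,X]--[VP16 activation domain: a]-C
```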

Discussion

Visual depictions have always been an important tool in the design of biological systems. In this paper, we have presented the first diagram language for constructing visualizations specifically for the purposes of protein engineering. Rather than focusing on protein structure, as protein ribbon diagrams do (17), Protein Language operates at a higher level of abstraction. This abstraction to the modular aspects of protein design reflects the increasing sophistication of protein engineering models: it allows communication between practitioners to focus on the primary functional characteristics of a design, leaving the specific details of its realization to be examined only if necessary. As protein engineering capabilities improve, we expect such abstract design diagrams to become increasingly important, and we expect Protein Language to expand to cover a large range of routinely engineered features.

The immediate next steps we envision for this effort, however, focus on refining Protein Language and integrating it with existing standards and communities. In particular, we aim to integrate Protein Language with the Systems Biology Graphical Notation (SBGN) (5) and Synthetic Biology Open Language Visual (SBOLv) (6, 7) standards, both of which are free and open standards supported by diverse international communities and part of the COmputational Modeling in BIology NEtwork (COMBINE) federated standards collection. Together, SBOLv and SBGN enable canonical depictions of functional pathways, structural features of DNA, and biochemical interactions, but at present neither has a means of depicting the sub-structure of a protein, a complementary capability provided by Protein Language. Moreover, efforts already underway in both communities will facilitate integration with Protein Language: SBGN is being enhanced to support diagram elements that show the sub-structure of chemical species using other visual languages, and SBOLv is being enhanced to support the standardized depiction of non-nucleic-acid components. Because there are a number of minor differences between how Protein Language is currently formulated and the rules of these standards, integration will involve a number of refinements and adjustments. Given the positive reception that Protein Language has received in initial community discussions, however, we are confident that it will ultimately form the basis for a broadly accepted, community-supported open standard that helps to integrate engineered proteins effectively into the design of biological systems.

Acknowledgement

The authors thank Steven Schkolne for consultation on glyph design and Matthew Pocock for helpful discussions. J.A.M. is supported by FUJIFILM DioSynth Technologies. A.W. is supported by Engineering and Physical Sciences Research Council grants EP/J02175X/1 and EP/N031962/1. R.S.C. is supported in part by US DoD grant FA5209-16-P-0041 and National Science Foundation grant DBI-1355909. J.S.B. is supported in part by the National Science Foundation Expeditions in Computing Program Award #1522074 as part of the Living Computing Project. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.


Appendix A: Protein Glyphs

• Protein region, unspecified. A protein region of unspecified function and structure. This denotes a generic region of protein that is not known to contain a structured domain, such as an unstructured linker sequence. This glyph is also referred to as a protein backbone, since other glyphs can attach to it.
• Protein region, omitted. The dashed line is provided for omitting a region of the protein backbone, in cases where portions of the protein are left out of the diagram but the relative scale of the protein domains should still be shown.
• Membrane region. The membrane region glyph can be used to modify either unspecified or structured protein regions. The glyph is placed along either backbone, interrupting it. Membrane regions include both trans-membrane regions and membrane anchors into or across a plasma membrane or organelle membrane (biological context or other diagrams must be used to distinguish between types of membrane regions). The site glyphs below cannot be placed on top of a membrane glyph to indicate a functional modification in a membrane region. Instead, site glyphs can be placed vertically aligned with a membrane glyph on a structured region backbone.
• Protein region, structured. A protein region with specific function or structure, such as a protein domain. Structured regions can be resized horizontally to have variable lengths. Structured regions can include multiple protein domains, and protein fusions would generally be represented as two or more structured regions. The structured region can be decorated with site glyphs on its top or bottom backbone. This region can also be used to represent any function not covered by other glyphs.
• Catalytic site. A protein region including one or more enzyme active sites. This glyph is meant to be flexible, so it could be used to label anything from an entire active site down to a single residue that participates in an enzyme active site. Multiple glyphs can be used to show the relative positions of particular active site residues. The label letters and colors can be used to indicate different substrates and different sites contributing to the same active site.
• Binding site, non-covalent. A general ligand binding site or dimerization domain. This will often represent a non-covalent contact between a protein site and another protein or peptide, a small molecule binding site, or binding to another macromolecule such as DNA. We do not prescribe how to draw binding interactions, only the presence of the binding site and a label. Binding sites can affect protein conformation and enzyme activity.
• Protease cleavage site. A polypeptide region that directs protease cleavage of the peptide chain. The cleavage could be modulated by protein conformation, or by other protein sites such as catalytic sites or covalent modifications.
• Covalent modification. A site for covalent attachment to a protein, such as phosphorylation or methylation, with the type indicated by a letter. Covalent sites can affect protein conformation and enzyme activity.
• Localization signal, cleaved. A targeting polypeptide that encodes the localization of the protein. The signal peptide is then cleaved, and therefore the signal cannot be used again.
• Localization signal, retained. A targeting polypeptide that encodes the localization of the protein. Since the signal is retained after transport, it can be used repeatedly.
• Degradation signal. A sequence that directs protein degradation. For example, in bacteria this could be a recognition sequence for a protein-degrading enzyme, and in eukaryotes this could be a ubiquitination site.


• Biochemical tag. A sequence that is useful for biochemical manipulation, detection or characterization. Examples include the His tag for protein purification and the FLAG tag for antibody recognition.
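The membrane-region entry above constrains where site glyphs may go: not on top of a membrane glyph, but possibly vertically aligned with one on a structured region. The check below is a minimal sketch of our interpretation of that rule, not code from Protein Designer. The function name `site_allowed`, the attachment names `"backbone"`, `"top"` and `"bottom"`, and the use of horizontal canvas coordinates are hypothetical.

```python
def site_allowed(attach_to, site_x, membrane_spans):
    """Return True if a site glyph may be centred at horizontal position site_x.

    attach_to: "backbone" for a protein line, or "top"/"bottom" for the
        attachment edges of a structured region glyph (hypothetical names).
    membrane_spans: (start_x, end_x) intervals covered by membrane glyphs
        on the line in question.
    """
    if attach_to in ("top", "bottom"):
        # On a structured region, a site may sit vertically aligned with a
        # membrane glyph drawn on that region's backbone.
        return True
    if attach_to != "backbone":
        raise ValueError(f"unknown attachment point: {attach_to!r}")
    # A site glyph may not be drawn on top of a membrane glyph.
    return not any(start <= site_x <= end for start, end in membrane_spans)


print(site_allowed("backbone", 15, [(10, 20)]))  # False
print(site_allowed("top", 15, [(10, 20)]))       # True
```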

Appendix B: Glyph Drawing Rules

Any implementation of Protein Language or extension thereof should conform to the following rules.

Protein Backbone

The protein backbone is the line on which the protein design glyphs are drawn. In some cases (e.g., structured protein regions) the line is not shown in the design; in other cases (e.g., protein sites) the line is shown with the glyphs on top. We do not restrict the shape of the line, though it should not cross itself. The protein line is drawn with a default thickness of 3 pt. Lines of other widths are automatically considered to be labels. Multiple amino acid chains (such as for protein complexes) should be drawn as distinct protein lines.

Protein Regions

Regions are amino acid chains of variable size, typically more than ten amino acids. Regions are drawn as replacements for sections of the backbone line. Any region glyph is scalable in the dimension parallel to the backbone line, but may not be scaled in the other dimension (i.e., normal to the backbone line). Regions can be expanded horizontally to provide information about the relative sizes of amino acid regions, or to accommodate labels. When drawn on a curved or irregular line, the region glyph should lie along a straight line connecting its two endpoints on the backbone (thus the backbone line is replaced with a straight line wherever there is a region glyph). Regions may not be drawn overlapping each other.
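The region rules above can be expressed as a single SVG transform: stretch the glyph along the backbone direction only, rotate it onto the straight chord between its two endpoints, and translate it into place. The sketch below assumes, hypothetically, that each region glyph is drawn left to right from (0, 0) to (`nominal_width`, 0) and centred on the x-axis; the function name `region_transform` is our own. Because SVG applies the transforms in a list from right to left, the glyph is scaled first, then rotated, then translated.

```python
import math


def region_transform(p0, p1, nominal_width):
    """Return an SVG transform placing a region glyph on the chord from p0 to p1.

    The glyph is assumed to span (0, 0) to (nominal_width, 0). It is scaled
    only along the backbone direction; the normal scale stays at 1.
    """
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0 or nominal_width <= 0:
        raise ValueError("a region needs distinct endpoints and a positive width")
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    sx = length / nominal_width
    return f"translate({x0:g} {y0:g}) rotate({angle:g}) scale({sx:g} 1)"


# A region whose endpoints on a curved backbone are (0, 0) and (30, 40):
# a glyph nominally 25 units wide is stretched to the 50-unit chord.
print(region_transform((0, 0), (30, 40), 25))
# translate(0 0) rotate(53.1301) scale(2 1)
```

Keeping the second scale factor fixed at 1 is what enforces the rule that region glyphs may not be scaled normal to the backbone.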


Protein Sites

Sites are generally smaller than regions, typically one to fifty amino acid residues. Sites are drawn as glyphs centered on top of the backbone line, with the backbone line displayed below. Sites can also be drawn onto the structured region glyph (see Examples for implementation details). Site glyphs are not scalable, so they are always the same size relative to the protein line. Site glyphs cannot be drawn on top of each other or overlapping. Sites may be drawn with a single-character label (additional labels can be shown in a separate “labels” layer of a diagram). Implementations can restrict site labels to a specific set of characters, such as lowercase Roman letters, uppercase Roman letters, or Roman numerals, for the visual appeal of the glyphs. Protein Designer’s restrictions are: lowercase in squares, uppercase in diamonds and circles.
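The site rules above (fixed-size glyphs, no overlap, optional single-character labels, and Protein Designer’s case restrictions by shape) lend themselves to simple validation. The following Python sketch is a hypothetical checker, not Protein Designer code: `SITE_WIDTH` is an assumed fixed glyph width in canvas units, the shape names are our own, and the helpers `check_label` and `check_spacing` are introduced for illustration only.

```python
SITE_WIDTH = 12.0  # hypothetical fixed width; site glyphs are never rescaled

# Protein Designer's stated restrictions: lowercase in squares,
# uppercase in diamonds and circles.
LABEL_RULES = {
    "square": str.islower,
    "diamond": str.isupper,
    "circle": str.isupper,
}


def check_label(shape, label):
    """Raise ValueError if label is not allowed on a site glyph of this shape."""
    if not label:
        return  # labels are optional
    if len(label) != 1:
        raise ValueError("site labels are a single character")
    rule = LABEL_RULES.get(shape)
    if rule is None:
        raise ValueError(f"unknown site shape: {shape!r}")
    if not rule(label):
        raise ValueError(f"label {label!r} is not allowed in a {shape}")


def check_spacing(centres, width=SITE_WIDTH):
    """Raise ValueError if any two fixed-width site glyphs along a line would overlap."""
    ordered = sorted(centres)
    for a, b in zip(ordered, ordered[1:]):
        if b - a < width:
            raise ValueError(f"site glyphs at {a:g} and {b:g} overlap")


check_label("square", "c")    # accepted
check_label("diamond", "P")   # accepted
check_spacing([0, 12, 30])    # accepted: neighbouring glyphs just touch
```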

Supporting Information Available

Supporting Information 1: Glyph images in SVG format. Supporting Information 2: Glyph images in PNG format. This material is available free of charge via the Internet at http://pubs.acs.org/.

References

1. IEEE, IEEE Graphic Symbols for Logic Functions (Includes IEEE Std 91A-1991 Supplement, and IEEE Std 91-1984). IEEE Std. 91a-1991, 1991.
2. IEEE, IEEE Standard American National Standard Canadian Standard Graphic Symbols for Electrical and Electronics Diagrams (Including Reference Designation Letters). IEEE Std. 315-1975 (Reaffirmed 1993), 1993.
3. Schley, M., Buday, R., Sanders, K., and Smith, D. AIA CAD layer guidelines; The American Institute of Architects Press: Washington, DC, 1997.


4. British Standards Institution, Collaborative production of architectural, engineering and construction information. BS 1192:2007, 2007.
5. Le Novère, N., Hucka, M., Mi, H., Moodie, S., Schreiber, F., Sorokin, A., Demir, E., Wegner, K., Aladjem, M. I., and Wimalaratne, S. M. (2009) The systems biology graphical notation. Nat. Biotechnol. 27, 735–741.
6. Quinn, J. Y. et al. (2015) SBOL Visual: A Graphical Language for Genetic Designs. PLoS Biol. 13, e1002310.
7. Quinn, J., Beal, J., Bhatia, S., Cai, P., Chen, J., Clancy, K., Hillson, N., Galdzicki, M., Maheshwari, A., Pocock, M., Rodriguez, C., Stan, G.-B., and Endy, D. Synthetic Biology Open Language Visual (SBOL Visual), version 1.0.0; 2013.
8. Chen, C., Nott, T. J., Jin, J., and Pawson, T. (2011) Deciphering arginine methylation: Tudor tells the tale. Nat. Rev. Mol. Cell Biol. 12, 629–642.
9. Lai, A., Sato, P. M., and Peisajovich, S. G. (2015) Evolution of synthetic signaling scaffolds by recombination of modular protein domains. ACS Synth. Biol. 4, 714–722.
10. Choudhary, C., and Mann, M. (2010) Decoding signalling networks by mass spectrometry-based proteomics. Nat. Rev. Mol. Cell Biol. 11, 427–439.
11. Lim, W. A. (2010) Designing customized cell signalling circuits. Nat. Rev. Mol. Cell Biol. 11, 393–403.
12. Whitaker, W. R., Davis, S. A., Arkin, A. P., and Dueber, J. E. (2012) Engineering robust control of two-component system phosphotransfer using modular scaffolds. Proc. Natl. Acad. Sci. U. S. A. 109, 18090–18095.
13. Ferraiolo, J., Jun, F., and Jackson, D. Scalable Vector Graphics (SVG) 1.0 Specification; iuniverse, 2000.


14. Grünberg, R., Burnier, J. V., Ferrar, T., Beltran-Sastre, V., Stricher, F., van der Sloot, A. M., Garcia-Olivas, R., Mallabiabarrena, A., Sanjuan, X., Zimmermann, T., and Serrano, L. (2013) Engineering of weak helper interactions for high-efficiency FRET probes. Nat. Methods 10, 1021–1027.
15. Levskaya, A., Weiner, O. D., Lim, W. A., and Voigt, C. A. (2009) Spatiotemporal control of cell signalling using a light-switchable protein interaction. Nature 461, 997–1001.
16. McIsaac, R. S., Oakes, B. L., Wang, X., Dummit, K. A., Botstein, D., and Noyes, M. B. (2013) Synthetic gene expression perturbation systems with rapid, tunable, single-gene specificity in yeast. Nucleic Acids Res. 41, e57.
17. Richardson, J. S. (1985) Schematic drawings of protein structures. Methods Enzymol. 115, 359–380.


Figure 4: Diagram example: the light-inducible PIF domain is used to create a reporter system for programmed localization of proteins to the plasma membrane. The protein binding domain (‘b’) is reversibly modulated by exposure to red (650 nm) or infrared (750 nm) light. Each protein backbone also contains a distinct fluorescent reporter protein, yfp (yellow) or rfp (red).

Figure 5: Diagram example: an estrogen receptor region is used to add an inducible response to an artificial transcription factor. This design brings together three protein regions: the N-terminus encodes a DNA binding domain (‘d’, a zinc finger DNA recognition region that binds a specific 9-base-pair DNA sequence); the middle region contains the estrogen receptor, which controls nuclear localization of the entire protein with an inducible response to the hormone beta-estradiol; the C-terminus encodes the activation domain VP16 (‘a’), which recruits polymerase to activate a eukaryotic promoter. Nuclear localization is modulated by a retained nuclear localization signal ‘N’ and a retained nuclear export signal ‘X’, where ‘N’ is blocked when bound to Hsp90 and unblocked when bound to estrogen.
