An Automated Biomodel Selection System (BMSS) for Gene Circuit

Apr 29, 2019 - Gilman, Singleton, Tennant, James, Howard, Lux, Parker, and Love. 0 (0),. Abstract: Well-characterized promoter collections for synthet...
1 downloads 0 Views 1MB Size
Subscriber access provided by OCCIDENTAL COLL

Article

An Automated Bio-Model Selection System (BMSS) for Gene Circuit Designs Jing Wui Yeoh, Kai Boon Ivan Ng, Ai Ying Teh, Jing Yun Zhang, Wai Kit David Chee, and Chueh Loo Poh ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.8b00523 • Publication Date (Web): 29 Apr 2019 Downloaded from http://pubs.acs.org on April 30, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

An Automated Bio-Model Selection System (BMSS) for Gene Circuit Designs Jing Wui Yeoh1,2#, Kai Boon Ivan Ng1, Ai Ying Teh1,2, JingYun Zhang1,2, Wai Kit David Chee1,2, Chueh Loo Poh1,2* 1

Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, Singapore. 2

NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, Singapore #

Present address: Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore. *To whom correspondence should be addressed. Email: [email protected] ABSTRACT Constructing a complex functional gene circuit composed of different modular biological parts to achieve the desired performance remains challenging without a proper understanding of how the individual module behaves. To address this, mathematical models serve as an important tool towards better interpretation by quantifying the performance of the overall gene circuit, providing insights and guiding the experimental designs. As different gene circuits might require exclusively different mathematical representations in the form of ordinary differential equations to capture their transient dynamic behaviors, a recurring challenge in model development is the selection of the appropriate model. Here, we developed an automated biomodel selection system (BMSS) which includes a library of pre-established models with intuitive or unintuitive features derived from a vast array of expression profiles. Selection of models is built upon the Akaike information criteria (AIC). We tested the automated platform using characterization data of routinely used inducible systems, constitutive expression systems, and several different logic gate systems (NOT, AND, and OR gates). The BMSS achieved a good agreement for all the different characterization datasets and managed to select the most appropriate model accordingly. To enable exchange and reproducibility of gene circuit design models, the BMSS platform also generates Synthetic Biology Open Language (SBOL)compliant gene circuit diagrams and Systems Biology Markup Language (SBML) output files. All aspects of the algorithm were programmed in a modular manner to ease the efforts on model extensions or customizations by users. Taken together, the BMSS which is implemented in Python supports users in deriving the best mathematical model candidate in a fast, efficient and automated way using part/circuit characterization data. KEYWORDS: Genetic circuit modelling; model selection; automation; model fitting; characterization data; computational tool Synthetic Biology has sprouted and gained a notable advance since two decades ago with the creation of synthetic gene-regulatory circuit devices of a genetic toggle switch and a biological repressilator1,2. However, constructing a complex foundational gene circuit composed of different modular standardized biological parts to achieve the desired performance remains challenging without a proper understanding of how the individual modules behave. These exceptional works had thus leveraged the power of quantitative modelling to better understand the system dynamic behaviors and identify key experimental parameters to be adjusted so as to achieve functional devices1,2. Quantitative modelling serves as a powerful tool to facilitate

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the design-build-test cycles with less effort either through forward or reverse engineering approach. Use of modelling in synthetic biology allows a representation of essential aspects of the desired system, capturing the system behavior in a quantitative manner useful for analysis and rational design optimization (e.g. design of experiments and sensitivity analysis)3. Deterministic nonlinear models have been developed to assist the development of a number of engineered biological systems for various applications including synthetic gene circuits4,5, gene expression regulations4–7, pathogen targeting microbes8,9, and cell factories bioproductions10,11. The model development process typically involves structure identifications, formulation derivations, parameter inferences, model verifications and validations11. Abstracting the experimental data in this way is a tedious yet complicated process that requires extensive experience and knowledge of the respective system of interest. This often requires many iterative trial-and-error learning and testing cycles which can essentially take months. Automating this process could drastically reduce the time consumed with minimal manual interventions from users. As different gene circuits might require exclusively different mathematical representations, one of the key challenges in model development is the selection of the appropriate model which typically involves intensive in silico hypothesis or assumption testing. Furthermore, similar or even same genetic circuits could behave differently due to the complications caused by biological and physical variations (e.g., changes in host cells resources, temperatures, media, etc.). Consequently, data from previous databases may no longer be valid and applicable for such cases, which indirectly challenge the reproducibility and transferability of a particular model formulation and its corresponding parameters values. To date, a plethora of computer-aided design (CAD) software or modelling tools is available to facilitate the synthetic biology design and modelling processes12–15. While majority of these modelling tools provide useful functions and interactive GUIs with access to different previous databases, there is still a lack of automated features that could expedite the process of selecting the most appropriate candidate model to describe the transient dynamic profile of a biopart or device. Most of them demand moderate to intensive manual efforts from users. There have been previous studies on devising versatile frameworks for model identification/selection. A data-driven Nonlinear Auto Regressive Moving Average model with eXogenous input (NARMAX) modelling framework, leveraging concepts from the field of control engineering, was proposed and shown to be able to detect parsimonious set of model terms which could describe the behavior of gene circuits16. While the data-driven structure detection method is attractive and particularly useful for higher order systems, a major drawback of the polynomialbased NARMAX approach is the lack of direct physical interpretation of the model parameter, which would limit the understanding in biological context. A recently reported framework which is designed to achieve automated bioparts characterization integrated growth modelling, Bayesian parameter inference, and model selection in the characterization process17. Implemented in the Genetic Engineering of Cells (GES) environment, the framework has been shown to be useful in deriving models which capture the relative steady-state transcriptional activity of the gene circuits in response to input signals, in the form of sigmoidal patterns. However, their approach focused mainly on characterizing the steady-state response of gene circuits, lacking models to describe the transient dynamic activity which is important for real time monitoring and control of the circuits. To address the shortcomings, we aimed to develop a modelling platform (BMSS) to automate the bio-model fitting and selection processes, providing a means to efficiently derive the best model candidate that could capture the transient dynamic profiles of a biopart or device well using characterization data. A library of pre-established mathematical ordinary differential equation (ODE) models with different key features (e.g., delay in activation, inducer degradation, etc.) was developed to describe the genetic circuits expression profiles using

ACS Paragon Plus Environment

Page 2 of 26

Page 3 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

assumptions supported by literature. Additional general exhibited characteristics of gene circuits (e.g., basal leakiness, fluorescent protein maturation time, etc.) were incorporated. Our developed platform also delivers secondary quantitative information and insights of the corresponding system of interest apart from the general features (e.g., cellular resources competitions). In addition, we designed the BMSS to derive relative strengths of different constitutive promoters and ribosome binding sites (RBSs) based on the estimated kinetics. To validate and demonstrate the capability of the BMSS in selecting the most appropriate model candidate for a particular gene circuit, we tested the platform using sets of characterization data acquired through the construction of commonly designed gene circuits (with increasing complexity), ranging from inducible promoter systems and constitutive promoter systems to logic gate systems such as NOT, AND, and OR gates. The characterization data is based on commonly used fluorescence measurement using microplate reader. To further verify if the BMSS is able to identify the appropriate model for similar kinetic profiles, the test data also include characterization of the circuits subjected to biological variations or different physical conditions. Our model selection system has demonstrated a good correlation for all the different characterization datasets and managed to identify the optimal model accordingly. Overall, the BMSS offers a context-independent modelling tool which can be easily tuned to accommodate a vast array of characterization data from different gene circuits. RESULTS AND DISCUSSION General Descriptions of BMSS Figure 1 illustrates the architecture of the developed BMSS platform. The selection system begins by assimilating the user inputs including the properly configured characterization data in growth normalized format followed by the respective standard deviations, and a user-defined python file. Pre-processing steps were implemented for proper unit conversions and/or doseresponse pre-fitting to determine the range of values for the parameters whenever necessary. Numerical integration and two-step optimization (global followed by local optimizer) were used to solve the ODEs and iteratively fit models to experimental data through minimizing the computed sum squared residuals. Plotting packages were incorporated for generating result plots and bar charts for data visualizations. To ensure the selection of the most appropriate model without overfitting, the model selection algorithm was built upon the Akaike information criterion (AIC) under certain assumptions with regard to information loss18. AIC rewards goodness of fit which is quantified by the regression sum squared error (SSE) while penalizing increasing number of estimated parameters to discourage overfitting. Different tested pre-established models in the model bank were ranked based on the AIC algorithm and tabulated in an output text file, together with the best fitted model formulation and estimated parameters values. To provide an indication of confidence for the best model selected with respect to the rest of the models in the library, the individual AIC values were rescaled to quantify the difference from the minimum represented by ΔAICi = AICi - AICmin19. The simple rule of thumb in assessing the relative merits of models was applied: models with ΔAICi ≤ 2 have substantial evidence showing no difference from the best approximating model, those models with 4 ≤ ΔAICi ≤ 7 have less support, and models having ΔAICi > 10 are essentially different from the best approximating model with confidence19. The computed ΔAICi values were printed in the output text file. This allows users to better interpret the model selection results in deciding whether other models in the library could also be applicable for their characterization data. The time-response profiles of model simulations in comparison with the experimental data points with error bars, and the derived steady-state behaviors were displayed in graphical figures for visualizations. BMSS also

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

includes the SBOL-compliant gene circuit rendering feature which allows graphics exportations based on the plasmids and parts information provided by users20,21. To enable the process of interoperability and sharing of the developed computational models, the users were given options to export the corresponding model data and SBML representation format in comma-separated values (CSV) and eXtensible Markup Language (XML) files respectively. In the model bank, models of three prominently used genetic systems, with increasing gene circuit complexity, spanning from inducible promoter systems, constitutive promoter systems, to complex logic gates (NOT, AND, and OR gates) were developed and incorporated. BMSS would select the most appropriate model from a library of model formulations for each type of genetic system, based on user inputs and characterization data. Considerable features derived from the kinetic profiles were described distinctively by the library of ODEs models incorporated into the model bank for each genetic system (see Supplementary Figure S3, Table S5-S7). Typically, these features include constant inducer activations, inducers with degradation characteristics, delay activations potentially due to the slow uptake system, inhibitions at higher expressions, and basal leakiness. Apart from incorporating models with commonly observed kinetics, we also included models which take the fluorescent protein (FP) maturation into consideration. FPs are extensively used in parts/circuits characterization to estimate the gene expression dynamics. Nevertheless, nascent FPs have to undergo inherent maturation steps before they can fluoresce, which might significantly impact the accuracy of the interpreted expression profiles in terms of their expression levels and temporal dynamics patterns4. High variation of FP maturation time was also reported even between closely related E.coli strains and was found to be dependent on growth conditions22. A recent study has systematically characterized the maturation kinetics of 50 commonly used FPs using a high-precision time-lapse microscopy in growing cells and reported a maturation half-life duration ranging from 3.8 to 38.2 min for green FPs and 20.8 to 143.4 min for red FPs at 37 °C23. The half-life duration prolonged to the range of 4.2 to 58 min and 30.7 to 258.7 min at 32 °C for green and red FPs respectively23. Despite a highly diverse maturation kinetics was observed, the FPs maturation could often be modelled as a simple firstorder single exponential kinetics with a characteristic half-life duration. In view of the high variation in the half-life durations at diverse conditions, the parameter was introduced as a bounded free parameter with flexibility instead of being kept as a fixed constant during model optimization. This parameter could serve to reflect the sensitivity level of the corresponding expression profile in response to the delay caused by protein maturation kinetics. The contribution of protein maturation time to the overall expression kinetic profile could vary profoundly in different genetic systems which is partly attributed to the dependency of protein expression data alone in deriving the most suitable model candidate. Smaller data size with fewer pieces of information available could lead to an underdetermined system with a decrease in the error degrees of freedom, which was computed as sample/data size minus the number of parameters to be estimated. This might thus impose greater uncertainty in the estimated parameters. Lastly, we have also included single-lumped formulations of gene expression apart from the commonly used double-equation formulations. Typically, a protein expression mechanism is represented by a simple two-step process which is the transcription of genes to mRNAs and the translation of mRNAs into proteins3. These processes are often modelled by two separate ODEs: one controls the promoter-driven mRNA synthesis rate and the mRNA degradation rate, and the other determines the RBS-mediated protein synthesis rate and the cell growthdependent degradation rate/dilution rate3,6,7. This formulation is particularly useful to gain insights into the associated promoter and RBS strengths with known mRNA degradation rate or accessible mRNA experimental measured data. Yet, it is hypothesized that this abstracted

ACS Paragon Plus Environment

Page 4 of 26

Page 5 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

two-step formulation could be redundant when the data size is too small compared to the fitted free parameters such as a single constitutive dataset in the absence of mRNA information. Multiple unknown parameter estimations in this case could also result in non-deterministic cross-correlated answers. Thus, we sought to compare the single lumped- and double-equation formulations for different characterized systems to verify our hypotheses. Inducible Promoter Systems Inducible promoters serve as a powerful tool in gene circuit engineering in view of its vast applications in controlling expression of heterologous genes at certain stages of development. These promoter systems usually form the foundational basis for more complicated gene circuit designs. It is thus of great interest to identify the different kinetic profiles existing in such a system and select the most appropriate model formulation to better understand their regulated expression dynamics. As a proof of concept, BMSS was first tested using three routinely used inducible promoter expression systems: (a) pBAD promoter induced by Arabinose; (b) pTet promoter triggered by anhydrotetracycline (aTc); and (c) pLac induced by isopropyl-β-Dthiogalactoside (IPTG). Thirteen different model formulations covering disparate features for the inducible system were developed for selection (Supplementary Figure S3 and Table S5). Intriguingly, the aforementioned three inducible systems were found to exhibit distinct gene expression behaviors that could only be well-captured by three different model representations (Figure 2). Using FPs as a reporter, the gene expression output measured from pBAD promoter induced by arabinose exhibited an average-expression profile which could be recapitulated with a model under an assumption of a constant inducer concentration/inducer activation (Figure 2a-b). Despite the fast activation driven by the fast diffusion of the small molecules aTc, the pTet promoter expression system induced by aTc displayed a drop in expressions after the peak values were achieved at all inducer concentration levels. Nevertheless, it was observed that the fall in expression occurred much earlier at lower inducer concentrations, exhibiting a positive correlation. In this vein, it is thus conjectured that there is fast inducer degradation or deactivation of the transcription activation caused by the inducer binding/unbinding mechanism following a first-order kinetics, which is sensitive to the inducer concentration (Figure 2c-d). On the other hand, the IPTG-induced pLac promoter expression system manifests a great extent of initial delay which is absent in the other two systems. Interestingly, the pLac system characterization data with delay responses was well described by an active transport mechanism as illustrated in Figure 2e-f, which agrees with the natural attribute of the pLac system itself, facilitated by lac-permease24. As an indispensable tool in synthetic biology gene circuit regulations, an efficient and reliable inducible promoter system should fulfil several salient criteria including high sensitivity, a broad dynamic controllable span, and a tight system with little to no basal expression25. However, high basal expression level (leakiness) remains as one of the major shortcomings reported in inducible promoter systems potentially due to its de-repression regulatory mechanisms. Interestingly, the model with basal leakiness feature was not selected based on the ranking, suggesting that the basal leakiness is not pronounced in our existing characterization gene circuits. This could potentially be attributed to the utilization of commercially available plasmids that have been improved and well optimized for enhanced performances. Further, previous studies have observed lower expressions when administrated with higher inducer concentrations after reaching certain thresholds6,26. These phenomena were speculated to be due to the substrates/products toxicity effect imposed on the cells or presumably caused

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

by inhibitory effect from molecular crowding which might have impacted the binding process26,27. We have incorporated a model formulation considering this inhibitory effect at high inducer concentrations, subsequently generating low expression levels, which was well validated by a copper-sensitive inducible system shown in Supplementary Figure S4g. However, for the three inducible systems in Figure 2, the impact of the substrates/product was not significant. It is common for the inducible promoter systems to be used under different physical and biological circumstances, we thus characterized each inducible system experimentally under different conditions and used the data to verify whether BMSS can determine the appropriate model. In addition, more distinctive newly characterized or previously developed inducible systems including quorum-controlled promoter pLasI activated by N-(3-oxododecanoyl)-Lhomoserine lactone (AHL), IPTG-induced GFP expressions, and blue light-inducible promoter in response to light intensities and illumination pulses5 were used to further validate our platform performances as shown in Supplementary Figure S4. The BMSS-validated results (Figure 2 and Supplementary Figure S4) demonstrated the robustness and viability of our system in recapitulating the different experimental kinetic profiles by accurately ranking the models and choosing the optimal model formulations. It was also observed that a better curve fit was achieved for majority of the inducible systems after considering the FP maturation step (Figure 2 and Supplementary Figure S4). A comparison of simulation results demonstrating the impacts before and after incorporating the feature of protein maturation kinetics was conducted and shown in Supplementary Figure S2. The gentle initial increment in expressions for both pBAD and pTet promoter systems were better captured with the added protein maturation feature due to the consequential slight delay responses recorded in expression profiles, with improved regression SSE percentages of 65% (between 2.46e-11 (Supplementary Figure S2a) and 8.71e-12 (Supplementary Figure S2b)) for the pBAD promoter system; and 3% (between 4.27e-11 (Supplementary Figure S2c) and 4.15e11 (Supplementary Figure S2d)) for the pTet promoter system. On top of that, with reference to the maturation half-life measured from a recent study23, we sought to investigate and compare the BMSS model fitting results when leaving the protein maturation rate as a free parameter or upon affixing the parameter following values given by literature. For the tested pBAD-induced RFP expression system and NOT gate-regulated GFP expression at 30 °C, the simulation results when subjected to two parameter settings were comparable in their fitting performances (see Supplementary Figure S7). Interestingly, the fitted parameter values were found to be similar to the literature reported measurements: the best fitted half-life value of 17.24 min (21.9 min from literature) and 2.35 min (4.5 min from literature) for the pBAD inducible system and the NOT gate system respectively. On the other hand, our model selection system also exhibited better double-equation fitting results for all of the inducible expression systems subjected to varying inducer concentrations, which suggests a higher degree of freedom is required to achieve a good fit for the multiple characterization datasets induced by varying inducer concentrations. Constitutive Promoters and RBSs library We next tested BMSS using a library of constitutive promoters and RBSs of varying strengths. Libraries of constitutive promoters and RBSs are routinely used in synthetic biology applications to direct expression of a gene at a constant level and to avoid the use of multiple inducers which could potentially be toxic to the cells8,28. Having models of these parts will be useful in providing quantitative insights into the underlying kinetic rates for better comparison of their relative strengths. This could allow for more precise regulation of the gene expression

ACS Paragon Plus Environment

Page 6 of 26

Page 7 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

levels to achieve optimized device performance as demonstrated in one of our recent studies6. Here, we adopted characterization data of constitutive promoters and RBSs of varying strengths from our previous published work and tested on BMSS8,29 (Figure 3 and Supplementary Figure S5). We have designed the BMSS platform to be able to handle single characterization data of single promoter or multiple characterization data of different promoters or RBSs to offer more practical applications. This enables rapid identification of quantitative model for a single part which can be incorporated into more complicated circuits in projected future needs, or allows comparison of multiple parts to determine their relative strengths. To achieve this, ten different model formulations considering single or double equations and with or without FP maturation time for the constitutive promoters were developed (Supplementary Figure S3 and Table S6). The method yielded results showing good agreements with the experimental characterization data for either single dataset which was fitted alone or multiple datasets with different strength combinations. Furthermore, it is interesting to unravel the significance of the FP maturation time especially in constitutive promoter systems. Results showed that temporal expression profiles within the first exponential growth phase (150 min) could be adequately described by the model without maturation kinetics. However, a prolonged expression profile of up to 240 min could only be well-captured after considering the maturation kinetic parameter, as depicted in Figure 3e-f. It is thus postulated that the influence of FP maturation time could be more significant when the actual promoter-driven protein synthesis rate has slowed down as manifested after the exponential growth phase. A moderate rise in accumulation of the fluorescence expression at the near stationary growth phase could be seen. This type of expression profile appeared to be more prominent in constitutive systems than inducible systems. This could presumably be because the constitutive system gene expression occurs when the cells are at a more steady state for a long time course in comparison with the transient induced phase upon the presence of inducers which could dominantly shift the cells away from the steady phase. As a result, the slow accumulation of FP expression caused by maturation half-life thus appeared to be less significant in transient inducible systems compared to the more steady constitutive expression systems. As mentioned earlier, the kinetics of a single biopart composed of a promoter and an RBS can be modelled using a two-equation representation (transcription and translation) or a single equation (lumped protein expression kinetics). We included both representations in the BMSS model bank for different regulatory systems. Single-equation descriptions were selected by BMSS for constitutive promoter systems which corroborates with our hypothesis that doubleequation representations could be redundant when the available data size is too small compared to the number of free parameters to be fitted, which could potentially lead to an underdetermined system. Logic Gates Digital-like logics with switching behaviors provide powerful decision making with deeply layered circuits, which form the elementary building blocks for more complex circuitry30. This switching property also allows us to take in multiple input signals to control the output gene expression in a non-graded manner. The logic gates are often characterized by their transfer function which captures how the steady-state output changes in response to different input states31. However, it is of great importance to not only match the final steady-state outputs but to quantitatively evaluate how the responses change over time. This allows for highly tunable real-time monitoring and control of the switching capability to achieve high-precision device performances with desired expression levels underpinning many future practical

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

applications4,7,32. Here, we devised models to describe logic gate behaviors by exemplifying the simplest one-input inverter NOT gates, two-input AND gates, and OR gates. We demonstrated not only the time-course simulations but also steady-state outputs as a function of inputs depicted in bar charts (Figure 4). NOT Gates (Inverter) We first experimentally characterized a NOT gate system which forms the fundamental logic operation for building more complicated circuitry. This NOT gate system was constructed based on the CRISPR interference (CRISPRi) technology8,33 (Figure 4). Two different gRNAs were designed to specifically target the constitutive promoter J23101 which drives the GFP gene, and the GFP gene coding region respectively in two separate genetic circuit designs. The gRNA expression was regulated by an inducible promoter aTc-driven pTet, while dead Cas9 (dCas9) was expressed constitutively and used to bind tightly to the DNA sequence for repression without cleaving it. Upon the presence of aTc, production of gRNA causes the binding of dCas9 to the corresponding site to repress the GFP expressions and vice versa, which behaves like a NOT gate (state = 1 in the presence of aTc; state = 0 in the absence of aTc). Four different model formulations accounting for single or double equations and with or without FP maturation time for the NOT gate were developed (Supplementary Table S7). To test the platform versatility, we further moved on to characterize the two different gene circuit constructs under two commonly tested cell cultures temperatures (30 °C and 37 °C) to examine their expression kinetic profiles. Higher expression levels were observed at 30 °C with greater repression capacities compared to those recorded at 37 °C (Figure 4 and Supplementary Figure S6). Interestingly, the increasing expression kinetic profile in the absence of inducer was well captured by the double-equation formulation with protein maturation time included as shown in Figure 4a whereas all the other decreasing expression kinetics characterization data (Figure 4b and Supplementary S6) were adequately captured by single-equation formulations even excluding the protein maturation time. It is discernible that the consideration of protein maturation time is more significant when kinetic profiles showing fluctuations with increasing trends after steady expression phases. The better representations by single-equation formulations for one-input two-state NOT gates again aligned with the notion that smaller datasets with smaller degrees of freedom will be well captured by the AIC metric. AND Gates Here, we sought to test BMSS using more complex two-input four-state AND gate. We characterized a previously developed blue light inducible promoter by replacing the constitutive promoter-driving blue light sensitive DNA-binding protein EL222 with the pBAD/arabinose inducible promoter. This representative gene circuit functions like an AND gate, where in the presence of both administrated arabinose inducers and blue light illuminations, the EL222 proteins will be activated and bound to the promoter region, thus recruiting RNA polymerase (RNAP) to initiate the transcription process and express FPs, else no expression should be observed5. In view of the potential expression variations under different E. coli strains, we inserted the gene circuit and cultured in two different E. coli strains (Top10 and BL21). The time-course characterization data at the four different states (00, 01, 10, and 11) were then fed into the BMSS platform to identify the best model candidate. The first state of the two-state AND gate represents the presence (1) or absence (0) of arabinose inducer, whereas the second state denotes the light illumination state (1) or in dark (0). As a case study upon analyzing the kinetic profiles, we observed a non-negligible basal leakiness in

ACS Paragon Plus Environment

Page 8 of 26

Page 9 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

this system and thus devised model formulations focusing more on the basal leakiness feature and protein maturation time. The BMSS model bank was developed in such a way that eases the process of model extensions, allowing the bank to serve as a repository of knowledge for the purpose of unravelling the different systems behavioral kinetics. To this end, the BMSS is capable of discerning the relevant features and selecting the most well-described model formulations as demonstrated in Figure 4c and d. OR Gates To study two-input OR gate systems, we constructed a gene circuit with two identical red FP genes regulated separately by two inducible promoters: pBAD/arabinose and pTet/aTc, and characterized the circuit under four different states (00, 01, 10, 11), whereby the first state indicates the presence (1) or absence (0) of arabinose inducers and second state denotes the presence or absence of aTc inducers. As a proof of concept, we cultured the gene circuit in two different E. coli strains (Top10 and BL21) to incorporate the potential biological variations. Finite pools of available cellular resources have been shown to adversely compromise the system performance and robustness of even simple circuits, both in vivo34,35 and in vitro36,37. This is particularly relevant when similar intracellular processes competing for the shared limited resources, occur in parallel. Expression of genes from heterologous gene circuit relies heavily on the transcriptional and translational resources with the major bottlenecks appear to be the availability of RNAPs and ribosomes. In more complex gene circuits exemplified by our logic gate systems, we observed unintended under-expression in the two-input OR gate system when more than one module was activated as shown in Figure 4e-f. The expression profiles could only be reproduced upon adopting a model with assumptions of limited RNAP and ribosome pools. The best-fitted model parameter estimated at about 25% and 15% expression attenuations for OR gate in BL21 strain and Top10 respectively, contributed by the finite ribosome pool when two modules induced concurrently. On top of that, our BMSS platform is also capable of identifying the best-fitted model candidate to recapitulate the individual expression profile for a particular state when only a single signal input is available (upon adding arabinose (state = 10) or aTc (state = 01) alone). As listed in the model ranking table (Supplementary Figures S8c-d), the formulation with the delay response for first input (arabinose) and inducer degradation response for second input (aTc), accompanied by resources competition was chosen as the best fitted model candidate, as demonstrated by the well recapitulations of the experimental kinetic profiles. Cross-platform Implementations To verify the interoperability and functional extensibility of the BMSS platform, the generated SBML files were loaded into other existing model-based tools such as iBioSim and COPASI14,38. These open-source software offer modelling and simulation environments coupling with design or analysis features. Samples of generated SBML files could be read and displayed in both software as schematic diagrams showing the compartment, corresponding species and the rate equations linked between the associated species (Supplementary Figure S11 and S13). We proceeded to execute model simulations using the imported model information in SBML files. The samples time-course simulation results for the few test cases (single and multiple characterization datasets for the constitutive system, and a two-state NOT gate model) implemented in both platforms were demonstrated in Supplementary Figure S12 and S14. For the NOT gate model, the two-state (0 or 1) simulation results were shown separately in two figures as only a single model formulation with a constant state of one was exported from the BMSS platform. The state value can then be adjusted to be zero in the two software to display the corresponding simulation result. Supplemental model interpretation and

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

analysis were also performed in COPASI. The deterministic interpretation in the form of ODEs of the model was depicted in Supplementary Figure S15a. Additional functionalities such as sensitivity analyses were executed successfully to gauge how a selected model variable changes in response to a change of a selected kinetic parameter in both steady-state and timeseries modes (Supplementary Figure S15b-c). For the test case of multiple J23101-driven varied RBSs model, mRNA synthesis rate was identified to be the most sensitive parameter which is justifiable in view of the role of that parameter in determining the expression levels of all of the four sets of characterization results. DISCUSSION The model development process has been dominated by manual iterative trial-and-error efforts, which are highly dependent on experience and knowledge of the corresponding system of interests. Repetitively abstracting a model with imposed assumptions to describe the various trends displayed in experimental recordings could be a tedious yet complicated process. Deriving from the basis of kinetic modelling framework based on law of mass action kinetics, BMSS platform automates the kinetic model fitting and selection processes which profoundly minimizes the time consumed on model hypotheses testing and the error-prone risk. This system eliminates the need to start from scratch and requires little to no relevant expertise. The libraries of model formulations available in the BMSS model bank serve as a knowledge base, which encompasses intuitive and/or unintuitive characteristics extracted from a vast array of characterization data for different regulatory systems featured under different environmental and biological variations. In our system validations, we have managed to achieve a good fit for 13 datasets of inducible promoter systems across four different chemical inducers, one blue light-sensitive system against two optical settings, and one metal-activated system, adopted from different E.coli strains and cultures temperatures. Thirteen different models were formulated deriving from the different feature combinatorial configurations to be composed into the inducible system model library. On the other hand, for constitutive systems, fifteen different constitutive promoters coupled with rbspBb, and eight J23101-driven varied RBSs expression data were validated and showing good agreements with experimental data with the available 10 model formulations. Moreover, a good correlation was also accomplished for the validated eight total datasets for logic gates system: four NOT gates systems; and two for each of the AND, and OR gates systems respectively. A total of 21 model formulations are accessible in the logic gates model library: four for NOT gates; six for AND gates; and 11 for OR gates. Collectively, all these results have verified the robustness of our model selection system which is applicable to a broad context under different variable factors. We also compared the applicability of one-step or two-step formulations for describing the gene expressions in the different systems and demonstrated better two-step equation fitting results for majority of the inducible expression systems under varying inducer concentrations (Figure 2), yet better single-equation models were observed for constitutive expression systems (Figure 3), which is conformable to our hypotheses. Gene expression is highly plastic which enables microbes to acclimate to the constantly changing environments. However, this plasticity feature could adversely affect the reproducibility of biological system to a great extent. To ensure the viability of our automated platform in a vast array of contexts, it is pre-eminent to consider these variations in the existing system. Several key common variable factors such as strains (MG1655, Top10, BL21, 10β), temperatures (30 °C, 37 °C), plasmids/origins (pBbE2K, pBbE6K, pBbE8K, pBbA6C, pSB1A3; CoIE1, p15A), culture media (M9 minimal salts-0.4% glucose, lysogeny broth LB),

ACS Paragon Plus Environment

Page 10 of 26

Page 11 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

system regulations (chemicals, optical, CRISPR/dCas9, logic gates), and design constructs (promoters or RBSs) were accounted. Our automated platform was able to achieve a considerable good fit for all of the different expression profiles under different biological and physical operating circumstances. As synthetic biology continues to grow by leveraging more applications of engineering principles to biology, same sorts of strategies in engineering discipline in dealing with problems have gained tremendous interest such as standardization, abstraction, modularity, and automation12. These strategies are widely adopted and implemented in different phases of the design-build-test cycles. Adoption of the recent advance in technology, in particular computing and laboratory automation, have addressed the needs of parallel, high-throughput computational and laboratory techniques. The emergence of automated synthetic biology foundry could revolutionize the next generation synthetic biology design-build-test-learn cycle39,40. This is an interdisciplinary platform integrating biology with sophisticated tools from software and hardware systems to enable an innovative and scalable technology to achieve robust biomanufacturing at the shortest time with the lowest cost. This advancement has prompted the need of an equivalent accessible computer-based approaches for fast processing and analysis of the voluminous data generated. Automated modelling platforms could serve as one of the powerful tools to facilitate the whole cycle at distinctive phases with less efforts either through forward or reverse engineering approaches. The developed BMSS could complement the existing design modelling tools by offering an automated model fitting and selection alternative to fully harness the power of quantitative models in a context-independent manner. A key activity in synthetic biology is the characterization of bioparts to facilitate the utilization and manipulation in applications. Extensive efforts have been done to ease the effective transferability between part repositories and between repositories and BioCAD applications21,41–43. Datasheets representation is one key channel in the transfer process which enables the incorporation of detailed descriptions, spanning from parts identity, qualitative and quantitative information, to the contextual experimental details44–46. To this end, BMSS could extend the functionality of the data model system in generating biopart datasheets by supplementing more associated quantitative information (model formulations, kinetic parameters, and data features) to be reported and transferred for a particular biopart or device. Apart from generating the SBML files and the SBOL-compliant gene circuit visual graphics, more tool-independent structured formats such as the Simulation Experiment Description Markup Language (SED-ML), SBOL, and the complete Computational Modelling in Biological Network (COMBINE) Archive that includes all of the above-mentioned files could be incorporated into the BMSS platform to ensure the standardization for information exchange and reproducibility. On the other hand, there is also a need to standardize the unit of reported fluorescent readings in view of the huge discrepancy from multiple variable factors across laboratories (lab measurement tools or equipment, experimental protocols, biological factors, human errors etc.). To address this issue, one of the technique used could be having a calibration curve for a particular measurement equipment such as microplate reader to relate the recorded fluorescence intensity to the known protein concentrations29,47. This allows us to estimate the absolute protein concentrations from the measured fluorescence levels, for which is conducted in this study. Another proposed technique would be the use of relative rather than absolute quantities which depends on the reference activity17,48,49. The ratiometric measurement computes the ratio of the parts/devices activity to the reference activity, increasing the robustness to external variations or from instrumentations17. Standardizing the unit for fluorescent readings underpins a proper streamline of synthetic biology community and development.

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

To accommodate the exponential growth of characterized data, the future of this system could entail more adaptive and additive learning algorithms by adopting feature-based brute-force search approach which can systematically enumerating all possible features and scanning if each possible candidate satisfies the given data profiles. Practically, the feature-based models could also be evaluated in similar ways as nested models, whereby the simplest test such as the likelihood ratio test could be performed to compare two nested models to determine if the improvement contributed by adding the new feature is significant50. On the other hand, machine learning techniques could be an alternative in identifying and classifying data patterns when more data become available in place of conscious manipulations by users. It is also desirable to integrate with parallel computing which could drastically reduce the computation time in view of the mutual exclusive nature of the different models. Despite our system forming the basis for more complex expression profiles, more alternatives in ODE numerical solvers and optimization tools may be required to embrace more different expression patterns including pulses or oscillatory behaviors. Continuous updating of models for existing genetic system type or incorporating models for new types of genetic systems in particular more complex circuits is greatly favourable to unleash the full capability of modelling tools in facilitating experimental designs. Future models in BMSS can include cell growth modelling and cellular resources. In a nutshell, we have developed an automated system which enables users to automatically fit many models to their measured experimental recordings with minimal inputs and suggests the best candidate from a list of model rankings. The system also forms a foundation to capture knowledge using models. This model selection system could eventually serve as a prescreening platform to be coupled with human interpretations and other existing computer-aided analysis tools to expedite the model development and experimental design optimization processes. METHODS Plasmids construction All plasmids were designed using Benchling web-based design and sequence analysis platform (Benchling, Inc. San Francisco, CA). Gene fragments and primers were synthesized from Integrated DNA Technologies (IDT). Plasmid pBbE2K (JBEI Part ID: JPUB 000090, colE1 ori, Kanr), pBbE6K (JBEI Part ID: JPUB 000054, colE1 ori, Kanr), pBbE8K (JBEI Part ID: JPUB 000036, colE1 ori, Kanr), pBbA2C (JBEI Part ID: JPUB_000092, p15A ori, Cmr), pBbA6C (JBEI Part ID: JPUB 000056, p15A ori, Cmr), and pBbA8C (JBEI Part ID: JPUB 000038, p15A ori, Cmr) were used as the backbones in all constructions whenever necessary. Double terminator BBa_B0015 consisted of BBa_B0010 and BBa_B0012 was used as gene transcription terminator in all cases. The NOT gate plasmid consisted of dual plasmid system: i) anhydrotetracycline (aTc) inducible GFP targeting gRNA expression plasmid pBbE2K-pTet-gRNAGFP-BBa1005, ii) constitutive dCas9 and GFP expression plasmids GabDP2-rbs34-dCas9-J23101-rbs34-GFP, with the promoter gene sequences retrieved from our previous work8. The gRNAs in this study were designed using the Benchling CRISPR tool. The tool uses two models developed by Doench et al. (2014) and Hsu et al. (2013) to predict CRISPR on-targeting and off-targeting efficiency respectively51,52. Using the tool, we selected two target locations: i) GFP promoter J23101 region (5’-AGACAGCTAGCATAATACCT), and ii) starting region of the GFP CDS (5’-GTAGTGACAAGTGTTGGCCA), since the two sites have relatively high on-targeting and low off-targeting efficiencies. The AND gate plasmid was synthesized using gBlocks of

ACS Paragon Plus Environment

Page 12 of 26

Page 13 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

EL222 (Gene ID: 3868928) and PCR amplified nucleotide fragment of blue light inducible promoter pBLind (5’GGTAGCCTTTAGTCCATGTTAGCGAAGAAAATGGTTTGTTATAGTCGAATAAA)rbs34-RFP-BBa_B0015 from the plasmid pEBLind5 and assembled into the plasmid backbone of pBbA8C to give pBbA8C-AraC-pBAD-rbs34-EL222-pBLind-rbs34-RFP-BBa_B0015. The OR gate plasmid was synthesized using the PCR amplified fragment of TetR-pTet-rbspBbRFP from the plasmid pBbE2K and assembled into the plasmid pBbE8K to give plasmid pBbE8K-PBAD-rbspBb-RFP-BBa_B0015-pTet-rbspBb-RFP-BBa_B0015. All polymerase chain reaction (PCR) products were amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs, MA, USA) according to the manufacturer’s protocols. PCR products were analysed by gel electrophoresis using 1% agarose gel and purified using QIAquick gel extraction kit (Qiagen, Hilden, Germany). The DNA concentrations of the gelpurified samples were quantified with NanodropTM (Thermo Fisher Scientific, MA, USA). Gibson assembly was subsequently performed using the NEBuilder HiFi DNA assembly (New England Biolabs, MA, USA) following the manufacturer’s protocols53. The constructed plasmids were chemically transformed into the respective E. coli strains. Colonies grown on the LB-antibiotic plate were picked and inoculated into fresh LB-antibiotic medium to prepare overnight culture for plasmids extraction using QIAprep Spin Miniprep kit (Qiagen, Hilden, Germany). The plasmids were sent for DNA sequencing (1st BASE, Singapore). The sequencing results were analysed using Benchling. Growth and Characterization conditions Escherichia coli TOP10 (Invitrogen), BL21 (Invitrogen), MG1655 (Invitrogen) and DH10β (New England Biolabs) strains were used for cloning and characterization as specified. All chemicals were purchased from Sigma Aldrich (Sigma, MO, USA), unless stated otherwise. All characterization experiments were carried out in Miller’s LB medium or M9 minimal medium (1X) supplemented with 1 mM MgSO4, 1 mM CaCl2, 0.2% casamino acid and 20 mM glucose. In brief, seed cultures were grown overnight in 5 mL LB medium supplemented with Kanamycin (Km) (50 μg/mL) and/or Chloramphenicol (Cm) (25 μg/mL) when required, at 37 °C with a shaking speed of 225 rpm. Glycerol stocks were prepared from mixing 500 µL overnight culture and 500 µL sterilized 100% glycerol. To run characterization experiments, 50 µL overnight culture was added to 5 mL fresh LB/ M9 antibiotic medium and grown for 2 hr at 37 °C with 225 rpm shaking. For characterizations of NOT and OR gates, 300 µL cell cultures at starting optical density (OD) of 0.1 containing corresponding chemical inducers was added to each well of flat bottom 96-well microplate (NuncTM, Thermo Scientific). For characterization of AND gate, 1 mL of cell culture at starting OD of 0.1 containing various arabinose inducer concentrations was added into each well of the flat bottom 12-well microplate (NuncTM, Thermo Scientific). In cases when blue light illumination was used in the light-sensitive system, the microplate was mounted on top of a device with 3 X 4 LED blue light (465 nm) panel as constructed by Jayaraman et al. (2016)5 or kept in dark with aluminium foil wrapping. The 12-well plates were incubated at 120 rpm in a mini shaker incubator (NB205, N-BIOTEK). Time series optical density (OD600nm) and fluorescence (GFP: excitation 485 nm, emission 528 nm; and RFP: excitation 535 nm, emission 600 nm) were obtained at 10 min (NOT and OR gates) or 1 h interval (AND gate) between kinetic reads using Synergy (H1) Hybrid Multi-Mode Reader (BioTek). The microplate reader was configured at temperature of 30 °C or 37 °C, and subjected to slow double orbital shaking speed of 282 rpm in the continuous reading settings. Negative control was included in each experiment whereby the autofluorescence reading (GFP or RFP) of the E. coli strain lacking the plasmid was recorded. The fluorescence/OD600nm reading was calculated as fluorescence of (sample – negative control)/OD600nm at each time point. All samples were prepared in triplicates; the average and

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

standard deviation (S.D.) of the technical triplicate sample readings were determined and plotted on the graphs. Data were represented as mean ± S.D. (n=3) Installations The latest python 3.7 under Anaconda, a free and open-source distribution is required to provide the necessary packages to run the BMSS platform. Python 3.6.5 environment can then be installed and set up using package management system conda to ensure that it is compatible with the versions supported by SimpleSBML and tesbml. The tesbml is libSBML python API available in Python Package Index (PyPI) used to write and manipulate the Systems Biology Markup Language (SBML), whereas SimpleSBML package allows one to construct biological models in SBML format without direct interacting with the libSBML package which involves much complicated command lines. A modified SimpleSBML to accommodate our system utilities has been incorporated in the given code files. The tesbml package is required to be installed using pip package management system, along with the tabulate library for pretty print of tabular data.

Python Tools The system was written in Python 3 as the scripting language with its well-established scientific computing packages, which is open source and freely accessible. The “pandas” library, a powerful data manipulation and analysis tool in python, was used to read the user input in comma-separated values (csv) file, the part/circuit characterization data. All model kinetics were described by systems of ordinary differential equations (ODEs) and these initial value problems were solved by numerical integration solvers from scipy.integrate class. A global optimization using differential evolution approach from scipy.optimize package was implemented followed by the constrained Nelder-Mead local optimizer (https://github.com/alexblaessle/constrNMPy) for model fitting parameter estimations. Experimental data and simulation results were displayed using visualization tool, matplotlib.pyplot module with a MATLAB-like interface. The system outputs were tabulated using tabulate library and ranked using scipy.stats module. Input Data Curation The raw characterization data were partly obtained from measured experiments conducted in this study and the others were retrieved from our previous published data8,29 and iGEM part Registry (http://parts.igem.org). The fluorescence data were normalized to the cell growth OD600 after deducting each of the technical triplicate readings from the negative control cultures at that particular time point. The average and standard deviation (S.D.) were calculated. The computed values (Fluorescence/OD) were saved in a csv file initialized by Fluorescence/OD data in descending inducer concentration order for inducible systems followed by the corresponding S.D. values in the same order. For logic gate systems, the input data should be in the sequence of 0, 1 for NOT gates, and 00, 01, 10, 11 for AND, and OR gates. Full System Workflow The BMSS takes in user-input characterization data with user-defined python commands as shown below for different systems: 1. For Inducible System, (a) Insert Input_filename ended with .csv extension

ACS Paragon Plus Environment

Page 14 of 26

Page 15 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

(b) Insert NumDataSet to specify the total column of Fluorescence/OD data excluding S.D. (c) Specify Inducer_unit in either 10**-6, 10**-9, ‘%’, or ‘dimensionless’. The first two are in unit Molar; the third refers to the unit in mass concentration represented by g per 100 mL; and the last one is used to represent any arbitrary units. (d) Indicate True or False for OptInhibition to specify if there is inhibitory effect at high inducer concentrations. 2. For Single or Multiple Constitutive System, (a) Insert Input_filename ended with .csv extension (b) Insert NumDataSet to specify the total column of Fluorescence/OD data factoring out S.D. 3. For Logic Gates System (a) Insert Input_filename ended with .csv extension (b) Indicate NumState to specify the total number of states (i.e. 2 for NOT gate, and 4 for AND, and OR gates) The input-data were read by the system data reader for data extraction into different components and unit conversions. In view of the profound order of magnitude differences in the dose response for different inducible promoter systems, additional dose-response pre-fitting step was implemented to identify the dose-response parameters which subsequently determine the parameter ranges specified in the subsequent global optimization step. In the core system, the different models in the form of ODEs from a particular system, were accessed from the model bank, and being solved by the integration solver. The sum of the corresponding timeresponse residuals and the weighted steady-state responses against experimental data points were then fed into a two-step optimization process: global optimizer differential evolution algorithm followed by the constrained Nelder-Mead local optimizer. The minima estimated from global optimizer were fed into the local optimizer as initial guesses automatically. Default hyperparameters from both optimizers were used for implementations. The aforementioned two processes could be set to run iteratively by specifying the number of iterations in the main python script. All the simulation results obtained in this study were achieved in less than 15 iterations, with majority of the runs were attained in less than 5 iterations. The default setting for the platform would be running through all models in that particular system. However, users can selectively choose the models of interest to be implemented by commenting out the unnecessary models. Model selection algorithm based on AIC was then executed to rank the models according to the two key criteria: goodness of fit and model complexity54. To better interpret the computed AIC results and provide an indication of the confidence for the selected model, the relative merits of the models were calculated in comparison to the selected best model with the lowest AIC. The individual strength of evidence was also indicated, which is imperative to justify how confident is the best model compared to the next best models. The results for best model candidate in time-response and steady-state response fashions were exhibited in figure format concertedly with experimental data. The steady-state response for inducible system was displayed as dose-response behavior over increasing inducer concentrations, whereas a bar chart was shown to demonstrate the steady-state result under different logic states. The tabulated model ranking result showing the computed SSEs and AICs, the relative merits of the models and the chosen best model formulation with its corresponding best fitted parameters were recorded and listed in an output text file as exemplified in Supplementary Figure S9. The quantified SSEs for all the model simulations in comparison with experimental data points were also demonstrated in a separate Supplementary Table S4 for references. Users are given the option to choose whether to export the model simulation data file in csv format.

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

To enable the process of interoperability and sharing of the developed computational models, the BMSS system also exported the best candidate model in SBML standard, which is a representation format based on XML, using SimpleSBML python package with its dependency on libSBML43,55,56. A sample of an output SBML file was demonstrated in Supplementary Figure S10. Apart from that, our system also incorporated DNAplotlib20, which is a computational visualization toolkit, to support the graphics rendering of genetic circuits from the different regulatory systems, conforming to the SBOL genetic design standard21,57. This allows the users to program customized gene circuits by inserting the particulars of plasmids and parts. Examples of various potential circuit configurations for logic gate systems are available in Supplementary Figure S1. For advance users who wish to append their new models into the model bank, they can refer to the model function presented in the particular system library and modify accordingly. If new types of parameters are to be defined, the users have to specify the respective upper and lower bounds as an input for the global and local optimizers, else the constrained bounds shall follow those defined as default in our system, as provided in the Supplementary Table S8-10. AUTHOR INFORMATION Corresponding Author *E-mail: [email protected] Author Contribution J.W.Y. and C.L.P. conceived the project. J.W.Y. and K.B.I.N. constructed the models, developed the system, and analysed the data. A.Y.T., J.Z., W.K.D.C. designed, constructed and performed the characterization experiments. J.W.Y., A.Y.T., J.Z., and W.K.D.C. wrote the manuscript with inputs from C.L.P. All authors commented and approved the manuscript. Notes The authors declare no competing financial interest. ACKNOWLEDGMENTS The authors like to acknowledge the support from NUS startup grant and MOE AcRF Tier 1 grant. SUPPLEMENTARY INFORMATION Details of the different characterization data specifications, the SBOL-compliant gene circuit configurations for logic gate models, more BMSS-validated results for the three systems, result comparisons, cross-platform implementations, model mathematical formulations and the setting of corresponding parameters ranges are provided. REFERENCES 1.

Elowitz, M. B., and Leibler, S. A synthetic oscillatory network of transcriptional

ACS Paragon Plus Environment

Page 16 of 26

Page 17 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

regulators. (1999). Nature 403, 335–338 2.

Gardner, T. S., Cantor, C. R., and Collins, J. J. Construction of a genetic toggle switch in Escherichia coli. (2000). Nature 403, 339–342

3.

Baldwin, Geoff, Robert Dickinson, and R. I. K. Modelling Synthetic Biology Systems. in (Imperial College Press, 2015) Synthetic Biology-A Primer 89–108

4.

Olson, E. J., Hartsough, L. A., Landry, B. P., Shroff, R., and Tabor, J. J. Characterizing bacterial gene circuit dynamics with optically programmed gene expression signals. (2014). Nat. Methods 11, 449

5.

Jayaraman, P. et al. Blue light-mediated transcriptional activation and repression of gene expression in bacteria. (2016). Nucleic Acids Res. 44, 6994–7005

6.

Jayaraman, P. et al. Cell-Free Optogenetic Gene Expression System. (2018). ACS Synth. Biol. 7, 986–994

7.

Jayaraman, P., Yeoh, J. W., Zhang, J., and Poh, C. L. Programming dynamic control of bacterial gene expression with the chimeric ligand and light-based promoter system. (2018). ACS Synth. Biol. doi:10.1021/acssynbio.8b00280

8.

Holowko, M. B., Wang, H., Jayaraman, P., and Poh, C. L. Biosensing Vibrio cholerae with Genetically Engineered Escherichia coli. (2016). ACS Synth. Biol. 5, 1275–1283

9.

Jayaraman, P., Holowko, M. B., Yeoh, J. W., Lim, S., and Poh, C. L. Repurposing a Two-Component System-Based Biosensor for the Killing of Vibrio cholerae. (2017). ACS Synth. Biol. 6, 1403–1415

10.

Parolini, N. A Model for Cell Growth in Batch Bioreactors. (2010)

11.

Almquist, J., Cvijovic, M., Hatzimanikatis, V., and Nielsen, J. Kinetic models in industrial biotechnology – Improving cell factory performance. (2014). Metab. Eng. 24, 38–60

12.

Myers, C. J. et al. A standard-enabled workflow for synthetic biology. (2017). Biochem. Soc. Trans. 45, 793–803

13.

Misirli, G. et al. A Computational Workflow for the Automated Generation of Models of Genetic Designs. (2018). ACS Synth. Biol. doi:10.1021/acssynbio.7b00459

14.

Watanabe, L. et al. iBioSim 3 : A Tool for Model-Based Genetic Circuit Design. (2018). ACS Synth. Biol. doi:10.1021/acssynbio.8b00078

15.

Nielsen, A. A. K. et al. Genetic circuit design automation. (2016). Science (80-. ). 352, aac7341

16.

Krishnanathan, K., Anderson, S. R., Billings, S. A., and Kadirkamanathan, V. A DataDriven Framework for Identifying Nonlinear Dynamic Models of Genetic Parts. (2012). ACS Synth. Biol. 1, 375–384

17.

Yordanov, B. et al. A computational method for automated characterization of genetic components. (2014). ACS Synth. Biol. 3, 578–588

18.

Burnham, K. P., and David, R. A. Model selection and multimodel inference: a practical information-theoretic approach. (Springer Science & Business Media, 2003)

19.

Burnham, K. P., and Anderson, D. R. Multimodel Inference Understanding AIC and

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

BIC in Model Selection. (2004). Sociol. Methods Res. 33, 261–304 20.

Der, B. S. et al. DNAplotlib: Programmable Visualization of Genetic Designs and Associated Data. (2017). ACS Synth. Biol. 6, 1115–1119

21.

Galdzicki, M. et al. The Synthetic Biology Open Language ( SBOL ) provides a community standard for communicating designs in synthetic biology. (2014). Nat. Biotechnol. 32, 545–550

22.

Hebisch, E., Knebel, J., Landsberg, J., Frey, E., and Leisner, M. High Variation of Fluorescence Protein Maturation Times in Closely Related Escherichia coli Strains. (2013). PLoS One 8, e75991

23.

Balleza, E., Kim, J. M., and Cluzel, P. Systematic characterization of maturation time of fluorescent proteins in living cells. (2018). Nat. Methods 15, 47

24.

Fernández-castané, A., Vine, C. E., Caminal, G., and López-santín, J. Evidencing the role of lactose permease in IPTG uptake by Escherichia coli in fed-batch high cell density cultures. (2012). J. Biotechnol. 157, 391–398

25.

Zhang, Y. et al. Development and Application of an Arabinose-Inducible Expression System by Facilitating Inducer Uptake in Corynebacterium glutamicum. (2012). Appl. Environ. Microbiol. 78, 5831–5838

26.

Siegele;, D. A., and Hu, J. C. Gene expression from plasmids containing the araBAD promoter at subsaturating inducer concentrations represents mixed populations. (1997). Proc. Natl. Acad. Sci. USA 94, 8168–8172

27.

Spruijt, E., Sokolova, E., and Huck, W. T. S. Complexity of molecular crowding in cellfree enzymatic reaction networks. (2014). Nat. Nanotechnol. 9, 406–407

28.

Fabio, M. R., and Helen, M. B. Recent advances in inducible gene expression systems. (1998). Curr. Opin. Biotechnol. 9, 451–456

29.

Wang, H., Ling, M. H. T., Chua, T. K., and Poh, C. L. Two cellular resource-based models linking growth and parts characteristics aids the study and optimisation of synthetic gene circuits. (2017). Eng. Biol. 1, 30–39

30.

Bradley, R. W., Buck, M., and Wang, B. Recognizing and engineering digital-like logic gates and switches in gene regulatory networks. (2016). Curr. Opin. Microbiol. 33, 74– 82

31.

Brophy, J. A. N., and Voigt, C. A. Principles of Genetic Circuit Design. (2014). Nat. Methods 11, 508–520

32.

Moser, F. et al. Dynamic control of endogenous metabolism with combinatorial logic circuits. (2018). Mol. Syst. Biol. 14, e8605

33.

Nielsen, A. A., and Voigt, C. A. Multi-input CRISPR/Cas genetic circuits that interface host regulatory networks. (2014). Mol. Syst. Biol. 10, 763–763

34.

Liu, Y., Beyer, A., and Aebersold, R. Review On the Dependency of Cellular Protein Levels on mRNA Abundance. (2016). Cell 165, 535–550

35.

Qian, Y., Huang, H., and Vecchio, D. Del. Resource Competition Shapes the Response of Genetic Circuits. (2017). ACS Synth. Biol. 6, 1263–1272

ACS Paragon Plus Environment

Page 18 of 26

Page 19 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

36.

Breadboard, C. et al. Gene Circuit Performance Characterization and Resource Usage in a Cell-Free “ Breadboard ”. (2014). ACS Synth. Biol. 3, 416–425

37.

Chizzolini, F. et al. Cell-Free Translation Is More Variable than Transcription. (2017). ACS Synth. Biol. 6, 638–647

38.

Bergmann, F. T. et al. COPASI and its applications in biotechnology. (2017). J. Biotechnol. 261, 215–220

39.

Zhirnov, V. 2018 Semiconductor Synthetic Biology Roadmap. (2018)

40.

Ran, C., Yongbo, Y., and Huimin, Z. Building biological foundries for next-generation synthetic biology. (2015). Sci. China Life Sci. 58, 658–665

41.

Ham, T. S. et al. Design, implementation and practice of JBEI-ICE: An open source biological part registry platform and tools. (2012). Nucleic Acids Res. 40, 1–8

42.

McLaughlin, J. A. et al. SynBioHub: A Standards-Enabled Design Repository for Synthetic Biology. (2018). ACS Synth. Biol. 7, 682–688

43.

Hucka, M., Sauro, H. M., Doyle, J., and Kitano, H. The Systems Biology Markup Language (SBML): a medium for representation and exchange of. (2003). Bioinformatics 19, 524–531

44.

Appleton, E. et al. Owl: Electronic datasheet generator. (2014). ACS Synth. Biol. 3, 966– 968

45.

Canton, B., Labno, A., and Endy, D. Refinement and standardization of synthetic biological parts and devices. (2008). Nat. Biotechnol. 26, 787–793

46.

Iñaki Sainz de Murieta 1, Matthieu Bultelle, R. I. K. Data model for biopart datasheets. (2018). Eng. Biol. 2, 7–18

47.

Gaigalas, A. K., Li, L., Henderson, O., Vogt, R., and Barr, J. The Development of Fluorescence Intensity Standards. (2001). J. Res. Natl. Inst. Stand. Technol. 106, 381– 389

48.

Kelly, J. R. et al. Measuring the activity of BioBrick promoters using an in vivo reference standard. (2009). J. Biol. Eng. 3, 1–13

49.

Brown, J. R. A design framework for self-organised Turing patterns in microbial populations. (2013)

50.

Kirk, P., Thorne, T., and Stumpf, M. P. H. Model selection in systems and synthetic biology. (2013). Curr. Opin. Biotechnol. 24, 767–774

51.

Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9 – mediated gene inactivation. (2014). Nat. Biotechnol. 32, 1262

52.

Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. (2013). Nat. Biotechnol. 31, 827

53.

Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. (2009). Nat. Methods 6, 343

54.

Burnham, K. P., and Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. (Springer Science & Business Media, 2003)

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

55.

Cannistra, C., Medley, K., and Sauro, H. M. SimpleSBML : A Python package for creating and editing SBML models. (2015). bioRxiv 2–6

56.

J.Bornstein;, B., M. Keating, S., A, J., and Michael, H. LibSBML: An API Library for SBML. (2008). Bioinformatics 24, 880–881

57.

Roehner, N. et al. Sharing Structure and Function in Biological Design with SBOL 2.0. (2016). ACS Synth. Biol. 5, 498–506

GRAPHICAL ABSTRACT (TOC):

FIGURE 1-4:

Figure 1. A schematic diagram depicting the full workflow of the system algorithm. The BMSS was written in Python 3 as the scripting language which is open source and freely accessible. Numerical integration, optimization, and plotting packages were used to solve the ordinary differential equations, iteratively fit models to experimental data through minimizing the sum squared residuals and plot the graphical results for data visualizations. The model selection algorithm was based on the AIC statistical inference criterion. The output figures

ACS Paragon Plus Environment

Page 20 of 26

Page 21 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

displayed the time-response and steady-state response behaviors in conjunction with the imported measured experimental data and their corresponding standard deviations represented in error bars. User-defined customizable SBOL-compliant gene circuit graphics rendering was also exhibited by the BMSS platform. The details of the model ranking and the best model candidate were listed in the output text file with the model simulation data exported in a separate csv file. The SBML representation of the chosen model was also programmed and exported in XML format.

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2: BMSS-validated results for three key different inducible promoter systems subjected to single physical or biological variations. The columns demonstrate the pre-fitting dose response to identify the parameters ranges, temporal expression profiles, the derived doseresponse behaviors, and the BMSS-selected model candidate together with the SBOL-

ACS Paragon Plus Environment

Page 22 of 26

Page 23 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

compliant gene circuit configuration. Experimental data are denoted as filled circles with error bars; and model simulations are indicated as solid lines. All data points are represented in mean ± S.D. (n=3) (a-b) Arabinose-induced pBAD promoter systems using two distinct ribosome binding sites (RBSs): rbspBb and rbs34. (c-d) aTc-induced pTet promoter systems cultured in M9 with 0.4% glucose or in LB medium (e-f) IPTG-triggered pLac promoter systems introduced in different plasmids: pBbE6K and pBbA6C.

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3: BMSS-validated results for constitutive promoter systems. The columns illustrate the temporal expression profiles, and the selected model and relative strengths for multiple constitutive datasets. Experimental data are denoted as filled circles with error bars; and model simulations are indicated as solid lines. All data points are represented in mean ± S.D. (n=3) (a-b) Different single constitutive promoters expressed with rbspBb (c-d) Multiple varying constitutive promoter data with the same RBS. The relative strengths of the different promoters were computed and listed in the green tables by comparing their synthesis rates (ef) J23101-driven rbs34 expression system at the exponential phase in comparison with the full time span result (g-h) Multiple J23101-driven varied RBSs system validation results with their respective relative strengths calculated from their protein synthesis rates shown in the green tables.

ACS Paragon Plus Environment

Page 24 of 26

Page 25 of 26

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Paragon Plus Environment

ACS Synthetic Biology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4: BMSS-validated results for logic gate systems. The columns depict the temporal expression profiles, the steady-state bar charts, and the respective SBOL-compliant gene circuits and the selected model. Experimental data are indicated as filled circles with error bars; and model simulations are denoted as solid lines. All data points are presented in mean ± S.D. (n=3) (a-b) NOT gates exemplified using CRISPR/dCas9 through the gRNA sequence regulated by pTet-aTc promoter (0- no aTc added; 1- presence of aTc) at different temperatures (30 °C and 37 °C) (c-d) AND gates demonstrated with blue light-inducible promoter, controlled by pBAD-regulated blue light-sensitive EL222 proteins, expressed in two different E.coli strains (Top10 and BL21). The first state denotes the presence of blue light (1) or in dark (0), whereas second state indicates the presence (1) or absence (0) of the arabinose inducer (e-f) OR logic gates expressed in Top10 and BL21 strains where the first input represents the presence of arabinose inducers and the second input indicates the presence of aTc inducers.

ACS Paragon Plus Environment

Page 26 of 26