Improving Metabolic Pathway Efficiency by Statistical Model-Based

Aug 4, 2016 - Improving Metabolic Pathway Efficiency by Statistical Model-Based Multivariate Regulatory Metabolic Engineering. Peng Xu†§, Elizabeth...

Article

Improving metabolic pathway efficiency by statistical model based multivariate regulatory metabolic engineering (MRME) Peng Xu, Elizabeth Anne Rizzoni, Se-Yeong Sul, and Gregory Stephanopoulos ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.6b00187 • Publication Date (Web): 04 Aug 2016 Downloaded from http://pubs.acs.org on August 5, 2016


ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Synthetic Biology

Improving metabolic pathway efficiency by statistical model based multivariate regulatory metabolic engineering (MRME)

Peng Xu1‡, Elizabeth Anne Rizzoni2, Se-Yeong Sul1, Gregory Stephanopoulos1*

1 Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139

2 Department of Chemistry, Wellesley College, 106 Central St, Wellesley, MA 02481

‡ Present Address: Department of Chemical, Biochemical and Environmental Engineering, University of Maryland, Baltimore County, 1000 Hilltop Cir, Baltimore, MD 21250

*Corresponding author: Professor Gregory Stephanopoulos, Email: [email protected], Phone: (617)-253-4583, Fax: (617)-258-6876

Contact information for all authors: Peng Xu, [email protected] Elizabeth Rizzoni, [email protected] Se-Yeong Sul, [email protected] Gregory Stephanopoulos, [email protected]


Abstract

Metabolic engineering entails targeted modification of cell metabolism to maximize the production of a specific compound. To empower combinatorial optimization in strain engineering, tools and algorithms are needed to efficiently sample the multi-dimensional gene expression space and locate the desirable overproduction phenotype. We addressed this challenge by employing design of experiments (DoE) models to quantitatively correlate gene expression with strain performance. By fractionally sampling the gene expression landscape, we statistically screened the dominant enzyme targets that determine metabolic pathway efficiency. An empirical quadratic regression model was subsequently used to identify the optimal gene expression patterns of the investigated pathway. As a proof of concept, our approach yielded the natural product violacein at 525.4 mg/L in shake flasks, a 3.2-fold increase over the baseline strain. Violacein production was further increased to 1.31 g/L in a controlled bench-top bioreactor. We found that formulating discretized gene expression levels into logarithmic variables (Linlog transformation) was essential in implementing this DoE-based optimization procedure. The reported methodology can aid multivariate combinatorial pathway engineering and may be generalized as a standard procedure for accelerating strain engineering and improving metabolic pathway efficiency.

Key Words: Metabolic engineering, Promoter library, Combinatorial optimization, Synthetic biology, Statistical models and Response Surface Methodology


Table of Contents (TOC) Figure


Introduction

Recent advances in gene synthesis, facile cloning and gene assembly techniques allow metabolic engineers to rapidly reconfigure a cell's genetic blueprint at a speed and scale never seen before 1. As an enabling technology, metabolic engineering has become a major driver in constructing efficient microbial cell factories for the next-generation bio-economy 2. Mirroring nature's exquisite chemistry to synthesize chemicals and natural products, metabolic engineers have now been able to produce a large portfolio of commodity chemicals 3, 4, novel materials 5, sustainable fuels 6, 7 and pharmaceuticals 8, 9 from renewable feedstocks. This is often achieved through standard metabolic engineering strategies including overexpression of rate-limiting steps 10, deletion of competing pathways 11, managing ATP 12, 13 and recycling NADPH and other cofactors 13. While these approaches have been shown to effectively improve cellular productivity and yield, they should be further complemented with algorithms and tools that speed up the genotype search process across a larger gene expression landscape.

Apart from chemistry, the metabolic complexity underlying biological systems further compounds the task of engineering microbial overproduction phenotypes, as introduced heterologous pathways typically compete for limited cellular resources. For example, precursor flux improvement by overexpression of upstream pathways may not be accommodated by downstream pathways; intermediate accumulation or depletion may compromise cell viability and pathway productivity 14; and overexpressed heterologous proteins may penalize the cell with additional energy cost and elicit a cellular stress response 15. To address these issues, renewed interest has emerged in exploring combinatorial approaches to control and optimize biosynthetic pathways. For example, recent metabolic engineering effort has shifted towards coordinating metabolic flux redistribution, including modification of plasmid copy number 16, promoter strength 17, gene codon usage 18 and RBS strength 15, 19. In addition, combinatorial transcriptional engineering coupled with efficient gene assembly and molecular evolution tools has attracted considerable interest as a potential method for optimizing multi-gene pathways in different host cells. Examples include the heterologous production of anti-cancer taxol precursors and flavonoids in E. coli 20, 21, combinatorial optimization of fatty acid pathways to produce advanced biofuels 22, 23, and multiplexed regulatory RNA 24 and global transcriptional machinery engineering 25 to produce L-tyrosine.

Existing pathway engineering strategies follow a rather intuitive framework reminiscent of reverse engineering principles. It comprises an iterative cycle of pathway debottlenecking and evolutionary improvement, which involves extensive trial-and-error work and requires examining large libraries of strain candidates 26, 27. Furthermore, the cost of constructing and testing a large number of pathways limits our ability to search the gene expression space to locate the desirable overproduction phenotype. As such, effective Design of Experiments (DoE) procedures are essential to probe the gene expression landscape and locate the high-producing expression patterns. Indeed, statistical model-based DoE procedures have been widely used to optimize biological processes including media design 28, bioreactor operation, therapeutic protein purification and virus transfection 29. In particular, principal component analysis 30 and linear regression models 31-33 have recently been used to identify pathway bottlenecks and improve metabolite production. For instance, Zhou et al. reported the combinatorial optimization of genetic constructs and cultivation conditions to improve production of the nylon precursor 6-aminocaproic acid in E. coli 33.

To pursue other statistical models that improve pathway efficiency, we explored the classical Plackett-Burman design and Box-Behnken design to probe the gene expression space. Invented by statisticians Plackett and Burman in 1946, the Plackett-Burman design is an efficient screening method to identify the main factors with as few experimental runs as possible 34. It has been widely used to determine the most important factors early in the experimental phase, when complete interactions of multiple variables are not available. When interactions of multiple variables are considered, the Box-Behnken design is a powerful tool to explore the quadratic effects and determine the optimal conditions of the investigated variables 35. By combining Plackett-Burman and Box-Behnken designs, optimal parameters (here, the gene expression profile) can be rapidly located to maximize or minimize an objective function when a large subset of the variable space is interrogated.

In this report, we applied a statistical model-based DoE procedure to guide the implementation of combinatorial pathway engineering. We first constructed an efficient T7 promoter library that covers a broad range of gene expression dynamics, spanning 1,000-fold in transcriptional activity. Combined with DoE principles, this promoter library was used to systematically sample the multidimensional gene expression space defined by a five-gene pathway and optimize the production titer with limited experimental inputs. Without requiring much a priori biochemical knowledge, the Plackett-Burman design allowed us to quickly screen the main pathway components that determine the overall pathway efficiency, while avoiding reconstruction of all possible pathway candidates. Finally, a quadratic response surface design (Box-Behnken) enabled us to capture the main features of the genotype-phenotype landscape and identify the optimal expression patterns of the investigated pathway. The statistical DoE methodology was applied to the synthesis of the chemotherapeutic drug violacein and allowed marked production improvement in an engineered E. coli strain. It provided a powerful and efficient way to probe the relationship between production titer and gene expression with minimal experimental effort. We envision that this statistical model-based multiplexed combinatorial engineering approach could be generalized to optimize multi-gene pathways, accelerate strain engineering and improve natural product production as a standard practice.

2. Results and discussion

2.1. Construction of a T7 promoter library that covers a broad range of transcriptional dynamics

We first constructed promoter libraries to facilitate the combinatorial search of the multi-dimensional gene expression space. Traditional library construction approaches including error-prone PCR 36 or site-directed mutagenesis 37 suffer from low mutation rates, intense library screening efforts and biased gene expression levels. Here we adopted a novel library construction approach that combines partially overlapping synthetic oligos with a gene recombination technique (Gibson assembly). Random mutations in the form of degenerate oligos were specifically introduced into the core region of regulatory genetic elements, and the mutant library could be further screened with a fluorescent protein readout. We targeted the bacteriophage T7 promoter as it is widely used in various metabolic engineering applications. The strength and efficiency of T7 RNA polymerase (RNAP) are mainly determined by two factors: the recruitment of T7 RNAP and sigma factors on the promoter core region (-35 and -10) and the progression of the RNAP across the repressor binding region (lacO) 38. As such, we used randomly synthesized oligos to mutate both the sigma factor binding region (-35 and -10) and the lacI repressor binding region (lacO) (Fig. 1a). IPTG induces the dissociation of the lacI repressor from the lacO site and thus allows T7 RNAP to read across the DNA template and give rise to fluorescence signals.

With green fluorescent protein as the reporter, we quantified the transcriptional activity of the constructed promoter candidates. eGFP results indicate that the constructed library covers a broad range of transcriptional dynamics spanning three orders of magnitude (Fig. 1b). For example, the highest transcriptional activity is obtained at 21,340 FU/OD with promoter No. 2, a 7-fold increase compared to the original T7 promoter (3,942 FU/OD with promoter No. 48). In contrast, the lowest transcriptional activity is 22.3 FU/OD with promoter No. 46, roughly 1,000-fold lower than the strongest promoter (No. 2). From the promoter strength distribution profile, we sequenced 20 promoters, and the sequencing results indicated that all investigated promoters contained the desired mutations at the designed genetic loci (-35/-10 and lacI repressor binding regions) (Fig. 1c and Suppl. Fig. S1).

It is well known that T7 promoter activity is stringently controlled by the amount of lacI repressor protein within the cell. Mutation in the lacI repressor binding region (lacO) would alter the binding affinity between the lacI repressor and lacO and possibly lead to leaky protein expression. Interestingly, promoter libraries with a mutant lacI repressor binding region exhibit a relatively small spread in leaky expression (eGFP expression in the absence of IPTG induction), ranging roughly from 107 FU/OD to 311 FU/OD (Suppl. Fig. S2), indicating that our designed lacO region may effectively block the progression of T7 RNAP. The 1,000-fold dynamic range of transcription obtained in the investigated promoters is primarily ascribed to the synergistic interaction between the promoter core region (-35 and -10) and the repressor binding region (lacO), which may cooperatively reshape the transcriptional response. By simultaneously mutating both the core promoter region (-35/-10) and the repressor binding (lacO) region, we were able to construct artificial T7 promoters with transcriptional activity spanning three orders of magnitude while maintaining leaky expression at relatively low levels, a feature that would not be easily achieved by conventional promoter engineering approaches.

The constructed promoter library enabled three important functions in the quest for optimal gene expression patterns: (a) identification of promoters with extremely high and low strength that are useful to define the boundary of the gene expression space; (b) implementation of Design of Experiments procedures to reduce the gene expression space and simplify strain engineering effort; (c) the combinatorial search for optimal gene expression patterns determined from the Box-Behnken design. These results are further elaborated in the following sections.

2.2. Plackett–Burman design to screen main effects that determine violacein pathway efficiency

To facilitate statistical model-based pathway optimization, we specifically chose the promoters that confer biologically meaningful phenotype change. To this end, we ranked the constructed promoters using a linear-logarithmic (Linlog) transformation that geometrically rescales the discretized linear gene expression levels into logarithmic dimensionless variables (the rationale for adopting the Linlog transformation is discussed in section 2.4). This Linlog transformation follows a general mathematical formula described in Fig. 2a and Fig. 2b for the forward and reverse transformations, respectively. When the promoter activity is at a predesignated high level Pmax (i.e. promoter No. 2), the recoded promoter has a rescaled value of +1; on the other hand, when the promoter activity is at a low level Pmin (i.e. promoter No. 36), the recoded promoter has a rescaled value of -1 (Fig. 2c). The geometric average of Pmax and Pmin (the square root of the product of Pmax and Pmin) corresponds to the middle level (0 point) of the rescaled promoters.

Next we applied these rescaled promoters to sample the gene expression space of a model natural product pathway, the five-gene violacein pathway. Violacein biosynthesis starts with the oxidation of the endogenous amino acid L-tryptophan, and the resulting intermediate further undergoes dimerization, decarboxylation and multiple reductions to form the final purple compound violacein (Fig. 3a). The major byproduct of the pathway, deoxyviolacein, is generated through the reduction of the overflowed intermediate (protodeoxyviolaceinic acid) catalyzed by VioC (Fig. 3a). The search for an optimal gene expression profile that maximizes violacein production would typically require the assembly of a complete set of pathway combinations corresponding to different expression levels of the five pathway genes and assessment of product formation by each strain. Instead of constructing all possible pathway combinations to exhaustively search the gene expression space, we employed the Plackett-Burman (PB) design (a fractional factorial design) to fractionally sample the gene expression space, with the aim of minimizing the experimental effort and quickly identifying the main pathway components that determine violacein pathway efficiency. Specifically, the number of pathway constructs in the Plackett-Burman design is reduced to 12 (Table 1) for a five-gene pathway (VioA, VioB, VioC, VioD and VioE), given that either the strong promoter (+1) or the weak promoter (-1) is used to drive the expression of each gene. As a comparison, a full factorial design for a five-gene pathway requires 32 constructs (2^5) to cover all combinations of the gene expression space. An 11-gene pathway also requires just 12 genetic constructs with the PB design, whereas a two-level full factorial design for the same pathway would require 2,048 constructs (2^11). The PB design is therefore very powerful for screening main effects from multiple input variables with minimal experimental runs.

Following the arrangements indicated in Table 1, we assembled 12 violacein pathways with each individual gene component (VioA, VioB, VioC, VioD and VioE) driven by either a strong promoter (+1, promoter No. 2) or a weak promoter (-1, promoter No. 36). The detailed genetic configuration of the 12 violacein pathways is illustrated in Fig. 4. The output of the pathway, violacein production in shake flasks, was statistically analyzed by one-way analysis of variance (ANOVA) to screen the major enzymatic steps that determine pathway efficiency. Based on the mean value distribution of violacein production, a p value can be calculated for each pathway component to quantify how significantly each enzymatic step contributes to violacein production (Fig. 5). ANOVA indicates that the effects of VioA and VioE on violacein production are insignificant (p > 0.05, Fig. 5): violacein production is hardly affected by whether a strong or weak promoter is used, indicating that a broad range of VioA or VioE expression would accommodate a high violacein production phenotype. However, the effects of VioB, VioC and VioD on violacein production are statistically significant (p < 0.05, Fig. 5): the production of violacein is significantly affected when the promoter is switched from a weak version to a strong version.
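The screening step can be illustrated with a minimal Python sketch. It builds a 12-run Plackett-Burman design from the standard generator row and estimates each gene's main effect as the difference between the mean titer at the strong (+1) and weak (-1) promoter. Several things here are assumptions rather than details from this work: the generator-based run order need not match Table 1, the assignment of the five genes to the first five columns is arbitrary, and the titers are simulated, hypothetical numbers used only to exercise the code.

```python
import random
from statistics import mean

# Standard generator row for the 12-run Plackett-Burman design (11 two-level factors).
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman_12():
    """Return the 12 x 11 design: 11 cyclic shifts of the generator plus one all-low run."""
    n = len(GENERATOR)
    rows = [GENERATOR[-shift:] + GENERATOR[:-shift] for shift in range(n)]
    rows.append([-1] * n)
    return rows

def main_effects(design, response, factors):
    """Main effect per factor: mean response at +1 minus mean response at -1."""
    effects = {}
    for j, name in enumerate(factors):
        high = [y for row, y in zip(design, response) if row[j] == +1]
        low = [y for row, y in zip(design, response) if row[j] == -1]
        effects[name] = mean(high) - mean(low)
    return effects

design = plackett_burman_12()
# Assumption: genes occupy the first five columns; the remaining six act as dummy factors.
genes = ["VioA", "VioB", "VioC", "VioD", "VioE"]

# HYPOTHETICAL effects and titers (mg/L), simulated only for illustration; not data from the paper.
random.seed(0)
true_effect = {"VioA": 0.0, "VioB": -60.0, "VioC": 40.0, "VioD": 80.0, "VioE": 0.0}
titers = [
    250.0
    + sum(true_effect[g] / 2.0 * row[j] for j, g in enumerate(genes))
    + random.gauss(0.0, 5.0)
    for row in design
]

effects = main_effects(design, titers, genes)
for gene, effect in effects.items():
    print(f"{gene}: {effect:+.1f} mg/L")
```

Because every pair of columns in the design is orthogonal and each column is balanced, each main effect is estimated independently of the others even though only 12 of the 32 possible constructs are built. A significance test such as the one-way ANOVA used in the text would then compare each effect against the run-to-run variation.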
With this guidance, the five-step biosynthetic pathway was grouped into three pivotal components defined by the three important genes (VioB, VioC and VioD). The less important genes VioA and VioE were lumped with the important genes in synthetic operons, thus reducing the five-dimensional gene expression design space to a three-dimensional design space. In this way, the fractional factorial design allowed us to quickly screen the relative impacts of the pathway components and reduce the gene expression design space, so that gene clusters can be reorganized in synthetic operons to reduce experimental effort.

2.3. Box–Behnken design to determine the optimal expression level of the main pathway components

After screening gene expression combinations and identifying the key genetic determinants of the violacein pathway, we applied a quadratic polynomial regression model to determine the optimal gene expression level of the investigated pathways. To reduce the gene expression design space, we constructed three synthetic operons comprising VioAB, VioD and VioEC to reconfigure the violacein pathway. Multivariate analysis was performed to identify the quadratic correlation between violacein production and the expression level of the three reorganized gene clusters using linear, quadratic and interaction terms. Details of the mathematical formulation of the quadratic statistical model are given in supplementary Fig. S5.

We next applied the Box-Behnken design to probe the optimal expression level of the reorganized gene clusters (Table 2). A total of 13 violacein pathways (the central point is repeated three times to estimate the prediction variance over the design space) were constructed, and the detailed genetic configuration of the pathways is illustrated in Fig. 6. Violacein production of the engineered strains was tested in shake flasks. Representative strains carrying these pathways are also illustrated in Fig. 3b. All of the strains were able to produce the purple pigment violacein in shake flasks. Quantitative relationships between gene expression and violacein production are plotted as a 3-D surface with contour plots projected on the x-y plane (Fig. 7a and 7b). This genotype-phenotype landscape allows us to quickly identify the optimal gene expression levels with a limited number of pathway constructs and experimental inputs.
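As an illustration of how such a design is laid out, the following Python sketch enumerates Box-Behnken runs for three coded factors. It assumes the textbook construction for three factors (each pair of factors set to every ±1 combination while the third is held at 0, plus a center point); the function name `box_behnken` and the single-center default are choices made for this sketch, and the run order need not match Table 2.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=1):
    """Box-Behnken runs: each factor pair at (+/-1, +/-1) with all others at 0, plus center runs.

    This pairwise construction is the standard one for three factors; designs for
    many more factors use different block structures and are not covered here.
    """
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, +1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)
    return runs

modules = ("VioAB", "VioD", "VioEC")
design = box_behnken(len(modules))

print(f"{len(design)} distinct constructs")
for run in design:
    print(dict(zip(modules, run)))
```

With one center point this gives 13 distinct constructs for three modules, compared with 27 for a three-level full factorial. None of the runs sits at a corner of the cube, so no construct has all three modules at extreme expression at once.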
The quadratic response surface predicted that a maximal violacein production of 500.2 mg/L would be achieved when the expression level of VioAB, VioD and VioEC is tuned to -0.59, 0.77 and 0.17, respectively (Fig. 7a and 7b). The model predictions were validated by constructing pathways driven by promoters from the constructed library (section 2.1) whose strengths approximate the predicted optimal expression levels. Due to the discretized nature of the promoter library, we chose promoters from the Linlog-transformed library (Fig. 2c) with transcriptional activity closest to the predicted values. For example, we used promoters No. 26, No. 10 and No. 31, which approximate the optimal expression of -0.59, 0.77 and -0.17, respectively, to construct pathways that drive VioAB, VioD and VioEC expression. The resulting strain produced violacein at 525.4 mg/L with minimal byproduct accumulation (around 2.7 mg/L deoxyviolacein), which is in good agreement with the model prediction (500.2 mg/L) and represents an additional 21.5% increase compared to the best strain obtained in the PB screening experiments. When this strain was cultivated in a 2-liter bioreactor, violacein production was further increased to 1.31 g/L with a glucose conversion yield of 0.021 g/g (suppl. Fig. S7). Violacein was found to be primarily intracellular (Fig. 8c) and accounts for 3.3% of the total dry cell weight (or 32.7 mg/gDW).

Regression coefficients for each of the terms in the quadratic model are shown in Fig. 8a. The proposed model was found to fit the experimental data well, with an adjusted coefficient of determination (R2) of 0.885 (Fig. 8b and suppl. Fig. S6). ANOVA indicated that the linear terms (VioAB, VioD and VioEC) and their associated quadratic terms (VioAB*VioAB, VioD*VioD and VioEC*VioEC) contribute most significantly to violacein production (suppl. Fig. S6), while the cross-interactions of pathway modules (VioAB*VioD, VioAB*VioEC and VioD*VioEC) are less significant (p > 0.05). In summary, the quadratic regression model allowed us to identify the optimal gene expression level of the investigated pathway components and further increase metabolite production by 21.5%.
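The way a quadratic response surface yields an optimum can be sketched in plain Python. The sketch fits a full second-order model by least squares and solves for the point where the gradient vanishes. It is not the authors' analysis: the titers come from a hypothetical, noise-free surface whose peak is placed at the optimum reported above (-0.59, 0.77, 0.17; 500.2 mg/L), the curvature values are invented, and all function names are illustrative. Its only purpose is to show that the procedure recovers a known optimum from 13 Box-Behnken runs.

```python
from itertools import combinations, product

def quad_terms(x):
    """Full second-order model terms: 1, x_i, x_i^2, then x_i*x_j for i < j."""
    k = len(x)
    return ([1.0] + list(x) + [v * v for v in x]
            + [x[i] * x[j] for i, j in combinations(range(k), 2)])

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_quadratic(runs, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    X = [quad_terms(x) for x in runs]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def stationary_point(beta, k):
    """Solve gradient = 0, i.e. 2*B*x = -b, with B the symmetric second-order matrix."""
    lin, quad, cross = beta[1:1 + k], beta[1 + k:1 + 2 * k], beta[1 + 2 * k:]
    B = [[0.0] * k for _ in range(k)]
    for i in range(k):
        B[i][i] = quad[i]
    for c, (i, j) in zip(cross, combinations(range(k), 2)):
        B[i][j] = B[j][i] = c / 2.0
    return solve([[2.0 * v for v in row] for row in B], [-v for v in lin])

# Box-Behnken runs for three modules: 12 edge midpoints plus one center point.
runs = []
for i, j in combinations(range(3), 2):
    for a, b in product((-1.0, 1.0), repeat=2):
        x = [0.0, 0.0, 0.0]
        x[i], x[j] = a, b
        runs.append(x)
runs.append([0.0, 0.0, 0.0])

# HYPOTHETICAL noise-free surface peaked at the reported optimum; curvatures are invented.
target = (-0.59, 0.77, 0.17)
curvature = (120.0, 150.0, 90.0)

def surface(x):
    return 500.2 - sum(c * (v - t) ** 2 for c, v, t in zip(curvature, x, target))

titers = [surface(x) for x in runs]
beta = fit_quadratic(runs, titers)
optimum = stationary_point(beta, 3)
predicted = sum(b * t for b, t in zip(beta, quad_terms(optimum)))
print("optimum (coded levels):", [round(v, 2) for v in optimum])
print("predicted titer (mg/L):", round(predicted, 1))
```

The stationary point is a maximum only when all quadratic curvatures are negative (more precisely, when the second-order matrix is negative definite); with real, noisy data this should be checked before the predicted optimum is trusted.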


2.4. Correlation between gene expression and violacein production may follow Weber's law and statistical thermodynamic theory

The reason we used the Linlog transformation to rescale the screened promoter libraries is rooted in two basic biophysical laws. The first is Weber's law, which describes how physiological and cellular sensory systems change upon an input of cellular stimuli. In a strengthened version, Weber's law states that cellular response dynamics is proportional to the relative change in the input signal but not to its absolute value 39, 40. Translated into mathematical language, it implies that the appreciable phenotypic response dP is proportional to the relative change of the stimulus dS/S. By integration, the cellular response P follows a logarithmic relationship with the input S (i.e. P = k lnS + C, where k and C are constants). When Weber's law is used to delineate the relationship between gene expression and metabolite production, it naturally follows that changes in production titer should be logarithmically correlated with promoter strength. As such, our statistical models used the logarithmic value of promoter strength as the input variable to probe the relationship between production titer and gene expression.

A second consideration that encouraged us to use the Linlog transformation is based on observations that promoter transcriptional activity can be well described by statistical thermodynamic theory 41-43. This theory accounts for the binding states of RNA polymerase (RNAP) and the molecular interactions among RNAP, promoter and transcription factors (TFs). The basic assumption is that the gene expression level is proportional to the equilibrium probability that RNAP is bound to the promoter of interest 44. This promoter occupancy probability can be further formulated as the ratio of favorable binding states over the sum of all possible states that RNAP could have, with each of the binding states following the Boltzmann partition function. The probability of forming a favorable RNAP-promoter complex is simply a function of the binding energy (P = k·e^(-∆G/(kT)), where P is the binding probability, which is proportional to the promoter activity, ∆G is the free energy change upon binding to a specific promoter, the prefactor k is a proportionality constant, k in the exponent is the Boltzmann constant and T is the absolute temperature). Due to this exponential relationship between promoter activity and the free energy change, the logarithm of promoter activity is linear in ∆G, so the Linlog transformation allows the production titer to be directly related to the free energy change of the RNAP-promoter interaction.

A practical consideration in using the Linlog transformation is to normalize the discretized gene expression data into evenly distributed logarithmic variables. For example, the Linlog transformation converted the highly skewed gene expression data (Fig. 1b) into evenly distributed logarithmic variables (Fig. 2c). This geometric scaling allowed us to choose the appropriate promoters to perform combinatorial pathway engineering. As the effect of gene expression on metabolite production involves complex biological events (including transcription, translation, protein folding, enzyme catalysis and metabolic flux redistribution), correlating violacein production with these logarithmically scaled promoters covers these multiple unknown steps. Consequently, the response of violacein production to the relative expression of VioAB, VioD and VioEC is defined by a smooth surface, which enabled us to easily extrapolate and determine the optimal expression level of the investigated gene clusters (Fig. 7a and Fig. 7b). In contrast, the genotype-phenotype landscape displays a rather broad and skewed pattern when the actual values of promoter activity are used (Fig. 7c and Fig. 7d), making it rather difficult to identify the optimal expression value of the investigated gene clusters.
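The rescaling discussed in this section can be sketched as a pair of Python functions. The exact formulas are given in Fig. 2a and 2b, which are not reproduced here, so the expressions below are an assumption: they are the simplest log-linear map that satisfies the three anchor points stated in the text (Pmax maps to +1, Pmin to -1, and their geometric mean to 0). The library contents are hypothetical placeholders, and the weakest and strongest activities reported in section 2.1 are used as Pmin and Pmax for illustration only, since the activity of promoter No. 36 is not given in the text.

```python
import math

def linlog(p, p_min, p_max):
    """Map a promoter activity p onto the coded scale where p_min -> -1 and p_max -> +1."""
    span = math.log(p_max) - math.log(p_min)
    return (2.0 * math.log(p) - math.log(p_max) - math.log(p_min)) / span

def linlog_inverse(x, p_min, p_max):
    """Map a coded level x back to a promoter activity."""
    return math.sqrt(p_max * p_min) * (p_max / p_min) ** (x / 2.0)

def closest_promoter(x_target, library, p_min, p_max):
    """Name of the library promoter whose coded level is nearest to x_target."""
    return min(library, key=lambda name: abs(linlog(library[name], p_min, p_max) - x_target))

P_MAX = 21340.0  # FU/OD, strongest promoter reported in section 2.1
P_MIN = 22.3     # FU/OD, weakest promoter reported in section 2.1 (assumed low level)

# HYPOTHETICAL library: names and intermediate activities are placeholders, not measurements.
library = {"weak": 22.3, "low": 150.0, "medium": 700.0, "high": 4000.0, "strong": 21340.0}

for name, activity in library.items():
    print(f"{name:>6}: {activity:>8.1f} FU/OD -> {linlog(activity, P_MIN, P_MAX):+.2f}")

target = 0.77  # coded level predicted for VioD in section 2.3
print("activity at target:", round(linlog_inverse(target, P_MIN, P_MAX), 1), "FU/OD")
print("closest library promoter:", closest_promoter(target, library, P_MIN, P_MAX))
```

Because the library is discrete, a predicted optimum on the coded scale is first mapped back to an activity and then matched to the nearest available promoter, which mirrors the selection step described in section 2.3.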

3. Conclusions

Combinatorial pathway engineering requires iterative cycles of pathway debottlenecking to improve strain performance. To minimize experimental effort and speed up this process, we applied statistical model-based DoE to guide the implementation of combinatorial pathway engineering. An efficient promoter library was constructed and coupled with the Plackett-Burman design to effectively screen enzymatic steps that determine pathway efficiency and reduce the gene expression design space. Then a quadratic regression model (Box-Behnken design) was used to determine the optimal expression level of the investigated pathway modules. The final strain produced 1.31 g/L of violacein and represents a promising starter strain for further industrial application. The T7 promoter library shows transcriptional activity covering three orders of magnitude and can be widely used in other synthetic biology and metabolic engineering applications. This statistical DoE methodology addresses major challenges in implementing multivariate combinatorial metabolic engineering, namely reducing the gene expression space, searching combinatorially for optimal gene expression patterns and rapidly identifying the desired overproduction phenotype. We envision that this optimization framework can generally aid efforts to accelerate strain engineering and improve metabolite production.

4. Materials and Methods
4.1. T7 promoter library construction and screening
The T7 promoter library was constructed by Gibson assembly of synthetic overlapping oligos to replace the original T7 promoter in pETM6-eGFP. The single-stranded degenerate DNA oligos T7lacO_F and T7lacO_R (Supplementary Table S1) were annealed in Tris-HCl and EDTA buffer by three cycles of heating (95 °C for 2 minutes) and cooling (25 °C for 1.5 minutes) on a Biorad PCR block. Then 5 µL of the annealed oligos (10 µM) was mixed with 2.5 µL of the AvrII and XbaI digested and gel purified pETM6-eGFP vector (~25 ng plasmid DNA) and 7.5 µL of the 2× Gibson Assembly master mix. The mixture was kept at 50 °C for 1 hour. Then 3 µL of the Gibson reaction was chemically transformed into 12 µL of NEB5α high-efficiency competent cells. Overnight grown colonies were scraped with a razor blade and the library plasmid DNA was prepared with Zyppy miniprep kits. 1.5 µL of the library plasmid was transformed into 20 µL of BL21(DE3) star cells by electroporation. Positive clones containing T7 promoter library members should grow on LB-ampicillin plates.


For promoter activity screening, BL21 transformants were individually inoculated into 2.5 mL LB broth supplemented with 100 µg/mL ampicillin and grown overnight at 37 °C with shaking at 250 rpm. The next morning, 20 µL of the overnight culture was inoculated into 220 µL fresh LB (with 100 µg/mL ampicillin) in a Greiner Bio-One black fluorescence plate. 10 µL of 5 mM IPTG or sterile water was added to each well with a multichannel pipette. The entire plate was incubated in a plate shaker at 30 °C with shaking at 400 rpm. Green fluorescence was measured every hour with a SpectraMax microplate reader (Molecular Devices) with excitation at 495 nm and emission at 512 nm. Promoter activity was calculated as the rate of green fluorescence accumulation divided by the rate of cell density increase. Almost all samples showed a linear response curve with R² above 0.95. All experiments were performed in triplicate to ensure reproducibility.

4.2. ePathBrick-directed modular assembly of violacein pathway
Chromobacterium violaceum 12472 type strain was purchased from ATCC, and the genomic DNA was isolated and purified with the Invitrogen genomic DNA purification kit. VioA, VioB, VioC, VioD and VioE were PCR amplified using the primers listed in Table S1 with the 2× Q5 PCR master mix (NEB). The amplified VioA, VioC and VioE PCR products were cleaned up using Zymoclean PCR kits, double digested with NdeI and XhoI and ligated with the NdeI and XhoI digested pETM6 to give pETM6-VioA, pETM6-VioC and pETM6-VioE. The amplified and PCR-cleaned VioB and VioD gene fragments were Gibson assembled into the NdeI and XhoI digested and gel purified pETM6 to give pETM6-VioB and pETM6-VioD. Then the weak (Promoter#36), medium (Promoter#44) and strong (Promoter#2) T7 promoters were subcloned into the previously constructed violacein pathways to replace the original T7 promoter in pETM6-VioA, pETM6-VioB, pETM6-VioC, pETM6-VioD and pETM6-VioE.
To facilitate the expression of multiple genes and reduce the length of the operon structure, a new vector pCDMx was constructed by mutating the additional XbaI site on pCDM4 using primers Xba_F and Xba_R. Site-directed mutagenesis was performed using Pfu Ultra DNA polymerase (Agilent genomic science) with BW27784 as the host strain. Operon and monocistronic gene configurations were assembled following the combinatorial modular approach reported by Xu et al. (45, 46). For Plackett-Burman screening, monocistronic gene constructs were used to ensure context-independent gene expression. For example, pCDMx-P36-VioC-P2-VioD (monocistronic) was constructed by ligating the AvrII and SalI digested pETM6-P2-VioD with the NheI and SalI digested pCDMx-P36-VioC. pETM6-P26-VioAB (operon) was constructed by ligating the ApaI and SpeI digested pETM6-P26-VioA with the ApaI and XbaI digested pETM6-VioB. pCDMx-P31-VioEC-P10-VioD (VioEC in an operon; VioD placed downstream of VioEC, insulated by a single terminator and driven by a P10 promoter) was constructed by ligating the AvrII and SalI digested pETM6-P10-VioD with the NheI and SalI digested pCDMx-P31-VioEC. Following this facile gene assembly procedure, all violacein pathways were constructed in a modular and combinatorial fashion. Normally, one can finish three rounds of cloning and assemble around 40 genetic constructs in one week with a normal workload schedule (40 hours per week). The detailed genetic configuration of the constructed pathways is shown in Fig. 4 and Fig. 6. All constructed pathways are listed in Supplementary Table S2.

4.3. Media and growth condition
LB broth was routinely used to cultivate E. coli NEB5α, BW27784 and BL21(DE3) star to maintain, propagate and prepare plasmids and to screen promoter activity. A modified AAM medium (37) was used to cultivate violacein-producing strains. The modified medium contains 3.5 g/L KH2PO4, 5.0 g/L K2HPO4, 3.5 g/L (NH4)2HPO4, 2 g/L casamino acids, 8.4 g/L MOPS, 0.72 g/L Tricine, 2.8 mg/L FeSO4•7H2O, 2.92 g/L NaCl, 0.51 g/L NH4Cl, 1 mM MgSO4, 0.1 mM CaCl2 and 20 g/L glucose, supplemented with ampicillin (70 µg/mL) and streptomycin (40 µg/mL). Strains constructed for the Plackett-Burman and Box-Behnken designs were cultivated in 250 mL flasks containing 30 mL modified AAM medium and grown at 25 °C for 3 days. Protein expression was induced with 0.2 mM IPTG at an OD of approximately 0.6 to 0.8. The scale-up experiment was carried out in a 2-liter benchtop bioreactor with the same medium, with pulses of 20 g/L glucose at 20 h and 44 h. 0.2 mM IPTG, 70 µg/mL ampicillin and 40 µg/mL streptomycin were also pulsed into the fermentation broth at 20 h and 44 h to maintain protein expression and strain genetic stability. The dissolved oxygen was maintained at 20% saturation by coupling agitation with DO control. The aeration was controlled at 1 vvm and the temperature was maintained at 20 °C by cooling water.

4.4. Metabolite extraction and analysis
Residual glucose was analyzed with an Agilent 1260 HPLC equipped with a BioRad Aminex HPX-87H column and a refractive index detector, eluted with 14 mM sulfuric acid. Violacein was found primarily intracellularly. The extraction and assay of violacein were carried out following the protocol reported by Jones et al. (37). Violacein and deoxyviolacein standards were purchased from Sigma-Aldrich.
A spectrophotometer-based assay was also developed based on the maximal absorbance of violacein at 575 nm.

4.5. Data analysis and model fitting
Statistical data analysis and model fitting were performed with the JMP 12.0 software suite. All quantitative plots were made with the OriginPro 9.0 software.
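For readers without access to JMP, the quadratic model fitting can be approximated with an ordinary least-squares sketch in Python. The design levels and mean violacein titers below are transcribed from Table 2; the use of NumPy, the coded levels −1/0/+1 as regressors (rather than exact Linlog-rescaled promoter activities), the variable names and the grid search are all assumptions made for illustration, so the fitted statistics need not match the values reported from JMP.

```python
import numpy as np

# Coded levels (VioAB, VioD, VioEC) and mean violacein titers (mg/L) from Table 2.
design = np.array([
    [0, 1, -1], [1, -1, 0], [1, 1, 0], [0, 0, 0], [0, 1, 1],
    [-1, -1, 0], [0, -1, 1], [-1, 0, -1], [0, 0, 0], [-1, 0, 1],
    [0, -1, -1], [0, 0, 0], [1, 0, 1], [-1, 1, 0], [1, 0, -1],
], dtype=float)
titer = np.array([427.2, 334.5, 384.2, 474.5, 386.1, 319.1, 295.5, 434.2,
                  462.1, 383.6, 336.8, 468.6, 257.6, 478.1, 348.1])

def quadratic_terms(x):
    """Build the full quadratic model matrix: intercept, linear, interaction, squared terms."""
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return np.column_stack([np.ones(len(x)), x1, x2, x3,
                            x1 * x2, x1 * x3, x2 * x3,
                            x1 ** 2, x2 ** 2, x3 ** 2])

A = quadratic_terms(design)
coef, *_ = np.linalg.lstsq(A, titer, rcond=None)  # least-squares coefficients a0..a9

# Goodness of fit.
pred = A @ coef
ss_res = np.sum((titer - pred) ** 2)
ss_tot = np.sum((titer - titer.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
n, p = A.shape
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)

# Coarse grid search for the predicted optimum inside the design space.
grid = np.linspace(-1.0, 1.0, 21)
mesh = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T
best = mesh[np.argmax(quadratic_terms(mesh) @ coef)]

print("coefficients:", np.round(coef, 2))
print("R2 = %.3f, adjusted R2 = %.3f" % (r2, adj_r2))
print("predicted optimum (VioAB, VioD, VioEC):", best)
```

A grid search is used here only because it is short and transparent; solving for the stationary point of the fitted quadratic would be an equally reasonable choice.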

5. Supporting information
The Supporting Information is available free of charge on the ACS Publications website at DOI:
Supplementary information includes primers and synthetic oligos (Table S1), strains and plasmids (Table S2), sequence of the T7 promoter library (Fig. S1), T7 promoter library leaky expression pattern (Fig. S2), main effect plot of the PB design (Fig. S3), statistical analysis of the PB design (Fig. S4), model of the Box-Behnken design (Fig. S5), statistical analysis of the Box-Behnken design (Fig. S6) and fermenter scale-up of the violacein-producing strain (Fig. S7).


Contributions
PX and GS designed the study. PX, EAR and SYS performed the study with input from Adlai R Grayson. PX analyzed the data and wrote the manuscript with advice from GS.

Acknowledgements
The authors would like to thank the GS lab members (particularly KJ Qiao) for helpful discussions. PX would like to thank the DOE and the Stephanopoulos lab for postdoctoral funding support.

Competing financial interests
The authors declare no competing financial interests with the results obtained in this study.

References:
(1) Xu, P.; Bhan, N.; Koffas, M. A. G., (2013) Engineering plant metabolism into microbes: from systems biology to synthetic biology. Current Opinion in Biotechnology, 24 (2), 291-299.
(2) Biggs, B. W.; De Paepe, B.; Santos, C. N. S.; De Mey, M.; Kumaran Ajikumar, P., (2014) Multivariate modular metabolic engineering for pathway and strain optimization. Current Opinion in Biotechnology, 29, 156-162.
(3) Lee, J. W.; Kim, T. Y.; Jang, Y. S.; Choi, S.; Lee, S. Y., (2011) Systems metabolic engineering for chemicals and materials. Trends Biotechnol, 29 (8), 370-8.
(4) Tai, Y. S.; Xiong, M.; Zhang, K., (2015) Engineered biosynthesis of medium-chain esters in Escherichia coli. Metab Eng, 27, 20-8.
(5) Xiong, M.; Schneiderman, D. K.; Bates, F. S.; Hillmyer, M. A.; Zhang, K., (2014) Scalable production of mechanically tunable block polymers from sugar. Proceedings of the National Academy of Sciences, 111 (23), 8357-8362.
(6) Qiao, K.; Imam Abidi, S. H.; Liu, H.; Zhang, H.; Chakraborty, S.; Watson, N.; Kumaran Ajikumar, P.; Stephanopoulos, G., (2015) Engineering lipid overproduction in the oleaginous yeast Yarrowia lipolytica. Metab Eng, 29, 56-65.
(7) Xu, P.; Li, L.; Zhang, F.; Stephanopoulos, G.; Koffas, M., (2014) Improving fatty acids production by engineering dynamic pathway regulation and metabolic control. Proceedings of the National Academy of Sciences of the United States of America, 111 (31), 11299-11304.
(8) Lin, Y.; Shen, X.; Yuan, Q.; Yan, Y., (2013) Microbial biosynthesis of the anticoagulant precursor 4-hydroxycoumarin. Nat Commun, 4, 2603.
(9) Thodey, K.; Galanie, S.; Smolke, C. D., (2014) A microbial biomanufacturing platform for natural and semisynthetic opioids. Nat Chem Biol, 10 (10), 837-844.
(10) Tai, M.; Stephanopoulos, G., (2013) Engineering the push and pull of lipid biosynthesis in oleaginous yeast Yarrowia lipolytica for biofuel production. Metabolic Engineering, 15 (1), 1-9.
(11) Stephanopoulos, G., (2012) Synthetic Biology and Metabolic Engineering. ACS Synthetic Biology, 1 (11), 514-525.


(12) Lan, E. I.; Liao, J. C., (2012) ATP drives direct photosynthetic production of 1-butanol in cyanobacteria. Proceedings of the National Academy of Sciences, 109 (16), 6018-6023.
(13) Singh, A.; Soh, K.; Hatzimanikatis, V.; Gill, R., (2011) Manipulating redox and ATP balancing for improved production of succinate in E. coli. Metabolic Engineering, 76-81.
(14) Leonard, E.; Ajikumar, P.; Thayer, K.; Xiao, W.; Mo, J.; Tidor, B.; Stephanopoulos, G.; Prather, K., (2010) Combining metabolic and protein engineering of a terpenoid biosynthetic pathway for overproduction and selectivity control. Proc Natl Acad Sci U S A, 13654-13659.
(15) Zelcbuch, L.; Antonovsky, N.; Bar-Even, A.; Levin-Karp, A.; Barenholz, U.; Dayagi, M.; Liebermeister, W.; Flamholz, A.; Noor, E.; Amram, S.; Brandis, A.; Bareia, T.; Yofe, I.; Jubran, H.; Milo, R., (2013) Spanning high-dimensional expression space using ribosome-binding site combinatorics. Nucleic Acids Research, 41 (9), e98-e98.
(16) Juminaga, D.; Baidoo, E. E.; Redding-Johanson, A. M.; Batth, T. S.; Burd, H.; Mukhopadhyay, A.; Petzold, C. J.; Keasling, J. D., (2012) Modular engineering of L-tyrosine production in Escherichia coli. Appl Environ Microbiol, 78 (1), 89-98.
(17) Anthony, J.; Anthony, L.; Nowroozi, F.; Kwon, G.; Newman, J.; Keasling, J., (2009) Optimization of the mevalonate-based isoprenoid biosynthetic pathway in Escherichia coli for production of the antimalarial drug precursor amorpha-4,11-diene. Metab Eng, 11 (1), 13-19.
(18) Bokinsky, G.; Peralta-Yahya, P. P.; George, A.; Holmes, B. M.; Steen, E. J.; Dietrich, J.; Soon Lee, T.; Tullman-Ercek, D.; Voigt, C. A.; Simmons, B. A.; Keasling, J. D., (2011) Synthesis of three advanced biofuels from ionic liquid-pretreated switchgrass using engineered Escherichia coli. Proc Natl Acad Sci U S A, 108 (50), 19949-54.
(19) Salis, H.; Mirsky, E.; Voigt, C., (2009) Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnol, 27 (10), 946-950.
(20) Ajikumar, P. K.; Xiao, W.-H.; Tyo, K. E. J.; Wang, Y.; Simeon, F.; Leonard, E.; Mucha, O.; Phon, T. H.; Pfeifer, B.; Stephanopoulos, G., (2010) Isoprenoid Pathway Optimization for Taxol Precursor Overproduction in Escherichia coli. Science, 330 (6000), 70-74.
(21) Wu, J.; Du, G.; Zhou, J.; Chen, J., (2013) Metabolic engineering of Escherichia coli for (2S)-pinocembrin production from glucose by a modular metabolic strategy. Metabolic Engineering, 16, 48-55.
(22) Xu, P.; Gu, Q.; Wang, W.; Wong, L.; Bower, A. G. W.; Collins, C. H.; Koffas, M. A. G., (2013) Modular optimization of multi-gene pathways for fatty acids production in E. coli. Nature Communications, 4, 1409.
(23) Chen, B.; Lee, D.-Y.; Chang, M. W., (2015) Combinatorial metabolic engineering of Saccharomyces cerevisiae for terminal alkene production. Metabolic Engineering, 31, 53-61.
(24) Na, D.; Yoo, S. M.; Chung, H.; Park, H.; Park, J. H.; Lee, S. Y., (2013) Metabolic engineering of Escherichia coli using synthetic small regulatory RNAs. Nat Biotech, 31 (2), 170-174.
(25) Santos, C. N. S.; Xiao, W.; Stephanopoulos, G., (2012) Rational, combinatorial, and genomic approaches for engineering L-tyrosine production in Escherichia coli. Proceedings of the National Academy of Sciences, 109 (34), 13538-13543.
(26) Yadav, V. G.; De Mey, M.; Giaw Lim, C.; Kumaran Ajikumar, P.; Stephanopoulos, G., (2012) The future of metabolic engineering and synthetic biology: Towards a systematic practice. Metabolic Engineering, 14 (3), 233-241.
(27) Boock, J. T.; Gupta, A.; Prather, K. L. J., (2015) Screening and modular design for metabolic pathway optimization. Current Opinion in Biotechnology, 36, 189-198.
(28) Xu, P.; Ding, Z.; Qian, Z.; Zhao, C.; Zhang, K., (2008) Improved production of mycelial biomass and ganoderic acid by submerged culture of Ganoderma lucidum SB97 using complex media. Enzyme and Microbial Technology, 42 (4), 325-331.


(29) Mandenius, C.-F.; Brundin, A., (2008) Bioprocess optimization using design-of-experiments methodology. Biotechnology Progress, 24 (6), 1191-1203.
(30) Alonso-Gutierrez, J.; Kim, E.-M.; Batth, T. S.; Cho, N.; Hu, Q.; Chan, L. J. G.; Petzold, C. J.; Hillson, N. J.; Adams, P. D.; Keasling, J. D.; Garcia Martin, H.; Lee, T. S., (2015) Principal component analysis of proteomics (PCAP) as a tool to direct metabolic engineering. Metabolic Engineering, 28, 123-133.
(31) Lee, M. E.; Aswani, A.; Han, A. S.; Tomlin, C. J.; Dueber, J. E., (2013) Expression-level optimization of a multi-enzyme pathway in the absence of a high-throughput assay. Nucleic Acids Research.
(32) Jeschek, M.; Gerngross, D.; Panke, S., (2016) Rationally reduced libraries for combinatorial pathway optimization minimizing experimental effort. Nat Commun, 7.
(33) Zhou, H.; Vonk, B.; Roubos, J. A.; Bovenberg, R. A. L.; Voigt, C. A., (2015) Algorithmic co-optimization of genetic constructs and growth conditions: application to 6-ACA, a potential nylon-6 precursor. Nucleic Acids Research.
(34) Plackett, R. L.; Burman, J. P., (1946) The Design of Optimum Multifactorial Experiments. Biometrika, 33 (4), 305-325.
(35) Ferreira, S. L. C.; Bruns, R. E.; Ferreira, H. S.; Matos, G. D.; David, J. M.; Brandão, G. C.; da Silva, E. G. P.; Portugal, L. A.; dos Reis, P. S.; Souza, A. S.; dos Santos, W. N. L., (2007) Box-Behnken design: An alternative for the optimization of analytical methods. Analytica Chimica Acta, 597 (2), 179-186.
(36) Alper, H.; Fischer, C.; Nevoigt, E.; Stephanopoulos, G., (2005) Tuning genetic control through promoter engineering. Proceedings of the National Academy of Sciences of the United States of America, 102 (36), 12678-12683.
(37) Jones, J. A.; Vernacchio, V. R.; Lachance, D. M.; Lebovich, M.; Fu, L.; Shirke, A. N.; Schultz, V. L.; Cress, B.; Linhardt, R. J.; Koffas, M. A. G., (2015) ePathOptimize: A Combinatorial Approach for Transcriptional Balancing of Metabolic Pathways. Scientific Reports, 5, 11301.
(38) Temme, K.; Hill, R.; Segall-Shapiro, T. H.; Moser, F.; Voigt, C. A., (2012) Modular control of multiple pathways using engineered orthogonal T7 polymerases. Nucleic Acids Research, 40 (17), 8773-8781.
(39) Adler, M.; Mayo, A.; Alon, U., (2014) Logarithmic and Power Law Input-Output Relations in Sensory Systems with Fold-Change Detection. PLoS Comput Biol, 10 (8), e1003781.
(40) Shoval, O.; Goentoro, L.; Hart, Y.; Mayo, A.; Sontag, E.; Alon, U., (2010) Fold-change detection and scalar symmetry of sensory input fields. Proceedings of the National Academy of Sciences, 107 (36), 15995-16000.
(41) Kuhlman, T.; Zhang, Z.; Saier, M. H.; Hwa, T., (2007) Combinatorial transcriptional control of the lactose operon of Escherichia coli. Proceedings of the National Academy of Sciences, 104 (14), 6043-6048.
(42) Kinney, J. B.; Murugan, A.; Callan, C. G.; Cox, E. C., (2010) Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proceedings of the National Academy of Sciences, 107 (20), 9158-9163.
(43) Brewster, R. C.; Jones, D. L.; Phillips, R., (2012) Tuning Promoter Strength through RNA Polymerase Binding Site Design in Escherichia coli. PLoS Comput Biol, 8 (12), e1002811.
(44) Bintu, L.; Buchler, N. E.; Garcia, H. G.; Gerland, U.; Hwa, T.; Kondev, J.; Kuhlman, T.; Phillips, R., (2005) Transcriptional regulation by the numbers: applications. Current Opinion in Genetics & Development, 15 (2), 125-135.
(45) Xu, P.; Vansiri, A.; Bhan, N.; Koffas, M., (2012) ePathBrick: A Synthetic Biology Platform for Engineering Metabolic Pathways in E-coli. ACS Synth. Biol., 1 (7), 256-266.
(46) Xu, P.; Koffas, M. A. G.; Polizzi, K.; Kontoravdi, C., (2013) Assembly of Multi-gene Pathways and Combinatorial Pathway Libraries Through ePathBrick Vectors. Synthetic Biology, 1073, 107-129.


Table and Figure Legends

Table 1. Plackett-Burman design to screen the main pathway components that determine violacein pathway efficiency. -1, weak promoter; +1, strong promoter; N.D., not detectable. All reported values are mean ± SD, where SD is the standard deviation of three replicates.

Table 2. Box-Behnken design to determine the quadratic correlations between violacein production and the relative expression of VioAB, VioD and VioEC. -1, weak promoter; 0, medium strength promoter; +1, strong promoter; N.D., not detectable. All reported values are mean ± SD, where SD is the standard deviation of three replicates.

Fig. 1 Construction of a T7 promoter library that covers a broad range of transcriptional dynamics. (a) Synthetic degenerate oligos used to construct the T7 promoter and lacI repressor binding region (lacO). Nx denotes x consecutive degenerate nucleotides. (b) Transcriptional activity of the original mutant T7 promoter libraries screened by green fluorescence signal. (c) Representative sequences of mutant promoters.

Fig. 2 Linlog transformation of the discretized promoter library into logarithmically scaled variables. (a) Forward Linlog transformation. P is the absolute promoter activity (FU/OD) and X is the rescaled dimensionless promoter activity. (b) Reverse Linlog transformation. (c) Linlog rescaled promoter activity, which is evenly distributed across three orders of magnitude. This geometric rescaling allows us to pick the appropriate promoters for combinatorial pathway engineering.

Fig. 3 Schematic representation of the violacein biosynthetic pathway (a) and pictorial illustration of the constructed violacein strains on agar plates that correspond to pathway no.1 through no.9 in the Box-Behnken design (b).

Fig. 4 Genetic configuration of the constructed pathways that conform to the Plackett-Burman design (Table 1). All genes are organized in monocistronic form to ensure context-independent gene expression. VioA and VioB are carried by the pETM6 vector; VioC, VioD and VioE are carried by the pCDMx vector. +1: strong promoter (P2); -1: weak promoter (P36); T: terminator; A: VioA; B: VioB; C: VioC; D: VioD; E: VioE.

Fig. 5 Analysis of variance to determine how significantly each pathway component contributes to violacein production. The data points were retrieved from the Plackett-Burman design. The blue box indicates the 25th and 75th percentiles of the violacein production distribution; x represents the 1st and 99th percentiles of the violacein production distribution. Pink line indicates the mean value of the data and diamond point indicates the average of the data.

Fig. 6 Genetic configuration of the constructed pathways that conform to the Box-Behnken design (Table 2). VioA and VioB are organized in an operon and carried by the pETM6 vector; VioE and VioC are organized in an operon and carried by the pCDMx vector. +1: strong promoter (P2); 0: medium strength promoter (P44); -1: weak promoter (P36); T: terminator; A: VioA; B: VioB; C: VioC; D: VioD; E: VioE.

Fig. 7 Violacein production-gene expression level phenotype-genotype landscape. (a) Violacein production in response to the expression of VioAB and VioEC in logarithmic scale. (b) Violacein production in response to the expression of VioD and VioEC in logarithmic scale. (c) Violacein production in response to the expression of VioAB and VioEC in linear scale. (d) Violacein production in response to the expression of VioD and VioEC in linear scale.

Fig. 8 Summary of Box-Behnken fitting results and validation of model predictions. (a) Regression coefficients a1 through a9 of the Box-Behnken design are shown as column bars. (b) Model-predicted results are in good agreement with the experimental values, with an adjusted coefficient of determination (R²) of 0.885. (c) Representative pictures of violacein culture from the bench-top bioreactor. The upper panel shows the E. coli suspension culture and the lower panel shows the settled violacein-producing E. coli.


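Tables and Figures

Table 1.

| VioA | VioB | VioC | VioD | VioE | Violacein (mg/L) | Deoxyviolacein (mg/L) |
|------|------|------|------|------|------------------|-----------------------|
| -1 | 1 | -1 | 1 | -1 | 264.3 ± 5.9 | N.D. |
| 1 | 1 | 1 | -1 | -1 | 176.5 ± 6.4 | 5.4 ± 2.1 |
| 1 | -1 | -1 | -1 | 1 | 298.6 ± 15.8 | N.D. |
| -1 | 1 | -1 | 1 | 1 | 309.4 ± 12.4 | 1.4 ± 0.2 |
| 1 | -1 | -1 | 1 | -1 | 373.2 ± 18.1 | N.D. |
| 1 | 1 | -1 | -1 | -1 | 231.5 ± 7.5 | N.D. |
| 1 | -1 | 1 | 1 | 1 | 247.2 ± 8.9 | 41.7 ± 8.9 |
| -1 | -1 | -1 | -1 | 1 | 329.1 ± 27.7 | N.D. |
| 1 | 1 | 1 | 1 | 1 | 164.6 ± 5.9 | 25.8 ± 3.6 |
| -1 | -1 | 1 | -1 | -1 | 216.3 ± 19.5 | 5.8 ± 1.4 |
| -1 | 1 | 1 | -1 | 1 | 141.9 ± 10.3 | 26.8 ± 1.7 |
| -1 | -1 | 1 | 1 | -1 | 244.3 ± 19.2 | 9.3 ± 2.5 |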
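As an illustration of how a Plackett-Burman screen ranks pathway components, the main effect of each gene can be estimated from the Table 1 data as the difference between the mean titer at the +1 level and the mean titer at the −1 level. The sketch below uses NumPy and variable names chosen for illustration; it is a simplified stand-in for the statistical analysis performed in JMP and does not compute p-values.

```python
import numpy as np

genes = ["VioA", "VioB", "VioC", "VioD", "VioE"]

# Coded promoter levels (+1 strong, -1 weak) transcribed from Table 1.
design = np.array([
    [-1,  1, -1,  1, -1],
    [ 1,  1,  1, -1, -1],
    [ 1, -1, -1, -1,  1],
    [-1,  1, -1,  1,  1],
    [ 1, -1, -1,  1, -1],
    [ 1,  1, -1, -1, -1],
    [ 1, -1,  1,  1,  1],
    [-1, -1, -1, -1,  1],
    [ 1,  1,  1,  1,  1],
    [-1, -1,  1, -1, -1],
    [-1,  1,  1, -1,  1],
    [-1, -1,  1,  1, -1],
], dtype=float)

# Mean violacein titers (mg/L) from Table 1, in the same row order.
violacein = np.array([264.3, 176.5, 298.6, 309.4, 373.2, 231.5,
                      247.2, 329.1, 164.6, 216.3, 141.9, 244.3])

# Main effect: mean response at the +1 level minus mean response at the -1 level.
effects = {
    gene: violacein[design[:, j] == 1].mean() - violacein[design[:, j] == -1].mean()
    for j, gene in enumerate(genes)
}

# Rank genes by the magnitude of their main effect.
ranked = sorted(effects, key=lambda g: abs(effects[g]), reverse=True)
for gene in ranked:
    print(f"{gene}: main effect = {effects[gene]:+.2f} mg/L")
```

With these data the largest absolute effects belong to VioC, VioB and VioD, in line with the ordering of the p-values reported in Fig. 5, while VioA and VioE show effects of only a few mg/L.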

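Table 2.

| VioAB | VioD | VioEC | Violacein (mg/L) | Deoxyviolacein (mg/L) |
|-------|------|-------|------------------|-----------------------|
| 0 | 1 | -1 | 427.2 ± 12.3 | N.D. |
| 1 | -1 | 0 | 334.5 ± 10.8 | N.D. |
| 1 | 1 | 0 | 384.2 ± 3.1 | 11.7 ± 2.6 |
| 0 | 0 | 0 | 474.5 ± 15.4 | N.D. |
| 0 | 1 | 1 | 386.1 ± 7.6 | 9.6 ± 3.2 |
| -1 | -1 | 0 | 319.1 ± 15.4 | 2.9 ± 0.5 |
| 0 | -1 | 1 | 295.5 ± 15.6 | 30.4 ± 4.3 |
| -1 | 0 | -1 | 434.2 ± 19.1 | N.D. |
| 0 | 0 | 0 | 462.1 ± 20.4 | 5.4 ± 0.9 |
| -1 | 0 | 1 | 383.6 ± 4.6 | 14.7 ± 1.6 |
| 0 | -1 | -1 | 336.8 ± 19.4 | N.D. |
| 0 | 0 | 0 | 468.6 ± 16.9 | 8.4 ± 2.5 |
| 1 | 0 | 1 | 257.6 ± 2.1 | 29.3 ± 3.1 |
| -1 | 1 | 0 | 478.1 ± 21.1 | N.D. |
| 1 | 0 | -1 | 348.1 ± 4.2 | N.D. |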


[Fig. 1, panel a: degenerate oligo design covering the T7 promoter and lacO regions (positions -35, -24 and -10 marked), shown with a map of pETM6-eGFP (T7/lacO, eGFP, lacI, ori, bla).
ATN5GATCN3AAAT TAATACN2CTCACTATA GG AATNGTN3CGGN3ACNATT
TAN5CTAGN3TTTA ATTATGN2GAGTGATAT CC TTANCAN3GCCN3TGNTAA

Panel b: bar chart of promoter activity (F/OD, axis 0 to 20,000) against library number, plotted in the order 2, 16, 4, 32, 11, 47, 10, 40, 17, 1, 35, 12, 18, 25, 19, 34, 15, 24, 20, 27, 30, 45, 23, 29, 38, 43, 6, 33, 14, 48, 44, 31, 42, 3, 9, 41, 5, 26, 7, 13, 36, 8, 22, 28, 21, 37, 39, 46.

Panel c: sequence alignment (positions 180 to 230) of the template and representative mutant promoters.
Template                  ---ATNNNNNGATCNNNAAATTAATACNNCTCACTATAGGAATNGTNNNCGGNNNACNATTC
630-36_T7F_2015-07-09_C05 AGGATTAAGGGATCCACAAATTAATACTCCTCACTATAGGAATTGTGTGCGGGTTACTATTC
630-14_T7F_2015-07-09_F04 AGGATCAGCCGATCTGGAAATTAATACGACTCACTATAGGAATTGTAGGCGGCATACTATTC
630-32_T7F_2015-07-09_D04 AGGATGAGCTGATCCCGAAATTAATACATCTCACTATAGGAAG-GTCGGCGGTGGACCATTC
630-40_T7F_2015-07-29_A06 AGGATGGAACGATCTATAAATTAATACACCTCACTATAGGAATTGTCTACGGTGAACTATTC]

Fig. 1


[Fig. 2, panel a: forward Linlog transformation (Eq. 1)

$$X = \frac{\lg P - \dfrac{\lg P_{\max} + \lg P_{\min}}{2}}{\dfrac{\lg P_{\max} - \lg P_{\min}}{2}}$$

with $P = P_{\max} \Rightarrow X = 1$; $P = P_{\min} \Rightarrow X = -1$; $P = \sqrt{P_{\max} P_{\min}} \Rightarrow X = 0$.

Panel b: reverse Linlog transformation (Eq. 2)

$$P = \left(\frac{P_{\max}}{P_{\min}}\right)^{X/2} \sqrt{P_{\max} P_{\min}}$$

with $X = 1 \Rightarrow P = P_{\max}$; $X = -1 \Rightarrow P = P_{\min}$; $X = 0 \Rightarrow P = \sqrt{P_{\max} P_{\min}}$.

Panel c: plot of Linlog promoter activity (a.u., axis -1.5 to 1.0) against library number, plotted in the order 2, 16, 4, 10, 45, 23, 29, 38, 43, 14, 48, 44, 31, 42, 3, 9, 41, 5, 26, 7, 13, 36, 8, 22, 28, 21, 37.]

Fig. 2


Fig. 3


Fig. 4


[Fig. 5: box plots of violacein titer (mg/L, axis 100 to 400) at the -1 and +1 promoter levels of each gene. Reported p-values: VioA, p = 0.8747; VioB, p = 0.0023; VioC, p = 0.0003; VioD, p = 0.0458; VioE, p = 0.8602.]

Fig. 5


Fig. 6


[Fig. 7: four response-surface plots of violacein titer (mg/L, color scale from roughly 270 to 500) against the expression levels of VioAB, VioD and VioEC. Panels a and b use the Linlog scale (axes -1.0 to 1.0); panels c and d use the linear promoter activity scale (axes 0 to 20,000).]

Fig. 7

Fig. 8
