Article
Probing Inter-Cell Variability using Bulk Measurements Harrison Steel, and Antonis Papachristodoulou ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.8b00014 • Publication Date (Web): 25 May 2018 Downloaded from http://pubs.acs.org on May 27, 2018
ACS Synthetic Biology
Probing Inter-Cell Variability using Bulk Measurements
Harrison Steel∗ and Antonis Papachristodoulou∗ Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ
E-mail:
[email protected];
[email protected]
Abstract
The measurement of noise is critical when assessing the design and function of synthetic biological systems. Cell-to-cell variability can be quantified experimentally using single-cell measurement techniques such as flow cytometry and fluorescent microscopy. However, these approaches are costly and impractical for high-throughput parallelised experiments, which are frequently conducted using plate-reader devices. In this paper we describe reporter systems that allow estimation of the cell-to-cell variability in a biological system's output using only measurements of a cell culture's bulk properties. We analyse one potential implementation of such a system which is based upon a fluorescent protein FRET reporter pair, finding that with typical parameters from the literature it is able to reliably estimate variability. We also briefly describe an alternate implementation based upon an activating sRNA circuit. The feasible region of parameter values for which the reporter system can function is assessed, and the dependence of its performance on both extrinsic and intrinsic noise is investigated. Experimental realisation of these constructs can yield novel reporter systems that allow measurement of a synthetic gene circuit's output, as well as the intra-population variability of this output, at little added cost.
ACS Paragon Plus Environment
A major challenge in the design of reliable synthetic biological systems is assessing and regulating the impact of variability and noise upon their behaviour.[1] In a particular system, noise sources can be broadly classified as arising from either intrinsic (due to the stochastic nature of the system's constituent biochemical processes) or extrinsic (due to fluctuations in the concentration and behaviour of cellular components with which it interacts) sources.[2,3] Extrinsic noise that impacts the behaviour of synthetic circuits can be caused by both short- and long-term phenomena: It can be introduced by temporary fluctuations in cellular machinery abundance[4] (such as ribosome sequestration, which impacts the translation rate of synthetic circuits,[5,6] or uneven distribution of proteins during cell division[7]), as well as epigenetic differences in proteome and cell state that are passed on as cells divide.[8-10] Total gene expression noise is thus a function of both cell-wide fluctuations, as well as variability in gene-specific regulation.[11]
In many cases noise can negatively influence the function of synthetic biological circuits.[12,13] Fluctuations in the concentration of individual elements of a synthetic network can propagate throughout the system, impacting the behaviour of other components.[4,14] This has motivated the design and implementation of synthetic systems that reduce variability,[15] for example via inclusion of feedback control architectures.[16-18] At the same time, in certain circumstances noise can be beneficial: In natural systems variability between cells provides an evolutionary advantage in changing environments,[19,20] and can enhance information transfer in genetic networks.[21] Similarly, synthetic systems can be designed that benefit from the stochasticity of gene expression,[22] making this factor essential in some cases for accurate modelling of their behaviour.[23,24]
When synthetic biological designs are realised experimentally, assessment of their noise properties and performance variability is therefore a critical part of their characterisation. Fluorescent reporter proteins are frequently used as an output for such systems due to the
ease with which they are measured.[25] Experiments are often repeated many times (under theoretically identical conditions) using plate-reader hardware[26] to observe variation in their behaviour, which is then quantified in terms of the variability in a mean experimental outcome (for example, a cell culture's bulk fluorescence output). However, measurement of only a cell culture's mean fluorescence can disguise important but often unnoticed behaviours,[27] such as bi-modal responses.[28] To accurately characterise many systems, measurement of intra-population cell-to-cell variability is thus required,[27] for which techniques such as fluorescence microscopy and flow cytometry (which can measure individual fluorescence levels of a large number of cells) are employed extensively.[29] Recently, improvements in the capabilities of these single-cell measurement technologies have facilitated high-throughput studies of cell-to-cell variability.[30] However, single-cell measurement techniques remain time- and equipment-intensive, making their use unfavourable if large numbers of samples must be measured at regular time intervals, as is the case when parallelised experiments are used to test synthetic circuits over a range of input parameter combinations (often done using a plate reader).
This experimental challenge motivates the aim of this paper: To design a synthetic biological reporter circuit that allows characterisation of cell-to-cell variability using only measurements of a cell culture's mean behaviour. Such a reporter could be used in any synthetic biological system with a fluorescent protein output to give an estimate of cell-to-cell variability at little added cost. This would be of great utility for many high-throughput noise-measurement experiments (which currently require flow cytometry) as it would allow them to be performed efficiently and in parallel using a plate-reader device. We achieve noise estimation via a biological implementation of a multiplication function that can be used to measure a variable's mean value, as well as its mean squared value, at a population level. It is then possible to estimate the cell-to-cell standard deviation of a system's output, which previously required measurements at a single-cell level. Thus, though our system does not
return the complete shape of the expression distribution (as does flow cytometry), for cases in which the distribution mean and width are the primary parameters of interest (as in many past studies[13,15,17,18]) it can provide an efficient experimental alternative.
We begin the paper with a description of the simple statistics behind this estimation procedure, showing how measurement of only bulk culture parameters can allow quantification of cell-to-cell variability. In the case of log-normally distributed behaviour (as is often found in biological systems[31]), we demonstrate that this provides a direct estimate of the width of the system's output distribution. We then describe a fluorescent-protein FRET implementation for such a circuit (an alternate implementation using an activating sRNA circuit is described in Supplementary Section 2), and discuss expected parameter values for its operation. The FRET system's variance-estimating performance is then analysed, and its operational parameter range and sensitivity to both intrinsic and extrinsic noise are assessed. These results are discussed with reference to potential experimental conditions in which our reporter system might be employed, and potential sources of error (as well as alternate implementations which might minimise these) are outlined.
Noise Distributions and Quantification
We consider a measurable random biological variable of interest, $X$, which might (for example) correspond to the fluorescence output from a single cell due to production of a fluorescent reporter protein within that cell. The expectation of $X$, $E[X]$, is its mean value over a large number of cells. In this context $E[X]$ would (approximately) correspond to the total fluorescence of a cell colony divided by the number of cells of which it is comprised, hence giving the average fluorescence per cell. Many factors are not considered in this approximation, such as the attenuation of emitted fluorescence as it passes through experimental media (thereby making cells further from a detector device appear slightly dimmer), however these
secondary effects will be ignored in our current analysis. The variance of a random variable $X$ is given by:[32]
$$\mathrm{Var}[X] = E[X^2] - E[X]^2, \tag{1}$$
where $E[X^2]$ is the expected (arithmetic mean) value of the variable $X$ squared, also referred to as the second moment of the random variable $X$. In the current context, $X^2$ would correspond to the squared fluorescence value of a single cell.
Values of $X$ for individual cells will vary within a population, and when a large number of cells are measured individually (i.e. at a single-cell level) the approximate distribution of $X$ can be ascertained. For many variables of interest in biological systems, such as the level of mRNA or protein produced, this distribution is observed to be approximately log-normal during steady growth[33] (though it may be better described by a normal distribution for cells in stationary phase[34]). It has been shown that a log-normal distribution can emerge in such systems due to the inherent complexity of the noisy biochemical processes involved.[35] A log-normal distribution for a given variable will be denoted Lognormal($\mu$, $\sigma^2$), where $\mu$ and $\sigma$ are the mean and standard deviation respectively of the variable's natural logarithm. The $n$th moment of a log-normal distribution can be calculated analytically:[31]
$$E[X^n] = e^{n\mu + \frac{1}{2} n^2 \sigma^2}, \tag{2}$$
where $n$ is a positive integer. In reality we will not be able to measure the absolute value of $E[X^2]$, but rather a value calculated by our reporter system that is proportional to its value. We denote the measured second moment $E_m[X^2] = \gamma_m E[X^2]$, where $\gamma_m$ reflects this proportionality and is termed the 2nd moment gain.
Experimentally, variation between cells in the value of a parameter $X$ is often quantified in terms of the width of the distribution when the variable $X$ is plotted on a log scale (i.e. the parameter $\sigma$). We can derive an expression for this variable for the log-normal distribution in terms of measurable quantities, giving:
$$\sigma = \sqrt{\ln\left(\frac{E_m[X^2]}{E_m[X]^2}\right) - \ln(\gamma_m)}, \tag{3}$$
where $E_m[X]$ is the measured first moment (the mean). The square-root-log dependence of $\sigma$ on $E_m[X^2]$ is worth noting, as it means small changes in $\sigma$ will be measurable as large changes in $E_m[X^2]$. Even if we do not estimate/measure the parameter $\gamma_m$, we note that $\sigma$ is an increasing function of $E_m[X^2]/E_m[X]^2$ (when the argument is real), meaning that as the quantity $E_m[X^2]/E_m[X]^2$ grows, so does $\sigma$. If the underlying distribution is not log-normal and/or unknown we can estimate the population standard deviation using (1), though again estimation of $\gamma_m$ is necessary.
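As a rough numerical illustration of equation (3), the sketch below draws a hypothetical log-normal population, forms only its two bulk averages, and recovers $\sigma$ from them. The function name `estimate_sigma`, the population size and the values of `mu`, `sigma` and `gamma_m` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def estimate_sigma(mean_x, measured_second_moment, gamma_m=1.0):
    """Equation (3): sigma = sqrt(ln(Em[X^2] / Em[X]^2) - ln(gamma_m))."""
    arg = math.log(measured_second_moment / mean_x ** 2) - math.log(gamma_m)
    if arg < 0:
        raise ValueError("argument of the square root is negative")
    return math.sqrt(arg)


# Hypothetical population of cells with Lognormal(mu, sigma^2) output.
rng = random.Random(0)
mu, sigma, gamma_m = 1.0, 0.4, 2.5
cells = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

# "Bulk" readouts: from here on only population averages are used.
mean_x = sum(cells) / len(cells)
second_moment_measured = gamma_m * sum(x * x for x in cells) / len(cells)

sigma_hat = estimate_sigma(mean_x, second_moment_measured, gamma_m)
print(f"true sigma = {sigma}, estimated sigma = {sigma_hat:.3f}")
```

In this idealised setting the second-moment readout is exactly $\gamma_m E[X^2]$; a real reporter would only approximate this, so the sketch shows the statistics rather than the performance of any particular circuit.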
Noise strength (Fano factor) is a common parameter of interest when characterising the variability of biological systems, and can be used to discriminate between different mechanisms contributing to noise in protein abundance.[36,37] This parameter, defined as the ratio of the expression variance to its mean, is constant for a perfectly Poissonian process.[38] However, for most biological processes (for which a log-normal distribution is anticipated) the coefficient of variation provides a better measure of gene-expression noise.[37] The (arithmetic) Coefficient of Variation ($CV$) is defined as the ratio of the arithmetic standard deviation to the mean ($SD[X]/E[X]$ where $SD[X] = \sqrt{\mathrm{Var}[X]}$) due to the linear relationship observed between these parameters.[39] We therefore have:
$$CV = \sqrt{e^{\sigma^2} - 1} \approx \sigma \tag{4}$$
where the last approximation uses the truncated Taylor series $e^x \approx 1 + x$, which is valid for small sigma ($\sigma \lesssim 0.8$). This highlights the appropriateness of the coefficient of variation for assessing noise in biological systems exhibiting log-normal behaviour: It is proportional to
the width of the expression distribution when plotted on a log scale.
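To get a feel for how quickly the approximation $CV \approx \sigma$ in equation (4) degrades, the following sketch compares the exact log-normal expression with $\sigma$ for a few values. The helper name `cv_lognormal` and the chosen values of $\sigma$ are illustrative assumptions.

```python
import math


def cv_lognormal(sigma):
    """Exact arithmetic CV of a log-normal variable, as in equation (4)."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


for sigma in (0.1, 0.2, 0.4, 0.8):
    cv = cv_lognormal(sigma)
    rel_err = (cv - sigma) / cv  # fraction of the exact CV missed by CV ~ sigma
    print(f"sigma = {sigma:.1f}  CV = {cv:.4f}  relative error = {rel_err:.1%}")
```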
We have thus outlined a method for determining the level of variability in a parameter $X$, as is often measured via flow cytometry, but here only using measurements of bulk parameters. To build an estimator of $E_m[X^2]$ we must create a system with fundamental structure:
$$X + X \rightleftharpoons Y$$
where we can both measure $X$, and the variable $Y \propto X^2$. We now describe and analyse a possible FRET implementation for such a variance estimating circuit. A second potential implementation, based upon an activating sRNA circuit, is described in Supplementary Section 2.
A Biological Implementation: FRET Reporter Pair
One approach to measuring a variable $X$ in terms of its mean ($E_m[X]$) and second moment ($E_m[X^2]$) is to use a dimerising FP-FRET (Fluorescent Protein - Fluorescence Resonance Energy Transfer) protein pair that can approximate a multiplication function (Fig. 1a). In this case we have chosen the mClover3 ($C$) and mRuby3 ($R$) fluorescent proteins,[40] but many other possibilities exist.[41] Both fluorescent fusion proteins are expressed in tandem from the promoter $P_x$, for which we aim to measure the variability in transcription. This variability may arise from any upstream processes (e.g. gene regulatory networks) interacting with $P_x$.
When spatially separated, the fluorescent proteins have their usual excitation/emission behaviour. However, when they are brought into close proximity ($\sim 10$ nm or less[41]) it is possible to excite the complex at the lower excitation wavelength of mClover3 (termed the donor), and measure a change in fluorescence in the emission spectrum of mRuby3 (the acceptor). This occurs via non-radiative energy transfer through long-range dipole to dipole interactions when the fluorophores are proximal.[42] A diversity of approaches to measuring FRET exist,[41] though we will concentrate on ratiometric measurements of FRET intensity which may be performed using a plate reader (or fluorescence microscope or flow cytometer).[43]
[Figure 1. Panel a): schematic of the operon and of the reactions between mClover ($C$), mRuby ($R$) and the dimer ($F$). Panel b): plot of $F_{ss}/C_{ss}$ against $\Delta$ for several values of $K_{ON}$.]
Figure 1: A FRET Reporter system for analysis of gene expression variability. a) Structure of the system, in which the concentration of a FRET-active dimer ($F$) is approximately proportional to the square of the concentration of its unbound components $C$ and $R$. $C$ and $R$ each consist of a fusion of a fluorescent protein (mClover3 and mRuby3 respectively) with a protein binding domain (brown). Dashed arrows represent reactions and their directions (with rate parameters as defined in the text), and degradation/dilution reactions ($\delta$) represent removal of species to the null state $\emptyset$. The DNA operon (top) consists of a promoter ($P_x$), Ribosome Binding Sites (RBS, dark-green), genes (labelled), and a transcriptional terminator. b) The dependence of the ratio $F_{ss}/C_{ss}$ (as defined by (6) and (7)) on the value of $\Delta$. Colours denote different values of $K_{ON}$ in units of s$^{-1}$. For an ideal multiplication operation $F_{ss}/C_{ss}$ would be linear in $\Delta$, which is best satisfied for small values of $K_{ON}$.
To co-localise the two fluorescent proteins such that FRET can occur we fuse them to complementary protein binding domains, as has been done in a number of FRET-based protein localisation studies.[44,45] Each fluorescent protein can therefore exist in isolation, or they can dimerise to form a complex $F$ (Fig. 1a). The binding strength of the complementary domains represents the major parameter for tuning in our system, which could initially be investigated experimentally using a pair with ligand-dependent binding.[45,46] Their binding properties will determine the forward and backward rates ($K_{ON}$ and $K_{OFF}$) for the reaction:
$$C + R \underset{K_{OFF}}{\overset{K_{ON}}{\rightleftharpoons}} F$$
If we assume the concentrations of both $C$ and $R$ are proportional to the gene expression variability that we aim to analyse (the parameter $\Delta$ introduced at $P_x$), then this system provides multiplicative ability when the concentrations of $C$ and $R$ are greater than $F$. Following the terminology of the previous section the concentration (and hence fluorescence) of either mClover or mRuby is a proxy for the variable $X$ which we aim to measure, and the concentration of $F$ represents the variable $Y$. We can model this system using a system of differential equations of the form:
$$\dot{C} = \Delta\lambda_C - \delta_C C - K_{ON} C R + K_{OFF} F \tag{5a}$$
$$\dot{R} = \Delta\lambda_R - \delta_R R - K_{ON} C R + K_{OFF} F \tag{5b}$$
$$\dot{F} = K_{ON} C R - K_{OFF} F - \delta_F F \tag{5c}$$
where $C$, $R$ are the concentrations of the mClover and mRuby fusion proteins respectively, and $F$ is the concentration of the bound FRET complex. $\Delta$ is the random scaling factor introduced by $P_x$ which we aim to measure, $\lambda_{C,R}$ are lumped transcription/translation rates (we model these in a single step for simplicity), and $\delta_{C,R,F}$ are degradation rates for the corresponding species.
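The model (5) can be explored numerically with a short, dependency-free sketch such as the one below, which integrates the three equations with a classical fourth-order Runge-Kutta scheme starting from an empty cell. All parameter values are hypothetical, dimensionless numbers chosen only so that the example runs quickly; they are not the literature values used in the paper, and the function names are likewise assumptions.

```python
def fret_rhs(state, delta, p):
    """Right-hand side of equations (5a)-(5c) for state = (C, R, F)."""
    c, r, f = state
    net_binding = p["k_on"] * c * r - p["k_off"] * f  # K_ON*C*R - K_OFF*F
    dc = delta * p["lam_c"] - p["del_c"] * c - net_binding
    dr = delta * p["lam_r"] - p["del_r"] * r - net_binding
    df = net_binding - p["del_f"] * f
    return (dc, dr, df)


def rk4_step(state, delta, p, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = fret_rhs(state, delta, p)
    k2 = fret_rhs(shifted(k1, 0.5 * dt), delta, p)
    k3 = fret_rhs(shifted(k2, 0.5 * dt), delta, p)
    k4 = fret_rhs(shifted(k3, dt), delta, p)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(delta, p, t_end=300.0, dt=0.05):
    """Integrate from C = R = F = 0 up to time t_end."""
    state = (0.0, 0.0, 0.0)
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, delta, p, dt)
    return state


# Hypothetical, dimensionless parameter values (illustration only).
params = {"k_on": 1e-3, "k_off": 0.1, "lam_c": 1.0, "lam_r": 1.0,
          "del_c": 0.1, "del_r": 0.1, "del_f": 0.1}
c_end, r_end, f_end = integrate(delta=1.0, p=params)
print(f"C = {c_end:.4f}, R = {r_end:.4f}, F = {f_end:.4f}")
```

With these assumed values the integration time is many multiples of the slowest relaxation time, so the final state can be treated as a numerical steady state.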
By setting the time derivatives to zero it is possible to solve the system of equations (5) analytically to find the steady-state (ss) concentrations of each
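complex, giving:
$$C_{ss}, R_{ss} = \frac{\Delta\lambda_{C,R} - \delta_F F_{ss}}{\delta_{C,R}}, \tag{6}$$
and
$$F_{ss} = a\left(b + \Delta c - (\Delta^2 d + 2bc\Delta + b^2)^{0.5}\right) \tag{7}$$
where:
$$a = 1/(2 K_{ON} \delta_F^2) \tag{8a}$$
$$b = \delta_C \delta_R (K_{OFF} + \delta_F) \tag{8b}$$
$$c = K_{ON} \delta_F (\lambda_C + \lambda_R) \tag{8c}$$
$$d = K_{ON}^2 \delta_F^2 (\lambda_C - \lambda_R)^2 \tag{8d}$$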
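The closed-form expressions (6) to (8) are straightforward to evaluate directly. The sketch below does so and checks that the result balances binding against unbinding and degradation, i.e. that $K_{ON} C_{ss} R_{ss} = (K_{OFF} + \delta_F) F_{ss}$, which is equation (5c) at steady state. The function name `steady_state` and every numerical value are assumptions made for illustration.

```python
import math


def steady_state(delta, k_on, k_off, lam_c, lam_r, del_c, del_r, del_f):
    """Evaluate equations (6)-(8); returns (C_ss, R_ss, F_ss)."""
    a = 1.0 / (2.0 * k_on * del_f ** 2)               # (8a)
    b = del_c * del_r * (k_off + del_f)               # (8b)
    c = k_on * del_f * (lam_c + lam_r)                # (8c)
    d = k_on ** 2 * del_f ** 2 * (lam_c - lam_r) ** 2  # (8d)
    f_ss = a * (b + delta * c
                - math.sqrt(delta ** 2 * d + 2.0 * b * c * delta + b ** 2))  # (7)
    c_ss = (delta * lam_c - del_f * f_ss) / del_c     # (6)
    r_ss = (delta * lam_r - del_f * f_ss) / del_r     # (6)
    return c_ss, r_ss, f_ss


# Hypothetical, dimensionless parameters with unequal expression and degradation.
k_on, k_off = 1e-3, 0.1
lam_c, lam_r = 1.0, 0.8
del_c, del_r, del_f = 0.1, 0.12, 0.1

c_ss, r_ss, f_ss = steady_state(1.0, k_on, k_off, lam_c, lam_r, del_c, del_r, del_f)
residual = k_on * c_ss * r_ss - (k_off + del_f) * f_ss  # equation (5c) at steady state
print(f"C_ss = {c_ss:.4f}, R_ss = {r_ss:.4f}, F_ss = {f_ss:.4f}, residual = {residual:.1e}")
```

Note that for very small $K_{ON}$ the bracket in (7) is a difference of nearly equal numbers, so a more careful implementation might rearrange the expression to limit floating-point cancellation.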
For an ideal multiplication operation to occur, we aim to tune system parameters such that $F_{ss} \propto \Delta^2$, which would then mean (approximately) that $F_{ss} \propto C_{ss}^2, R_{ss}^2$. Observing that the constant term ($F_{ss}|_{\Delta=0}$) and first derivative ($\frac{\partial F_{ss}}{\partial \Delta}|_{\Delta=0}$) are zero, to achieve $F_{ss} \propto \Delta^2$ we thus desire:
$$\frac{\partial^2 F_{ss}}{\partial \Delta^2} = \frac{a b^2 (c^2 - d)}{(\Delta^2 d + 2bc\Delta + b^2)^{1.5}} \sim \text{Constant}, \tag{9}$$
for all $\Delta$. This criterion is best satisfied if $b$ is large and $c$ and $d$ are small. Here $b$ can be made large by setting the protein degradation terms to be fast (i.e. $\delta_{C,R,F}$ are large). However, tuning these parameters can be challenging (and may introduce noise): A more effective approach is to focus on $c$ and $d$, which can be made small by reducing $K_{ON}$ (and in the case of $d$ setting $\lambda_C = \lambda_R$). $K_{ON}$ is the rate at which the fusion proteins $C$ and $R$ dimerise, which can be reduced by making the binding of their attached protein domains weak. By minimising $F_{ss}$ this constraint also achieves $C, R \propto \Delta$ in (6). This follows from an intuitive analysis of the opposite scenario, in which a large $F_{ss}$ means most protein is in its dimerised form, and thus $F_{ss}$ will be proportional to whichever of $C_{ss}$ or $R_{ss}$ is smaller.
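The effect of reducing $K_{ON}$ on criterion (9) can be checked numerically. The sketch below evaluates equation (7) in the symmetric case ($\lambda_C = \lambda_R$, equal degradation rates, so $d = 0$) and reports how much the ratio $F_{ss}/\Delta^2$ varies over a range of $\Delta$: a value near 1 indicates behaviour close to an ideal squaring operation. The parameter values, the range of $\Delta$ and the helper names are hypothetical choices for illustration.

```python
import math


def f_ss(delta, k_on, k_off=0.1, lam=1.0, deg=0.1):
    """Equation (7) with lambda_C = lambda_R = lam and all degradation rates = deg."""
    a = 1.0 / (2.0 * k_on * deg ** 2)
    b = deg * deg * (k_off + deg)
    c = k_on * deg * 2.0 * lam
    # d = 0 because lambda_C = lambda_R
    return a * (b + delta * c - math.sqrt(2.0 * b * c * delta + b ** 2))


def spread_of_ratio(k_on, deltas):
    """max/min of F_ss / Delta^2 over the supplied Delta values."""
    ratios = [f_ss(d, k_on) / d ** 2 for d in deltas]
    return max(ratios) / min(ratios)


deltas = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
spread_weak = spread_of_ratio(1e-4, deltas)    # weak binding (small K_ON)
spread_strong = spread_of_ratio(1e-2, deltas)  # stronger binding (large K_ON)
print(f"spread for weak binding:   {spread_weak:.3f}")
print(f"spread for strong binding: {spread_strong:.3f}")
```

Under these assumed values the weak-binding case stays much closer to $F_{ss} \propto \Delta^2$ than the strong-binding case, in line with the argument that small $c$ is preferable.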
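These results can be rearranged to give:
$$F_{ss} = \frac{K_{ON} C_{ss}}{K_{OFF} + \delta_F} \cdot \frac{\Delta(\lambda_R - \lambda_C) + \delta_C C_{ss}}{\delta_R} \tag{10}$$
which if $\lambda_R = \lambda_C$ and $\delta_R = \delta_C$ (i.e. both fusion proteins are expressed and degraded at equal rates), simplifies to:
$$F_{ss} = \frac{K_{ON}}{K_{OFF} + \delta_F} C_{ss}^2 \approx \gamma_m C_{ss}^2 \tag{11}$$
giving a straight-forward estimate of $\gamma_m$ for our simulated system. In the case where $K_{OFF} \gg \delta_F$ we have $\gamma_m \approx 1/K_d$, where $K_d = K_{OFF}/K_{ON}$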