Letter
Rapid and Programmable Protein Mutagenesis Using Plasmid Recombineering
Sean Andrew Higgins, Sorel Ouonkap, and David F Savage
ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.7b00112 • Publication Date (Web): 14 Jul 2017
ACS Synthetic Biology
Rapid and Programmable Protein Mutagenesis Using Plasmid Recombineering
Sean A. Higgins1, Sorel Ouonkap1, David F. Savage1,2,*
1 Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, 94720, USA
2 Department of Chemistry, UC Berkeley, Berkeley, CA, 94720, USA
* To whom correspondence should be addressed. Email: [email protected].
Address: 2151 Berkeley Way, Berkeley, CA 94720; (510) 643-7847
ABSTRACT
Comprehensive and programmable protein mutagenesis is critical for understanding structure-function relationships and improving protein function. However, current techniques enabling comprehensive protein mutagenesis are based on PCR and require in vitro reactions involving specialized protocols and reagents. This has complicated efforts to rapidly and reliably produce desired comprehensive protein libraries. Here we demonstrate that plasmid recombineering is a simple and robust in vivo method for the generation of protein mutants for both comprehensive library generation and programmable targeting of sequence space. Using the fluorescent protein iLOV as a model target, we build a complete mutagenesis library and find it to be specific and comprehensive, detecting 99.8% of our intended mutations. We then develop a thermostability screen and utilize our comprehensive mutation data to rapidly construct a targeted and multiplexed library that identifies significantly improved variants, thus demonstrating rapid protein engineering in a simple protocol.
KEYWORDS recombineering, protein mutagenesis, directed evolution, iLOV, fluorescence thermostability
INTRODUCTION
Directed mutagenesis of a desired protein is an important technique both for understanding structure-function relationships and for improving protein function for research, biotechnology, and medical applications. For example, techniques like deep mutational scanning, where every position in a protein is mutated to all possible amino acids, can be applied to understand key variants associated with disease [1], while targeted mutagenesis of proteins such as Green Fluorescent Protein (GFP) has expanded our capacity to visualize many biological processes [2]. The ability to generate comprehensive mutation libraries and programmed libraries focused on specific locations or amino acids is crucial to these applications.
To address these needs, many in vitro approaches have been developed. Firnberg and Ostermeier have built libraries composed almost entirely of single mutations using specialized protocols based on uracil-containing template DNA [3], while Melnikov and Mikkelsen constructed a comprehensive library by splitting one gene into many different regions small enough to be synthesized on a programmable microarray, followed by multiplexed in vitro recombination [4]. Belsare and Lewis have demonstrated targeted, combinatorial library construction using alternating cycles of fragment and joining PCR [5]. Recently, Wrenbeck and Whitehead introduced nicking mutagenesis, using specialized nucleases to selectively degrade the wild type (WT) template DNA [6].
An alternative approach would be to incorporate synthetic oligonucleotides in vivo directly into a gene of interest in a programmable fashion. In E. coli, oligonucleotides introduced into the cell via electroporation can recombine with the genome or resident plasmids with the help of the lambda phage protein Beta, in a process termed recombineering [7]. Mechanistically, it is thought that Beta-bound oligonucleotides anneal to the replication fork of replicating deoxyribonucleic acid (DNA) and are subsequently incorporated into the daughter strand, thus directly encoding mutations into a new DNA molecule [8]. Recombineering is therefore a compelling method for genetic manipulation. Cheap and easily obtained standard oligonucleotides are the only varying input, and the protocol (mixing oligonucleotides in pooled reactions) is straightforward.
This approach was shown to be capable of mutating the E. coli genome for rapid metabolic engineering in a method termed Multiplexed Automated Genome Engineering (MAGE), which used multiple rounds of recombineering to increase the penetrance of mutations [9]. Other work has demonstrated that thousands of pooled, barcoded oligonucleotides can be used, in parallel, to modify the expression of > 95% of E. coli genes and map their effect on fitness [10]. More recent studies have combined recombineering with the programmable DNA nuclease Cas9, as a means of enforcing mutational penetrance, to mutate tens of thousands of loci in parallel with high efficiency [11].
Despite its success in genome engineering, recombineering of plasmids is relatively uncharacterized [12]. Plasmid recombineering (PR) is of particular interest in protein mutagenesis as plasmids are easily shuttled between different strains and organisms for cloning and screening. Notably, recombineering strains achieve enhanced mutation efficiency by knocking out mismatch repair and consequently possess a higher genome-wide mutation rate [13], which can complicate screens or selections sensitive to suppressor mutations. The use of plasmids, however, uncouples protein variation from any background mutation in the genome. Importantly, Nyerges et al. found that recombineering itself produces few if any off-target mutations [14]. Thomason et al. have previously demonstrated that PR is capable of generating mutations, insertions, and deletions with efficiencies comparable to genomic recombineering. We reasoned that the principles of MAGE (multiplexed reactions and multiple mutation rounds) would be applicable to PR as well.
To benchmark comprehensive PR for protein engineering we sought to measure the efficiency, bias, and overall performance of saturation mutagenesis on the small protein iLOV. iLOV is a 110 residue protein derived from the Light, Oxygen, Voltage (LOV) domains of the A. thaliana phototropin 2 protein [15]. The native LOV domain binds flavin mononucleotide (FMN) and uses this co-factor as a photosensor to direct downstream signal transduction. Mutational analysis has revealed that a cysteine to alanine substitution in the FMN binding site interrupts the native photocycle and instead dramatically increases the protein’s fluorescence. iLOV is an ideal candidate for further engineering because fluorescent proteins that do not require molecular oxygen for chromophore maturation are a desirable alternative to GFP. In previous experiments, DNA shuffling was used to isolate iLOV, a variant that has six amino acid mutations relative to the wild-type phototropin 2 LOV2 sequence and an improved fluorescence quantum yield of 0.44 [15]. Additional approaches to engineer further improved iLOV variants have also relied on error-prone PCR and DNA shuffling, missing much of the possible sequence space [16]. Due to the potential utility of iLOV and its comparatively limited engineering relative to other fluorescent proteins, we hypothesized iLOV could serve as an excellent model system for exploring the utility of PR. Finally, the gene length of iLOV is exceptionally short (330 bp) and analysis of iLOV libraries is suited to deep sequencing. Current paired-end sequencing covers the entirety of the open reading frame and can accurately identify all mutations to a single sequenced plasmid. This provides insight into the mechanisms and utility of recombineering.
Here we demonstrate that PR is capable of constructing both comprehensive protein libraries and targeted mutagenesis libraries focusing on a small section of sequence space. We built a complete mutagenesis library of iLOV and found it to be specific and unbiased, detecting 99.8% of our intended mutations. We explored this fitness landscape in the context of thermostability using a plate-based screen that allowed us to identify many desirable thermostabilizing mutations. To demonstrate the iterative and programmable nature of our platform, we designed and built a multiplexed library focused on these mutational hotspots and isolated significantly more stable variants. In total, this work demonstrates that plasmid recombineering is a rapid and robust method for the generation of protein mutants for both unbiased, comprehensive libraries and programmable targeting of specific regions in sequence space.
RESULTS AND DISCUSSION
A single round of PR generates a comprehensive mutation library
To generate a comprehensive mutation library of iLOV, oligonucleotides were designed to target the gene within a high copy ColE1 plasmid (Supp. Figure 1). The target plasmid contained a promoterless iLOV coding region to prevent growth biases between mutants possessing different fitness during multiple rounds of transformation and outgrowth. Note that this step does introduce some additional complexity, such as additional time between library generation and screening and the potential to lose library diversity. This step is not inherent to the recombineering method, however, but was chosen in order to most accurately investigate the naive iLOV library. Target plasmid (10 ng) was mixed with an equimolar mixture of 110 recombineering oligonucleotides, one for each codon in iLOV (Figure 1A). These oligonucleotides were 60 bp long and complementary to the lagging strand, which was previously demonstrated to be more efficient than targeting the leading strand [17]. Oligos contained a centrally located NNM mutation codon (where N = A/C/G/T and M = A/C), which encodes all amino acids except methionine and tryptophan. Tryptophan, in particular, is known to quench flavin fluorescence in flavoproteins [18] and was excluded from the library. Although modified oligonucleotides (e.g. phosphorothioated) have been shown to enhance recombineering efficiency, standard oligonucleotides were used to minimize cost and complexity [9].
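The claim that an NNM codon reaches every amino acid except methionine and tryptophan can be checked by enumeration. The following is a minimal sketch using the standard genetic code; the function and variable names are illustrative and are not taken from the authors' analysis code. It also shows that NNM still admits two stop codons (TAA and TGA). With 18 reachable amino acids, each position has up to 17 non-WT substitutions, which appears consistent with the 110 × 17 = 1870 single residue mutations counted in this work.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnm_codons():
    """Return all 32 codons matching NNM (N = A/C/G/T, M = A/C)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "AC")]


# Residues (including "*" for stop) reachable with an NNM codon.
residues = {CODON_TABLE[c] for c in nnm_codons()}
# Residues of the standard code that NNM cannot encode.
missing = set(AMINO_ACIDS) - residues

print(sorted(missing))            # ['M', 'W']
print(len(residues - {"*"}))      # 18 amino acids remain accessible
```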
Initial experiments confirmed that PR can be used to generate diverse libraries in a programmable fashion. The plasmid and oligonucleotide mixture was first electroporated into the recombineering strain EcNR2 [9], grown overnight and miniprepped. As observed in Thomason et al. [12], we found that a fraction of the plasmids had converted into multimers two or three times the length of the target plasmid. Because we intended to perform multiple rounds of PR, we chose to gel extract the monomeric form of the plasmid. Notably, this step is unnecessary if the target gene is subsequently isolated by PCR or subcloning, especially since multimeric forms are often successfully modified plasmid molecules [12]. Deep sequencing of the recovered library, hereafter termed the Round 1 library, revealed substantial mutagenesis compared to the non-recombineered plasmid (Supp. Figure 2). Further analysis revealed that, while the majority of reads were WT iLOV sequence, 29% of reads contained a single amino acid mutation (Supp. Table 1). Note that the true recombineering efficiency for introducing a single nonsynonymous mutation is likely ~23% after accounting for sequencing errors relative to the negative control. These mutations covered every position in the protein and nearly all targeted amino acid conversions were observed (Figure 1B). Of the 1870 possible single residue mutations, 1867 were detected in the Round 1 library (Supp. Table 1). Sequencing errors do not significantly contribute to the coverage of single nonsynonymous mutations (Supp. Figure 3).
Further rounds of PR were used to increase the penetrance of mutations. Although the Round 1 library covered targeted mutations comprehensively, it was roughly 61% WT iLOV sequence. Furthermore, the programmable nature of PR allows the construction of more targeted libraries with combinatorial multiplexing of high fitness mutations, such as would be useful in directed evolution (Figure 1C). Four additional rounds of recombineering were performed to reduce the WT fraction of the library and investigate the distribution of variants with 2+ mutations. The number of reads with codon mutations increased substantially with further rounds of recombineering (Figure 1D). In the Round 5 library, single mutations were the most common (33%), with the remainder composed of WT sequences (26%) and sequences containing 2+ mutations (41%).
Recombineering libraries can be finely controlled to alter the composition of mutations
In the absence of prior information, the ideal mutagenesis technique would produce every targeted mutant with equal frequency. Because each variant is initially present in the same amount, a uniform library requires the least amount of screening (or selection) in order to isolate improved mutants. A non-uniform method, in contrast, might produce a highly variable distribution of mutants and require more screening or selection in order to fully explore sequence space. We therefore analyzed our sequencing data to characterize the uniformity and sequence preferences of PR libraries.
We detected two distinct types of bias: positional bias and mismatch bias. In positional bias, oligonucleotides targeted to different positions in the coding sequence incorporate with characteristically different efficiencies (regardless of the mutation produced at that position). In mismatch bias, oligonucleotides that are more similar to the WT sequence (e.g. differing only at the first base of the targeted codon) incorporate with high efficiency while more divergent oligos (e.g. different at all three positions) incorporate with lower efficiency. Positional bias is evident when comparing the frequencies of single codon mutations across all 110 codons in iLOV (Figure 2A). Despite the nearly 60 bp of homology between the oligonucleotide and the template plasmid, oligos targeting adjacent codons can exhibit a 2-3 fold difference in incorporation. The largest such discrepancy is found at codon 42, which is mutated 5.7 times more frequently than codon 41. Variation in efficiency has been observed in other recombineering studies and was somewhat correlated with the oligonucleotide binding energy [9]. While our data were not clearly correlated with binding energy (Supp. Figure 4), a biological replicate revealed replicable positional bias (Supp. Figure 5), suggesting the presence of an underlying physical mechanism for positional bias.
Comprehensive mutagenesis applications, such as deep mutational scanning, ideally begin with uniformly distributed mutants in a naive library. We hypothesized that the positional bias observed in the Round 1 library (constructed using equimolar mutagenic oligonucleotides) could be corrected by altering the mixture of oligonucleotides used in the electroporation step. A second library was therefore constructed by normalizing the concentration of each oligonucleotide in the library according to the frequency of mutations obtained at the corresponding position in Round 1. This library was constructed using PR, and the first 40 amino acids were deep sequenced. The normalized library was indeed more uniform, with a largest adjacent codon discrepancy of 2.1 fold, compared to 3.3 fold in the equimolar replicate library (Figure 2B).
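One simple way to turn Round 1 frequencies into a new oligonucleotide mixture is to weight each oligonucleotide inversely to the mutation frequency observed at its position. The sketch below assumes this inverse-proportional rule, which the text does not specify; the function name and the example frequencies are hypothetical and chosen only to mirror the 5.7-fold difference reported between codons 41 and 42.

```python
def normalized_oligo_fractions(round1_freqs):
    """Return the fraction of the oligo pool to assign to each position.

    round1_freqs maps codon position -> observed Round 1 mutation frequency.
    Assumption: each oligo is weighted by 1 / frequency, then the weights
    are rescaled to sum to 1. Other weighting schemes are possible.
    """
    if any(freq <= 0 for freq in round1_freqs.values()):
        raise ValueError("every position needs a positive observed frequency")
    inverse = {pos: 1.0 / freq for pos, freq in round1_freqs.items()}
    total = sum(inverse.values())
    return {pos: weight / total for pos, weight in inverse.items()}


# Hypothetical frequencies for three neighboring codons (illustration only).
round1 = {41: 0.0010, 42: 0.0057, 43: 0.0020}
fractions = normalized_oligo_fractions(round1)
for pos in sorted(fractions):
    print(pos, round(fractions[pos], 3))
```

Under this rule the poorly incorporating codon 41 oligonucleotide receives 5.7 times the share of the pool given to the codon 42 oligonucleotide. Whether incorporation scales linearly with concentration is itself an assumption that would need to be checked by sequencing the resulting library.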
To understand the importance of this normalization quantitatively, we performed a simulation to evaluate the practical impact of oligonucleotide normalization on effective library size: the number of samples that must be taken from a library to achieve a desired representation of the library diversity. In the ideal case, i.e. when mutants are found in the library with equal frequency, if one wishes to sample, say, 95% of the diversity contained within the library, one must screen roughly three times the number of distinct members. That is, the effective library size is threefold the targeted library size. Our equimolar and normalized libraries are not uniformly distributed, but random sampling from our sequenced data in silico allows calculation of the effective library size. Our simulation revealed a substantial improvement in sampling efficiency, from a mean effective library size of 139 in the original library to 107 in the normalized library, which is quite close to that of an ideal uniformly distributed library (Supp. Figure 6).
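The threefold figure for a uniform library follows from a standard sampling argument: after n draws from N equally likely variants, the expected fraction observed is about 1 - exp(-n/N), which reaches 95% at n = N ln 20, or roughly 3N. For non-uniform libraries a Monte Carlo estimate is straightforward. The sketch below is not the authors' simulation. It assumes that "effective library size" means the mean number of draws, with replacement, needed to observe a given fraction of distinct variants, and the weights shown are hypothetical.

```python
import math
import random
from itertools import accumulate


def effective_library_size(weights, coverage=0.95, trials=1000, seed=0):
    """Mean number of random draws needed to observe `coverage` of all variants.

    weights: relative frequency of each variant in the library.
    Draws are made with replacement, proportionally to the weights.
    """
    n_variants = len(weights)
    # Small epsilon guards against floating-point round-up in the product.
    target = math.ceil(coverage * n_variants - 1e-9)
    if sum(w > 0 for w in weights) < target:
        raise ValueError("too few variants with non-zero frequency")
    rng = random.Random(seed)
    cum_weights = list(accumulate(weights))
    population = range(n_variants)
    total_draws = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        while len(seen) < target:
            seen.add(rng.choices(population, cum_weights=cum_weights, k=1)[0])
            draws += 1
        total_draws += draws
    return total_draws / trials


# Hypothetical 40-member libraries: one uniform, one with half the members rare.
uniform = effective_library_size([1.0] * 40)
skewed = effective_library_size([1.0] * 20 + [0.2] * 20)
print(uniform, skewed)
```

In this toy comparison the skewed library needs considerably more draws than the uniform one to reach the same coverage, which is the qualitative effect that oligonucleotide normalization is meant to reduce.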
The presence of mismatch bias in the library demonstrates that recombineering favors incorporation of oligos that are more similar to the WT template sequence. Our oligonucleotides all contained one, two, or three nucleotide mismatches relative to WT iLOV. After computationally enumerating all possible oligonucleotides for each codon in iLOV, we expected that 14% of oligonucleotides would contain one mismatch, 43% two, and 43% three. In contrast, oligonucleotide-template pairs with two or three mismatches were observed at only 31% and 22% respectively, while single nucleotide mismatches were overrepresented by more than threefold at 47% (Figure 2C). This value is inflated due to the presence of sequencing errors, of which the vast majority are single mismatches, but is still significantly higher than expected given the error rate in the negative control (Supp. Table 1). Together, mismatch bias and positional bias are substantial sources of variation in the library, accounting for roughly 60% of model variance (Supp. Figure 7).
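The expected mismatch distribution can be obtained by comparing every NNM codon with the WT codon it replaces. The sketch below shows one way to do this enumeration. The function names are illustrative, the two example codons are arbitrary rather than taken from iLOV, and dropping the zero-mismatch oligonucleotide is an assumption about how the expected percentages were computed.

```python
from collections import Counter
from itertools import product


def mismatch_spectrum(wt_codon):
    """Count NNM codons (N = A/C/G/T, M = A/C) by number of mismatches to wt_codon."""
    wt = wt_codon.upper()
    counts = Counter()
    for codon in product("ACGT", "ACGT", "AC"):
        n_mismatches = sum(a != b for a, b in zip(wt, codon))
        counts[n_mismatches] += 1
    return counts


def pooled_mismatch_fractions(wt_codons):
    """Pool spectra over several WT codons and return fractions for 1, 2, 3 mismatches.

    Assumption: oligos identical to the WT codon (zero mismatches) are excluded.
    """
    pooled = Counter()
    for codon in wt_codons:
        pooled.update(mismatch_spectrum(codon))
    pooled.pop(0, None)
    total = sum(pooled.values())
    return {n: pooled[n] / total for n in (1, 2, 3)}


# A WT codon ending in A or C can be matched exactly; one ending in G or T cannot.
print(dict(mismatch_spectrum("GCA")))   # {0: 1, 1: 7, 2: 15, 3: 9}
print(dict(mismatch_spectrum("ATG")))   # {1: 2, 2: 12, 3: 18}
print(pooled_mismatch_fractions(["GCA", "ATG"]))  # 9/63, 27/63, 27/63
```

The spectrum depends on whether the third base of the WT codon is A/C or G/T, so the expected percentages for a real gene depend on its codon usage.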
While a single round of PR generates many single mutants, additional rounds shift the distribution toward higher mutation numbers. The Round 5 library, for example, is 25% double mutants. These mutations are well represented, with 5817 out of a possible 5940 locations detected (Figure 2D). However, it is clear that the same positional bias seen in the Round 1 library is preserved in subsequent rounds. This is to be expected if recombineering events are statistically independent, and can be seen by the correlation of double-mutation hotspots with single mutation positional bias (Supp. Figure 8). These data suggest that double mutations in a normalized library will be far more uniformly distributed.
The double mutation data also indicate that some recombineering events are not perfectly independent. A plot of the pairwise distance between all detected double mutants reveals an uneven distribution in frequency (Figure 2E). Specifically, double mutations are less likely to be within 30 bp (i.e. 10 amino acids) of one another. This effect becomes more pronounced with additional rounds of recombineering (Supp. Figure 9). In the Round 5 library, this ‘zone of exclusion’ is significant enough that double mutations a few amino acids apart are nearly four times less frequent than double mutations with a much larger separation (e.g. three amino acid gap versus 10 amino acid gap). We hypothesize that this bias is due to the mechanism of recombineering, which requires oligonucleotides to anneal to a complementary locus via homology arms flanking the NNM mutagenic codon. In this mechanism, sequential oligonucleotides would ‘overwrite’ previous mutations due to incorporation of the most recent oligonucleotide’s homology arms, which extend 30 bp on either side of the central NNM. Another potential mechanism that disfavors incorporation of nearby double mutants is mutational reduction of oligo incorporation efficiency. In this hypothetical mechanism, mutations generated in early rounds of PR could reduce the homology and, therefore, incorporation efficiency of oligos in subsequent PR rounds.
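A pairwise-distance analysis of this kind can be sketched as a histogram of codon separations, scaled by the number of codon pairs available at each separation (a gene of n codons has n - d pairs that are d codons apart). The code below illustrates the idea and is not the authors' pipeline; the input format, function names, and example pairs are hypothetical.

```python
from collections import Counter


def pairwise_distance_histogram(double_mutants):
    """Count double mutants by separation (in codons) between their two mutated positions.

    double_mutants: iterable of (position_a, position_b) codon indices, one per read.
    """
    return Counter(abs(a - b) for a, b in double_mutants)


def pairs_at_distance(n_codons, distance):
    """Number of distinct codon pairs separated by `distance` in a gene of n_codons."""
    return max(n_codons - distance, 0)


def normalized_frequency(double_mutants, n_codons):
    """Observed double mutants per available codon pair, for each separation seen."""
    histogram = pairwise_distance_histogram(double_mutants)
    return {d: count / pairs_at_distance(n_codons, d) for d, count in histogram.items()}


# Hypothetical reads, each carrying two mutated codons (illustration only).
reads = [(1, 4), (10, 7), (5, 50)]
print(pairwise_distance_histogram(reads))      # Counter({3: 2, 45: 1})
print(normalized_frequency(reads, n_codons=110))
```

Scaling by the number of available pairs keeps long separations, which have fewer possible pairs, from appearing artificially depleted.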
Screening the iLOV library identifies mutations conferring thermostability
Previous screening and structural work indicates that a well-packed binding site for the FMN fluorophore may lead to improved photochemical properties of iLOV, such as photostability, by limiting the dynamics of the FMN chromophore and its ability to dissipate energy following excitation [16]. Additionally, searches for improved LOV-based fluorescent reporters have turned up homologous variants such as CreiLOV that, while brighter (50% greater quantum yield), exhibit substantial toxicity upon expression [19]. More generally, Tawfik and colleagues have theorized that thermostable proteins serve as more fruitful starting points for engineering and directed evolution [20]. In this view, variants with greater thermostability can better tolerate mutations, increasing the likelihood of observing mutations that improve protein function in a manner unrelated to thermostability. We thus investigated the thermostability of our library, which contained nearly every single amino acid mutation and a small fraction of possible double mutations.
We created a plate-based assay to screen for thermostabilized variants in the Round 5 library. The library was plated at high colony density onto standard LB-Agar, grown overnight, then incubated at 60 °C for two hours (Figure 3A). This treatment completely abrogates the fluorescence of WT iLOV as well as nearly all mutants (Figure 3B). Some colonies remained fluorescent, however, and these were recovered and expressed in 96-well plate format. Cultures expressing these library members were lysed, clarified, and analyzed for thermostability. Fluorescence measurements were taken every 0.5 °C during a temperature ramp from 25 °C to 95 °C and used to calculate the melting temperature (Tm) for each clone, defined as the temperature at which 50% of maximal fluorescence is retained (Figure 3C). All 93 assayed clones demonstrated a substantial increase in thermostability relative to iLOV (Figure 3D). Improvements in Tm ranged from 2 to nearly 10 °C.
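Given the definition of Tm as the temperature at which 50% of maximal fluorescence is retained, a Tm can be estimated from a melt curve by locating the first crossing of the half-maximum and interpolating linearly between the two flanking measurements. The sketch below is one possible implementation, not the authors' analysis. The function name is hypothetical, and the melt curve is a synthetic sigmoid with a midpoint of 60 °C used purely for illustration.

```python
import math


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the temperature where signal first drops to 50% of its maximum.

    temps and fluorescence are equal-length sequences ordered by increasing temperature.
    Returns None if the signal never falls below half of its maximum.
    """
    f_max = max(fluorescence)
    half = 0.5 * f_max
    start = fluorescence.index(f_max)  # only look after the maximum
    for i in range(start, len(temps) - 1):
        f0, f1 = fluorescence[i], fluorescence[i + 1]
        if f0 >= half > f1:
            # Linear interpolation between the two points flanking the crossing.
            fraction = (f0 - half) / (f0 - f1)
            return temps[i] + fraction * (temps[i + 1] - temps[i])
    return None


# Synthetic melt curve sampled every 0.5 degrees from 25 to 95 (illustration only).
temps = [25 + 0.5 * i for i in range(141)]
signal = [1.0 / (1.0 + math.exp((t - 60.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, signal)
print(round(tm, 2))
```

Real melt curves are noisier than this example, so smoothing or fitting a sigmoid before locating the midpoint may be preferable in practice.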
Recombineering-based multiplexing allows rapid and robust directed evolution
While comprehensive single mutant libraries are ideal for exploring structure-function relationships in an unbiased manner, hypothesis-driven investigation requires targeted libraries for exploring the effect of mutations at specific sites of interest. Because PR targets only sites programmed by synthetic oligonucleotides, we hypothesized that PR could generate a specific library in a cost-effective and straightforward protocol.
To demonstrate the rapid multiplexing capability of PR, we designed a second library containing the 25 most frequent thermostabilizing mutations from the initial plate screen (Figure 4A). Several of these mutations consisted of alternative amino acids at the same position, and in these cases a different oligonucleotide was designed for each. The encoding oligonucleotides were designed such that homology arms would stop short of neighboring mutations so as not to overwrite them (Supp. Figure 10).
This multiplexed library resulted in striking improvements to thermostability, with the best variants having Tm values nearly 20 °C greater than iLOV (Figure 4B). Isolating and re-cloning these variants verified that despite the significant increase in Tm, the shape of their melt curves was not significantly different from that of iLOV, even for the most thermostable mutant (Figure 4C). It was noted during the screening process that one variant in particular, hereafter referred to as thermostable LOV (tLOV), seemed to produce abnormally bright lysate under high expression conditions. tLOV and iLOV were expressed and purified in parallel and their absorbance and emission were characterized in vitro. To accurately quantify relative quantum yield, the emission curves were normalized for absorption at 450 nm and integrated. tLOV was found to be approximately 10% brighter (Supp. Figure 11), and sequencing revealed the presence of four mutations scattered throughout the protein, all introduced by PR (Figure 4D). Notably, none of these locations are directly within the FMN binding pocket, and their contributions to thermostability or quantum yield improvement are not obvious. Thus, it would have been difficult to rationally design tLOV. Moreover, a targeted mutagenesis technique is required for isolating quadruple mutants reliably: iLOV is a small protein, but a quadruple mutant library would contain > 10^13 variants, well beyond our current screening capacities. Finally, thermostability was verified by differential scanning calorimetry of iLOV vs tLOV (Supp. Figure 12).
An ideal method for generating comprehensive protein libraries would be simple and robust, enabling both complete and targeted mutagenesis without a change in reagents. In this study, we demonstrate that PR can be effectively used for both comprehensive and programmable mutagenesis of the fluorescent protein iLOV, using methods that are generalizable to any gene of interest.

Previous recombineering studies have successfully built libraries in the
genome, demonstrating specificity and fine control of mutational composition.
Wang et al. (9) used multiple rounds of recombineering to mutagenize six consecutive
nucleotides using 90 bp oligonucleotides and found a 75% mutation rate after five
rounds. We find quite similar behavior in the sequencing analysis of the iLOV
recombineered plasmid libraries constructed here. Our Round 5 library consisted of
74% mutant variants, and the mutations were well distributed in sequence space.
The modest positional bias could be substantially ameliorated by altering the ratios
of oligonucleotides added to the electroporation mixture. This resulted in a nearly
uniform distribution of mutations such that the effective library size was almost
ideal (Supp. Figure 6). This approach could easily be used to accommodate more
complex libraries with weighted mutation frequencies at various locations. The
correlation of replicate libraries (Supp. Figure 5) strongly suggests an underlying
physical mechanism for differing oligonucleotide recombineering efficiencies. It will
be important to understand this effect in order to predict efficiencies rather than
rely on empirical data as used here, which required additional sequencing and labor.
Regardless of these modifications, the reagent cost and experimental effort remain
low: standard 60 bp oligonucleotides and simple cycles of electroporation, growth,
and plasmid isolation.
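
One simple way to derive new oligonucleotide ratios from first-round sequencing data
is sketched below. This is a hypothetical scheme, not necessarily the procedure used
here: it assumes that the mutation frequency at a position scales linearly with the
share of its oligonucleotide in the mixture, and therefore sets each share inversely
proportional to the observed frequency. The function name and the example
frequencies are invented for illustration.

```python
def rebalance_oligo_fractions(observed_freq: dict[str, float]) -> dict[str, float]:
    """Return mixing fractions inversely proportional to observed mutation frequency."""
    if any(f <= 0 for f in observed_freq.values()):
        raise ValueError("every oligonucleotide needs a positive observed frequency")
    inverse = {name: 1.0 / f for name, f in observed_freq.items()}
    total = sum(inverse.values())
    # Normalize so the fractions sum to 1 and the total oligo amount is unchanged.
    return {name: weight / total for name, weight in inverse.items()}


# Made-up Round 1 frequencies for three oligonucleotides.
observed = {"oligo_A": 0.02, "oligo_B": 0.01, "oligo_C": 0.005}
fractions = rebalance_oligo_fractions(observed)
for name, share in fractions.items():
    print(f"{name}: {share:.3f}")
```

Under the linearity assumption, the least efficient oligonucleotide receives the
largest share of the mixture. If efficiency does not respond linearly to
concentration, a second round of sequencing and adjustment would be needed.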

Additional biases resulting from the mechanism of recombineering were
detected, and while their magnitude was smaller, their effect on library size and
screening can be significant. Previous work has found that recombineering
efficiency drops sharply with the size of the modification made (9). Here, too, single
nucleotide mutations were observed to be more common than expected. This effect
alters the distribution of codons present in the final library, with the template
codons determining this frequency shift. Notably, mismatch bias is intrinsic to the
annealing of oligonucleotides and is likely present for in vitro methods as well.
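
A baseline for the mismatch bias can be computed by enumeration. The sketch below
assumes fully degenerate NNN codons with every alternative codon equally likely,
which this section does not state explicitly, and it counts all 63 alternative
codons. A fuller treatment would restrict the count to codons that change the amino
acid, so the fractions here are only a first approximation. The template codon is an
arbitrary example.

```python
from collections import Counter
from itertools import product


def mismatch_distribution(template_codon: str) -> dict[int, float]:
    """Fraction of the 63 alternative codons at Hamming distance 1, 2, or 3."""
    counts = Counter()
    for bases in product("ACGT", repeat=3):
        codon = "".join(bases)
        distance = sum(a != b for a, b in zip(codon, template_codon))
        if distance > 0:  # skip the template codon itself
            counts[distance] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in (1, 2, 3)}


expected = mismatch_distribution("GAA")
for n_mismatches, fraction in expected.items():
    print(f"{n_mismatches} mismatch(es): {fraction:.3f}")
```

Only 9 of the 63 alternatives differ from the template at a single nucleotide, so an
observed excess of single-nucleotide changes indicates a departure from uniform
incorporation.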

The ‘zone of exclusion’ around a first mutation generates another mode of
bias. A second mutation is less likely to appear inside this zone than outside of it.
This effect became more pronounced with increasing rounds of recombineering. It is
likely that this results from oligonucleotides’ homology arms overwriting earlier
mutations during later rounds of PR. In other words, we hypothesize that
homology arms are capable of introducing revertant mutations. We thus predict that
the length of homology arms would impact the length of the ‘zone of exclusion.’ As
some amount of homology is absolutely required for recombineering, no form of
recombineering is suitable for efficiently generating adjacent mutations from
different oligonucleotides. This limitation can likely be overcome by using single
oligonucleotides encoding sequential mutations, but at the cost of reduced efficiency
due to mismatch bias. Wang et al. (9) observed similar mutation rates for between one
and four mismatches, but found that efficiency dropped by roughly half when incorporating
mismatches of five or ten nucleotides. Again, if this effect fundamentally stems from
the annealing of oligonucleotides, then it is likely present for in vitro methods as
well.
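
Detecting a zone of exclusion requires correcting for the fact that short pairwise
distances are available to more position pairs than long ones. A minimal sketch of
that normalization follows, assuming a gene of n positions in which exactly n - d
position pairs are separated by distance d. The function name and the example pairs
are hypothetical, and the actual analysis may differ in detail.

```python
from collections import Counter


def normalized_distance_profile(double_mutants, n_positions):
    """Double-mutant counts per pairwise distance, divided by the number of possible pairs."""
    counts = Counter(abs(a - b) for a, b in double_mutants if a != b)
    # A sequence of n positions contains (n - d) position pairs separated by distance d.
    return {d: counts.get(d, 0) / (n_positions - d) for d in range(1, n_positions)}


# Made-up double mutants, given as (codon position 1, codon position 2).
pairs = [(5, 6), (10, 40), (12, 42), (3, 100)]
profile = normalized_distance_profile(pairs, n_positions=110)
print(profile[1], profile[30], profile[97])
```

A depletion of normalized counts at small distances relative to large ones would then
correspond to the zone of exclusion described above.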

In vitro methods represent the most powerful alternatives for generating
comprehensive mutation libraries. The chief advantage of these approaches is that
they can generate a library composed mostly of mutated variants. Ostermeier and
colleagues accomplished this by utilizing specialized protocols based on
uracil-containing template DNA (3), while Wrenbeck and Whitehead selectively degraded the
WT strand using single-strand nicking endonucleases (6). Such methods have been
used to generate nearly comprehensive libraries of genes for exploring the entirety
of the fitness landscape (21), and to comprehensively examine the possible
evolutionary pathways leading from one allele to another (22).

In directed protein evolution, iterative rounds of mutagenesis can be used to
multiplex fitness-improving mutations. PCR-based protocols (3), direct gene
synthesis (4), and some other recombineering techniques (11) excel at generating
libraries composed of single mutations. However, in many applications, two or more
mutations are desired at non-contiguous locations. Recent work has developed a
PCR-based method to accomplish this goal in vitro (5), and we hypothesized that PR
was well suited to serve as a complementary approach in vivo, doing away with
cloning altogether. To this end, we comprehensively explored the iLOV single
mutation sequence-space for thermostability, selected the fitness-enhancing
mutations, and demonstrated the utility of PR for advanced protein engineering by
multiplexing many different single and double mutations at discontinuous sites
across iLOV in a second library. The ability to select and easily mutate numerous
specific and non-contiguous locations across a protein is highly useful for a variety
of techniques that utilize experimental or phylogenetic data to computationally
predict and enhance enzymes (23), explore epistatic interactions (24), or even scan SNPs
in human proteins for disease prediction (1).

iLOV engineering has been relatively limited in comparison to other
fluorescent proteins (15). Because the domain has been taken out of its natural
structural context, we hypothesized that its thermal stability could be increased.
Consistent with this idea, many mutations were found to improve the thermal
stability of iLOV, up to a robust 10 °C increase in Tm. These improvements were then
stacked by multiplexing the 25 most common mutations from the first screen. As a
measure of convenience, the same pooled, single electroporation protocol was used,
although iterative rounds of transformation and outgrowth set a lower limit on
cycle time, typically ~16 hours per cycle. One particular variant among the
thermostabilized pool, tLOV, was found to be ~10% brighter than iLOV in vitro. This
result is consistent with previous work demonstrating that constraining the FMN
fluorophore can improve the photochemical properties of iLOV (16). It would be
interesting to perform comprehensive mutagenesis of the thermostabilized iLOV
mutants in search of further improvements to the protein’s brightness or red/blue
spectral shifting, as increased thermostability has been hypothesized to permit
greater exploration of function-altering mutations (20).

In summary, we have demonstrated that PR retains many of the ideal
properties of genome recombineering, including specificity and programmability.
We found that PR was suited for the construction of both comprehensive and
targeted libraries, and that the simplicity of the protocol led to rapid and reliable
screening experiments. In particular, PR is suitable for cycles of iterative design,
construction, and sampling of genetic libraries, requiring no specialized reagents or
protocols. We developed a thermostability screen of the fluorescent protein iLOV
and used the resulting mutation data to rapidly construct a multiplexed library that
identified significantly improved variants, including the first enhancement to the
protein’s brightness since its development.

MATERIALS AND METHODS

Strains and Media
Strain EcNR2 (Addgene ID: 26931) (9) was used for generating PR libraries in
plasmid pSAH031 (Addgene ID: 90330). For thermostability screening and protein
expression, iLOV libraries were cloned into pTKEI-Dest (Addgene ID: 79784) (25) using
Golden Gate cloning (26) with restriction enzyme BsmBI (NEB) and transformed into
either Tuner (Novagen) or XJb Autolysis E. coli (Zymo Research). Unless otherwise
stated, strains were grown in standard LB (Teknova) supplemented with kanamycin
(Fisher) at 60 µg/mL.

Recombineering Library Construction
Libraries were constructed using a modified protocol from Wang and Church, 2011 (27).
Briefly, 110 oligonucleotides (Supp. Table 2) or 25 thermostabilizing
oligonucleotides (Supp. Table 3) were mixed and diluted in water. A final volume of
50 µL of 2 µM oligonucleotides, plus 10 ng of pSAH031, was electroporated into 1
mL of induced and washed EcNR2 using a 1 mm electroporation cuvette (BioRad
GenePulser). A Harvard Apparatus ECM 630 Electroporation System was used with
settings 1800 V, 200 Ω, 25 µF. Three replicate electroporations were performed,
then individually allowed to recover at 30 °C for 2 hr in 1 mL of SOC (Teknova)
without antibiotic. LB and kanamycin were then added to 6 mL final volume and
grown overnight. Cultures were miniprepped (QIAprep Spin Miniprep Kit) and
monomer plasmids were isolated by agarose gel electrophoresis and gel extraction
(QIAquick Gel Extraction Kit) to remove multimer plasmids (12). The three replicates
were then combined, completing a round of PR.

Library Sequencing and Analysis
The iLOV open reading frame was amplified from PR libraries by PCR to add
indices and priming sequences for deep sequencing (Supp. Table 4). PCR products
were sequenced on Illumina platform sequencers (MiSeq and HiSeq) through the
Berkeley Genomics Sequencing Laboratory. Sequencing data were analyzed with a
custom MATLAB pipeline. Briefly, reads were filtered to remove those that were of
low quality, frameshifted, or did not exactly match the annealing portion of the
amplifying primers. Full-length reads were then compared with the iLOV target
sequence for mutation analysis. A detailed description of sequencing analysis can be
found in Supporting Methods.
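
The filtering and comparison steps can be sketched as follows. This is not the
MATLAB pipeline itself, which is described in Supporting Methods. The quality
threshold, the function names, and the toy sequences are assumptions, and the flank
arguments stand for the primer-annealing sequences as they appear at each end of a
read.

```python
def passes_filters(read, quals, left_flank, right_flank, orf_length, min_qual=30):
    """Apply the three filters named above to one read (threshold is an assumption)."""
    if min(quals) < min_qual:  # reject reads containing any low-quality base call
        return False
    if not (read.startswith(left_flank) and read.endswith(right_flank)):
        return False  # primer-annealing regions must match exactly
    insert_length = len(read) - len(left_flank) - len(right_flank)
    return insert_length == orf_length  # any other length implies an insertion or deletion


def codon_differences(orf, reference):
    """Return (codon_index, reference_codon, read_codon) for every mismatched codon."""
    return [
        (i // 3 + 1, reference[i:i + 3], orf[i:i + 3])
        for i in range(0, len(reference), 3)
        if orf[i:i + 3] != reference[i:i + 3]
    ]


# Toy example: a 3-codon reference and a read carrying one codon substitution.
reference = "ATGGAAAAA"
read = "CC" + "ATGCCGAAA" + "GG"
quals = [35] * len(read)
if passes_filters(read, quals, "CC", "GG", len(reference)):
    print(codon_differences(read[2:-2], reference))
```

Comparing whole codons rather than single nucleotides keeps the number of mismatches
per codon available for later analysis of mismatch bias.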

In silico Effective Library Size Simulation
Simulations were performed using a custom MATLAB script. The simulations
focused on amino acid positions 5–40, for which we possess comprehensive
frequency data from the equimolar oligonucleotide and normalized PR experiments.
Successive sampling, with replacement, of these 36 positions was performed, with
each position’s likelihood commensurate with the observed mutation frequency.
Once 34 out of 36 positions had been observed in the simulation, which we
arbitrarily define as ‘well-sampled’ (94.4% of the library diversity), the total
number of samples taken was recorded as the effective library size. This process
was then repeated 10^4 times for each library to generate a distribution of effective
library sizes (Supp. Figure 6).
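
The sampling procedure described above can be sketched in Python as follows. This is
an illustrative re-implementation, not the MATLAB script itself. The two frequency
vectors are made up to contrast a uniform library with a biased one, and the example
runs fewer trials than the 10^4 used in the text to keep it quick.

```python
import random
from statistics import mean


def effective_library_size(frequencies, n_required, rng):
    """Draw positions with replacement until n_required distinct positions are seen."""
    positions = range(len(frequencies))
    seen = set()
    draws = 0
    while len(seen) < n_required:
        # Each position is drawn with probability proportional to its frequency.
        seen.add(rng.choices(positions, weights=frequencies, k=1)[0])
        draws += 1
    return draws


def simulate(frequencies, n_required=34, n_trials=10_000, seed=0):
    """Repeat the sampling experiment to build a distribution of effective sizes."""
    rng = random.Random(seed)
    return [effective_library_size(frequencies, n_required, rng) for _ in range(n_trials)]


uniform = [1.0] * 36               # idealized library: all positions equally likely
skewed = [1.0] * 30 + [0.1] * 6    # made-up bias: six positions mutate 10x less often
uniform_sizes = simulate(uniform, n_trials=500)
skewed_sizes = simulate(skewed, n_trials=500)
print(f"uniform: {mean(uniform_sizes):.0f} samples, skewed: {mean(skewed_sizes):.0f} samples")
```

The uniform case is the classical coupon-collector problem and sets the ideal
effective library size; positional bias inflates the number of samples needed to
reach the same coverage.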

Thermostability Screening
Colonies were screened on 10 cm dishes containing standard LB-agar with
kanamycin. Tuner cells expressing the library were found to be fluorescent in the
absence of induction after 24 hours. Plates were incubated at 60 °C for 2 hours, after
which the vast majority of colonies were no longer fluorescent. Approximately
500,000 colonies were screened, of which 244 remained fluorescent. These colonies
were pooled, miniprepped, and deep sequenced to identify protein mutations. This
DNA was also transformed into XJb cells to recover individual variants. A total of 93
colonies were then grown overnight in 96-well deep well plates with 1 mL of LB + kanamycin
supplemented with 100 µM isopropyl β-D-1-thiogalactopyranoside (IPTG) and 3
mM arabinose at 37 °C. Cultures were frozen and thawed to lyse the cells, then
clarified by centrifugation. Supernatant was analyzed in a StepOnePlus™ Real-Time
PCR System (Applied Biosystems) to estimate a Tm for each variant.
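
This section does not state how a Tm was extracted from each melt curve. One common
convention, assumed here purely for illustration, is the temperature at which
fluorescence falls to half of its initial value, found by linear interpolation
between readings. The function name and the example curve are hypothetical.

```python
def estimate_tm(temperatures, fluorescence, fraction=0.5):
    """Interpolate the temperature at which fluorescence falls to `fraction` of its initial value."""
    threshold = fraction * fluorescence[0]
    points = list(zip(temperatures, fluorescence))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= threshold > f1:
            # Linear interpolation between the two readings that bracket the threshold.
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    return None  # the curve never crossed the threshold in the measured range


# Made-up melt curve: temperature in degrees Celsius, fluorescence in arbitrary units.
temps = [50, 60, 70, 80]
signal = [1.0, 0.9, 0.3, 0.1]
print(estimate_tm(temps, signal))
```

Other definitions, such as the temperature of steepest fluorescence loss, would give
slightly different values, so comparisons between variants should use one definition
throughout.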

Protein Expression and in vitro Characterization
iLOV variants were expressed and lysed in XJb cells as above but in 100 mL
volume. Excess FMN (Sigma) was added to the lysate to ensure all proteins
possessed ligand. iLOV variants were then purified by nickel affinity
chromatography using HisPur™ Ni-NTA Resin (Thermo Scientific). Variants were
filtered using Vivaspin 6 concentrators with a 3,000 molecular weight cut-off (Sartorius)
with phosphate buffered saline (PBS) pH 7.4 (Gibco). Variants were further purified by size
exclusion chromatography using an NGC Chromatography System (Bio-Rad). Purified
proteins were stored at 4 °C in PBS. For quantum yield determination, emission
between 460 nm and 600 nm was measured in a FluoroLog Spectrophotometer
(Horiba), and 450 nm absorbance was measured in an Infinite M1000 PRO
monochromator (Tecan). Emission curves were integrated in MATLAB and
normalized for absorbance. Thermostability measurements were made in a Nano
differential scanning calorimeter (TA Instruments).
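
The relative brightness calculation amounts to integrating each emission curve and
dividing by the absorbance at the excitation wavelength. A minimal sketch follows,
assuming both samples are measured under identical conditions and in the linear
absorbance range. The function names and all numbers are invented for illustration,
and the original integration was done in MATLAB.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))


def relative_brightness(wavelengths, emission_sample, a450_sample, emission_ref, a450_ref):
    """Ratio of absorbance-normalized integrated emission, sample over reference."""
    sample = trapezoid_area(wavelengths, emission_sample) / a450_sample
    reference = trapezoid_area(wavelengths, emission_ref) / a450_ref
    return sample / reference


# Made-up spectra sampled at four wavelengths (nm); real scans are much denser.
wavelengths = [460, 500, 540, 600]
emission_ref = [0.0, 10.0, 5.0, 0.0]      # stands in for the parent protein
emission_sample = [0.0, 11.0, 5.5, 0.0]   # stands in for an improved variant
ratio = relative_brightness(wavelengths, emission_sample, 0.05, emission_ref, 0.05)
print(f"relative brightness: {ratio:.2f}")
```

Normalizing by absorbance separates a change in emission per absorbed photon from a
simple difference in protein concentration between the two preparations.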

Accession Codes
Sequence Read Archive: Sequencing data have been deposited under
accession numbers SAMN07204029, SAMN07204030, SAMN07204032,
SAMN07204042, SAMN07204073, and SAMN07204112.

Figure Legends:

Figure 1: Plasmid Recombineering (PR) of iLOV generates a specific and
comprehensive mutation library. (A) Cartoon of comprehensive PR accomplished
using synthetic oligonucleotides tiled across the target gene. (B) Frequency of single
amino acid mutations mapped by residue and location in the Round 1 library. Black
indicates the WT iLOV residue. White indicates no detected reads. (C) Cartoon of
programmable PR targeting a specific sequence space. Sampling of the highest
fitness mutants can inform subsequent library design, with recombineering oligos
specifically targeting mutations of interest. (D) Mutational distribution of the library
after additional rounds of recombineering.

Figure 2: The recombineering mechanism produces moderate bias within the
library and can be manipulated by varying the delivered oligonucleotide
concentration. (A) Frequency of single amino acid mutations across iLOV in the
Round 1 library. (B) Biological replicate of Round 1 library with normalized
oligonucleotide concentrations. (C) Observed vs. expected distribution of one, two,
or three nucleotide mismatches among single amino acid mutations. The 95%
confidence intervals for these measurements are all smaller than ±0.0025
(normal approximation to the binomial distribution). (D) Frequency of double
mutations in the Round 5 library. White = not detected. Highly represented rows
and columns arise from positional bias (Supp. Figure 8). (E) Frequency of double
mutations in the Round 5 library as a function of the pairwise distance between
double mutations. For each distance, data have been normalized for the number of
possible double mutations. Error bars, standard deviation.

Figure 3: A plate-based thermostability screen identifies mutations that improve
iLOV fluorescence at elevated temperatures. (A) Cartoon of the thermostability
screen assay and subsequent hit validation procedure. (B) Representative
fluorescent images of library colonies before and after temperature challenge. (C)
Representative fluorescent thermal melt curves of lysate for two thermostable hits.
(D) Histogram of Tms for 93 thermostabilized iLOV variants.

Figure 4: Multiplexing thermostabilizing mutations rapidly identifies doubly
improved iLOV variants. (A) 25 thermostabilizing mutations mapped to the iLOV
protein sequence. (B) Histogram of initial thermostable hits (Blue) superimposed on
hits from the multiplexed library (Cyan). (C) Representative melt curves of two
thermostabilized variants compared to iLOV. (D) Crystal structure of iLOV (PDB
4EES) indicating the location of four mutations found in tLOV: R11S, I35V, R90F,
G106V.
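
The confidence intervals quoted in the Figure 2C legend use the normal approximation
to the binomial distribution. The sketch below shows that formula in general form.
The read count is a made-up example, since the number of reads behind each
measurement is not given in this section.

```python
from math import sqrt


def binomial_ci_half_width(p_hat, n, z=1.96):
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    return z * sqrt(p_hat * (1.0 - p_hat) / n)


# Worst case for the width is p_hat = 0.5; 200,000 reads is an invented example.
half_width = binomial_ci_half_width(0.5, 200_000)
print(f"95% CI: 0.5 +/- {half_width:.4f}")
```

Because the half-width shrinks with the square root of the read count, intervals
narrower than ±0.0025 at any proportion require read counts on the order of 10^5.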

SUPPORTING INFORMATION
The Supporting Information is available free of charge on the ACS Publications website at DOI:
Figure S1: Map of iLOV PR target plasmid.
Figure S2: Internal control for sequence analysis pipeline.
Figure S3: Effect of sequencing errors on library coverage.
Figure S4: Recombineering efficiency correlation with binding energy or hairpin formation.
Figure S5: Recombineering efficiency correlation between replicates.
Figure S6: Simulation of effective library size in silico.
Figure S7: Fraction of total library variance accounted for by mismatch and positional bias.
Figure S8: Positional bias correlation with double mutations.
Figure S9: ‘Zone of Exclusion’ for Rounds 1, 3, and 5.
Figure S10: Design of multiplexed library oligonucleotides.
Figure S11: Emission scan of purified iLOV and tLOV.
Figure S12: Thermostability of iLOV and tLOV via differential scanning calorimetry.
Table S1: Library coverage statistics for negative control, Round 1, and Round 5.
Table S2: PR oligonucleotides for comprehensive iLOV library.
Table S3: PR oligonucleotides for multiplexed thermostability library.
Table S4: PCR primers used for Illumina sequencing.
Table S5: Nucleotide and amino acid sequences of iLOV and tLOV.

AUTHOR INFORMATION

Corresponding Author
Email: [email protected]. Address: 2151 Berkeley Way, Berkeley, CA 94704; (510) 643-7847

Author Contributions
S.A.H. and D.F.S. designed the research. S.A.H. and S.O. performed the
experiments. S.A.H. performed the computational analysis and analyzed the data.
S.A.H. and D.F.S. wrote the paper. Reagents described in this work are available on
Addgene (https://www.addgene.org/David_Savage/), and MATLAB scripts used for
sequencing analysis are available online at https://github.com/savagelab.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
We thank H. Wang (Columbia) for providing the E. coli strain EcNR2. We
would like to thank A. Flamholz and B. Oakes for productive discussions and
readings of the manuscript, and C. Cassidy-Amstutz for assistance with in vitro
biochemistry. This work was supported by NIH New Innovator award
1DP2EB018658-01 to D.F.S., and S.A.H. was supported by NIH Training Grant
5T32GM066698-10 and Agilent.

REFERENCES
(1) Majithia, A. R., Tsuda, B., Agostini, M., Gnanapradeepan, K., Rice, R., Peloso, G., Patel, K. A., Zhang, X., Broekema, M. F., Patterson, N., Duby, M., Sharpe, T., Kalkhoven, E., Rosen, E. D., Barroso, I., Ellard, S., Kathiresan, S., O’Rahilly, S., Chatterjee, K., Florez, J. C., Mikkelsen, T., Savage, D. B., and Altshuler, D. (2016) Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575.
(2) Heim, R., Cubitt, A. B., and Tsien, R. Y. (1995) Improved green fluorescence. Nature 373, 663–664.
(3) Firnberg, E., and Ostermeier, M. (2012) PFunkel: Efficient, Expansive, User-Defined Mutagenesis. PLoS One 7, e52031.
(4) Melnikov, A., Rogov, P., Wang, L., Gnirke, A., and Mikkelsen, T. S. (2014) Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Res. 42, e112.
(5) Belsare, K. D., Andorfer, M. C., Cardenas, F., Chael, J. R., Park, H. J., and Lewis, J. C. (2016) A Simple Combinatorial Codon Mutagenesis Method for Targeted Protein Engineering. ACS Synth. Biol. 6, 416–420.
(6) Wrenbeck, E. E., Klesmith, J. R., Stapleton, J. A., Adeniran, A., Tyo, K. E. J., and Whitehead, T. A. (2016) Plasmid-based one-pot saturation mutagenesis. Nat. Methods 13, 928–930.
(7) Copeland, N. G., Jenkins, N. A., and Court, D. L. (2001) Recombineering: a powerful new tool for mouse functional genomics. Nat. Rev. Genet. 2, 769–779.
(8) Mosberg, J. A., Lajoie, M. J., and Church, G. M. (2010) Lambda red recombineering in Escherichia coli occurs through a fully single-stranded intermediate. Genetics 186, 791–799.
(9) Wang, H. H., Isaacs, F. J., Carr, P. A., Sun, Z. Z., Xu, G., Forest, C. R., and Church, G. M. (2009) Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894–898.
(10) Warner, J. R., Reeder, P. J., Karimpour-Fard, A., Woodruff, L. B. A., and Gill, R. T. (2010) Rapid profiling of a microbial genome using mixtures of barcoded oligonucleotides. Nat. Biotechnol. 28, 856–862.
(11) Garst, A. D., Bassalo, M. C., Pines, G., Lynch, S. A., Halweg-Edwards, A. L., Liu, R., Liang, L., Wang, Z., Zeitoun, R., Alexander, W. G., and Gill, R. T. (2016) Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat. Biotechnol. 35, 48–55.
(12) Thomason, L. C., Costantino, N., Shaw, D. V., and Court, D. L. (2007) Multicopy plasmid modification with phage λ red recombineering. Plasmid 58, 148–158.
(13) Turrientes, M. C., Baquero, F., Levin, B. R., Martinez, J. L., Ripoll, A., Gonzalez-Alba, J. M., Tobes, R., Manrique, M., Baquero, M. R., Rodriguez-Dominguez, M. J., Canton, R., and Galan, J. C. (2013) Normal Mutation Rate Variants Arise in a Mutator (Mut S) Escherichia coli Population. PLoS One 8, e72963.
(14) Nyerges, Á., Csörgő, B., Nagy, I., Bálint, B., Bihari, P., Lázár, V., Apjok, G., Umenhoffer, K., Bogos, B., Pósfai, G., and Pál, C. (2016) A highly precise and portable genome engineering method allows comparison of mutational effects across bacterial species. Proc. Natl. Acad. Sci. U. S. A. 113, 2502–2507.
(15) Chapman, S., Faulkner, C., Kaiserli, E., Garcia-Mata, C., Savenkov, E. I., Roberts, A. G., Oparka, K. J., and Christie, J. M. (2008) The photoreversible fluorescent protein iLOV outperforms GFP as a reporter of plant virus infection. Proc. Natl. Acad. Sci. U. S. A. 105, 20038–20043.
(16) Christie, J. M., Hitomi, K., Arvai, A. S., Hartfield, K. A., Mettlen, M., Pratt, A. J., Tainer, J. A., and Getzoff, E. D. (2012) Structural tuning of the fluorescent protein iLOV for improved photostability. J. Biol. Chem. 287, 22295–22304.
(17) Lim, S. I., Min, B. E., and Jung, G. Y. (2008) Lagging Strand-Biased Initiation of Red Recombination by Linear Double-Stranded DNAs. J. Mol. Biol. 384, 1098–1105.
(18) Callis, P. R., and Liu, T. (2006) Short range photoinduced electron transfer in proteins: QM-MM simulations of tryptophan and flavin fluorescence quenching in proteins. Chem. Phys. 326, 230–239.
(19) Mukherjee, A., Weyant, K. B., Agrawal, U., Walker, J., Cann, I. K. O., and Schroeder, C. M. (2015) Engineering and characterization of new LOV-based fluorescent proteins from Chlamydomonas reinhardtii and Vaucheria frigida. ACS Synth. Biol. 4, 371–377.
(20) Tokuriki, N., and Tawfik, D. S. (2009) Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604.
(21) Firnberg, E., Labonte, J. W., Gray, J. J., and Ostermeier, M. (2014) A comprehensive, high-resolution map of a gene’s fitness landscape. Mol. Biol. Evol. 31, 1581–1592.
(22) Steinberg, B., and Ostermeier, M. (2016) Environmental changes bridge evolutionary valleys. Sci. Adv. 2, e1500921.
(23) Heinzelman, P., Snow, C. D., Wu, I., Nguyen, C., Villalobos, A., Govindarajan, S., Minshull, J., and Arnold, F. H. (2009) A family of thermostable fungal cellulases created by structure-guided recombination. Proc. Natl. Acad. Sci. U. S. A. 106, 5610–5615.
(24) Olson, C. A., Wu, N. C., and Sun, R. (2014) A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651.
(25) Nadler, D. C., Morgan, S.-A., Flamholz, A., Kortright, K. E., and Savage, D. F. (2016) Rapid construction of metabolite biosensors using domain-insertion profiling. Nat. Commun. 7, 12266.
(26) Engler, C., Kandzia, R., and Marillonnet, S. (2008) A one pot, one step, precision cloning method with high throughput capability. PLoS One 3, e3647.
(27) Wang, H., and Church, G. (2011) Multiplexed genome engineering and genotyping methods applications for synthetic biology and metabolic engineering. Methods Enzymol. 498, 409–426.

[Figure 1 graphic. (A) Target Gene in Plasmid; Saturation Recombineering Targeting Plasmids; Comprehensive Mutation Library. (B) Heat map: Amino Acid (One Letter Code) vs. Amino Acid in iLOV Open Reading Frame, colored by Frequency of Amino Acid Mutation (Read Counts); Wild Type and Not Detected are marked. (C) Oligos Encode Any Library of Interest; Targeted Recombineering; Selection or Screen; Discard Nonfunctional Variants; Highest Fitness Mutants; Selected Oligos Encode Mutations of Interest; Iterative Cycles Improving Fitness; Final Improved Variant. (D) Fraction of Total Library vs. Number of Amino Acid Mutations and Number of Rounds of Recombineering.]

[Figure 2 graphic. (A) Frequency of Amino Acid Mutation vs. Amino Acid in iLOV Open Reading Frame. (B) Normalized Library: Frequency of Amino Acid Mutation (Normalized to Mean) vs. Amino Acid in iLOV Open Reading Frame. (C) Frequency of Nucleotide Mutation vs. Number of Mismatches, Observed and Expected. (D) Codon Position of Mutation 1 vs. Codon Position of Mutation 2, colored by Frequency of Double Mutant Position (Read Counts); Not Detected is marked. (E) Frequency of Double Mutations (Normalized to Expected Frequency) vs. Distance Between Double Mutations (Amino Acids); annotation: Zone of Exclusion ~ Ten Amino Acids.]

[Figure 3 graphic. (A) Fluorescence Temperature Screen: 500,000 Colonies Screened; 60° Celsius, Two Hours; Fluorescent Variants Selected. Isolation of Hits and Tm Determination: Protein Expression and Cell Lysis; Temperature Ramp and Fluorescence Monitoring. (B) Pre-Incubation and Post-Incubation images. (C) Fraction of Initial Fluorescence vs. Temperature (° Celsius) for iLOV, E2P, E2I, and Q87A. (D) Number of Hits vs. Fluorescence Tm (° Celsius); iLOV is marked.]

[Figure 4 graphic. (A) The 25 thermostabilizing mutations marked above the iLOV sequence MIEKNFVITDPRLPDNPIIFASDGFLELTEYSREEILGRNARFLQGPETDQATVQKIRDAIRDQRETTVQLINYTKSGKKFWNLLHLQPVRDQKGELQYFIGVQLDGSDHV. (B) Number of Hits vs. Fluorescence Tm (° Celsius). (C) Fraction of Initial Fluorescence vs. Temperature (° Celsius) for iLOV, tLOV, and Best Hit. (D) Structure with labels R11S, I35V, R90F, and G106V.]

[Graphic. Oligos Encode Any Library of Interest: Saturation Recombineering yields a Comprehensive Mutation Library; Targeted Recombineering yields a Specific Mutation Library.]