ACS Synthetic Biology - ACS Publications - American Chemical Society

Jul 14, 2017 - †Department of Molecular and Cell Biology, and ‡Department of Chemistry, UC Berkeley, Berkeley, California 94720, United States...
0 downloads 0 Views 2MB Size
Subscriber access provided by UNIV OF DURHAM

Letter

Rapid and Programmable Protein Mutagenesis Using Plasmid Recombineering Sean Andrew Higgins, Sorel Ounkap, and David F Savage ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.7b00112 • Publication Date (Web): 14 Jul 2017 Downloaded from http://pubs.acs.org on July 15, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

1

Rapid and Programmable Protein Mutagenesis Using Plasmid

2

Recombineering

3

Sean A. Higgins1, Sorel Ouonkap1, David F. Savage1,2,*

4 5

1

Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, 94720, USA

6

2

Department of Chemistry, UC Berkeley, Berkeley, CA, 94720, USA

7 8

* To whom correspondence should be addressed. Email: [email protected].

9

Address: 2151 Berkeley Way, Berkeley, CA 94720; (510) 643-7847

10 11 12 13

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

14

Page 2 of 39

ABSTRACT

15

Comprehensive and programmable protein mutagenesis is critical for

16

understanding structure-function relationships and improving protein function.

17

However, current techniques enabling comprehensive protein mutagenesis are

18

based on PCR and require in vitro reactions involving specialized protocols and

19

reagents. This has complicated efforts to rapidly and reliably produce desired

20

comprehensive

21

recombineering is a simple and robust in vivo method for the generation of protein

22

mutants for both comprehensive library generation as well as programmable

23

targeting of sequence space. Using the fluorescent protein iLOV as a model target,

24

we build a complete mutagenesis library and find it to be specific and

25

comprehensive, detecting 99.8% of our intended mutations. We then develop a

26

thermostability screen and utilize our comprehensive mutation data to rapidly

27

construct a targeted and multiplexed library that identifies significantly improved

28

variants, thus demonstrating rapid protein engineering in a simple protocol.

protein

libraries.

Here

we

demonstrate

ACS Paragon Plus Environment

that

plasmid

Page 3 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

29 30 31

ACS Synthetic Biology

KEYWORDS recombineering, protein mutagenesis, directed evolution, iLOV, fluorescence thermostability

32

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

33

INTRODUCTION

34

Directed mutagenesis of a desired protein is an important technique both for

35

understanding structure-function relationships as well as improving protein

36

function for research, biotechnology, and medical applications. For example,

37

techniques like deep mutational scanning, where every position in a protein is

38

mutated to all possible amino acids, can be applied to understand key variants

39

associated with disease1, while targeted mutagenesis of proteins such as Green

40

Fluorescent Protein (GFP) have expanded our capacity to visualize many biological

41

processes2. The ability to generate comprehensive mutation libraries and

42

programmed libraries focused on specific locations or amino acids is crucial to these

43

applications.

44

In order to address these needs, many in vitro based approaches have been

45

developed. Firnberg and Ostermeier have built libraries composed almost entirely

46

of single mutations using specialized protocols based on uracil-containing template

47

DNA3, while Melnikov and Mikkelsen constructed a comprehensive library by

48

splitting one gene into many different regions small enough to be synthesized on a

49

programmable microarray, followed by multiplexed in vitro recombination4. Belsare

50

and Lewis have demonstrated targeted, combinatorial library construction using

51

alternating cycles of fragment and joining PCR5. Recently, Wrenbeck and Whitehead

52

introduced nicking mutagenesis, using specialized nucleases to selectively degrade

53

the wild type (WT) template DNA6.

54

An alternative approach would be to incorporate synthetic oligonucleotides

55

in vivo directly into a gene of interest in a programmable fashion. In E. coli,

ACS Paragon Plus Environment

Page 4 of 39

Page 5 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

56

oligonucleotides introduced into the cell via electroporation can recombine with the

57

genome or resident plasmids with the help of the lambda phage protein Beta, in a

58

process termed recombineering7. Mechanistically, it is thought that Beta-bound

59

oligonucleotides anneal to the replication fork of replicating deoxyribonucleic acid

60

(DNA) and are subsequently incorporated into the daughter strand, thus directly

61

encoding mutations into a new DNA molecule8. Recombineering is therefore a

62

compelling method for genetic manipulation. Cheap and easily obtained standard

63

oligonucleotides are the only varying input and the protocol - mixing

64

oligonucleotides in pooled reactions - is straightforward.

65

This process was shown to be capable of mutating the E. coli genome for

66

rapid metabolic engineering in a process termed Multiplexed Automated Genome

67

Engineering (MAGE), which used multiple rounds of recombineering to increase the

68

penetrance of mutations9. Other work has demonstrated that thousands of pooled,

69

barcoded oligonucleotides can be used, in parallel, to modify the expression of >

70

95% of E. coli genes and map their effect on fitness10. More recent studies have

71

combined recombineering with the programmable DNA nuclease Cas9, as a means

72

of enforcing mutational penetrance, to mutate tens of thousands of loci in parallel

73

with high efficiency11.

74

Despite its success in genome engineering, recombineering of plasmids is

75

relatively uncharacterized12. Plasmid recombineering (PR) is of particular interest

76

in protein mutagenesis as plasmids are easily shuttled between different strains and

77

organisms for cloning and screening. Notably, recombineering strains achieve

78

enhanced mutation efficiency by knocking out mismatch repair and possess a higher

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

79

genome-wide mutation rate13, which can complicate screens or selections sensitive

80

to suppressor mutations. The use of plasmids, however, uncouples protein variation

81

from any background mutation in the genome. Importantly, Nyerges et al. found that

82

recombineering itself produces few if any off-target mutations14. Thomason et al.

83

have previously demonstrated that PR is capable of generating mutations,

84

insertions, and deletions with efficiencies comparable to genomic recombineering.

85

We reasoned that the principles of MAGE – multiplexed reactions and multiple

86

mutation rounds – would be applicable to PR as well.

87

To benchmark comprehensive PR for protein engineering we sought to

88

measure the efficiency, bias, and overall performance of saturation mutagenesis on

89

the small protein iLOV. iLOV is a 110 residue protein derived from the Light,

90

Oxygen, Voltage (LOV) domains of the A. thaliana phototropin 2 protein15. The

91

native LOV domain binds flavin mononucleotide (FMN) and uses this co-factor as a

92

photosensor to direct downstream signal transduction. Mutational analysis has

93

revealed that a cysteine to alanine substitution in the FMN binding site interrupts

94

the native photocycle and instead dramatically increases the protein’s fluorescent

95

properties. iLOV is an ideal candidate for further engineering because fluorescent

96

proteins that don’t require molecular oxygen for chromophore maturation are a

97

desirable alternative to green fluorescent protein. In previous experiments, DNA

98

shuffling was used to isolate iLOV, a variant that has six amino acid mutations

99

relative to the wild-type phototropin 2 LOV2 sequence and an improved

100

fluorescence quantum yield of 0.4415. Additional approaches to engineer further

101

improved iLOV variants have also relied on error-prone PCR and DNA shuffling,

ACS Paragon Plus Environment

Page 6 of 39

Page 7 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

102

missing much of the possible sequence-space16. Due to the potential utility of iLOV

103

and its comparatively limited engineering relative to other fluorescent proteins, we

104

hypothesized iLOV could serve as an excellent model system for exploring the utility

105

of PR. Finally, the gene length of iLOV is exceptionally short (330 bp) and analysis of

106

iLOV libraries is suited to deep sequencing. Current paired-end sequencing covers

107

the entirety of the open reading frame and can accurately identify all mutations to a

108

single sequenced plasmid. This provides insight into the mechanisms and utility of

109

recombineering.

110

Here we demonstrate that PR is capable of constructing both comprehensive

111

protein libraries and targeted mutagenesis libraries focusing on a small section of

112

sequence space. We built a complete mutagenesis library of iLOV and found it to be

113

specific and unbiased, detecting 99.8 % of our intended mutations. We explored this

114

fitness landscape in the context of thermostability using a plated-based screen that

115

allowed us to identify many desirable thermostabilizing mutations. To demonstrate

116

the iterative and programmable nature of our platform, we designed and built a

117

multiplexed library focused on these mutational hotspots and isolated significantly

118

more stable variants. In total, this work demonstrates that plasmid recombineering

119

is a rapid and robust method for the generation of protein mutants for both

120

unbiased, comprehensive libraries and programmable targeting of specific regions

121

in sequence space.

122 123

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

124

RESULTS AND DISCUSSION

125

A single round of PR generates a comprehensive mutation library

126

To generate a comprehensive mutation library of iLOV, oligonucleotides

127

were designed to target the gene within a high copy ColE1 plasmid (Supp. Figure 1).

128

The target plasmid contained a promoterless iLOV coding region to prevent growth

129

biases between mutants possessing different fitness during multiple rounds of

130

transformation and outgrowth. Note that this step does introduce some additional

131

complexity, such as additional time between library generation and screening and

132

the potential to lose library diversity. This step is not inherent to the

133

recombineering method, however, but was chosen in order to most accurately

134

investigate the naive iLOV library. 10 ng of target plasmid was mixed with an

135

equimolar mixture of 110 recombineering oligonucleotides, one primer for each

136

codon in iLOV (Figure 1A). These oligonucleotides were 60 bp long and

137

complementary to the lagging strand, which was previously demonstrated to be

138

more efficient than targeting the leading strand17. Oligos contained a centrally

139

located NNM mutation codon, (where N = A/C/G/T and M = A/C) which encodes all

140

amino acids except methionine and tryptophan. Tryptophan, in particular, is known

141

to quench flavin fluorescence in flavoproteins18 and was excluded from the library.

142

Although modified oligonucleotides have been shown to enhance recombineering

143

efficiency, e.g. phosphorothioation, standard oligonucleotides were used to

144

minimize cost and complexity9.

145

Initial experiments confirmed that PR can be used to generate diverse

146

libraries in a programmable fashion. The plasmid and oligonucleotide mixture was

ACS Paragon Plus Environment

Page 8 of 39

Page 9 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

147

first electroporated into the recombineering strain EcNR29, grown overnight and

148

miniprepped. As observed in Thomason et al.12 we found that a fraction of the

149

plasmids had converted into multimers two or three times the length of the target

150

plasmid. Because we intended to perform multiple rounds of PR, we chose to gel

151

extract the monomeric form of the plasmid. Notably, this step is unnecessary if the

152

target gene is subsequently isolated by PCR or subcloning, especially since

153

multimeric forms are often successfully modified plasmid molecules12. Deep

154

sequencing of the recovered library, hereafter termed the Round 1 library, revealed

155

substantial mutagenesis compared to the non-recombineered plasmid (Supp. Figure

156

2). Further analysis revealed that, while the majority of reads were WT iLOV

157

sequence, 29% of reads contained a single amino acid mutation (Supp. Table 1).

158

Note that the true recombineering efficiency for introducing a single

159

nonsynonymous mutations is likely ~ 23% after accounting for sequencing errors

160

relative to the negative control.These mutations covered every position in the

161

protein and nearly all targeted amino acid conversions were observed (Figure 1B).

162

1867 out of the possible 1870 single residue mutations were detected in the Round

163

1 library (Supp. Table 1). Sequencing errors do not significantly contribute to the

164

coverage of single nonsynonymous mutations (Supp. Figure 3)..

165

Further founds of PR were used to increase the penetrance of mutations.

166

Although the Round 1 library covered targeted mutations comprehensively, it was

167

roughly 61% WT iLOV sequence. Furthermore, the programmable nature of PR

168

allows the construction of more targeted libraries with combinatorial multiplexing

169

of high fitness mutations, such as would be useful in directed evolution (Figure 1C).

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

170

Four additional rounds of recombineering were performed to reduce the WT

171

fraction of the library and investigate the distribution of variants with 2+ mutations.

172

The number of reads with codon mutations increased substantially with further

173

rounds of recombineering (Figure 1D). In the Round 5 library, single mutations

174

were the most common (33%), with the remainder composed of WT sequences

175

(26%) and sequences containing 2+ mutations (41%).

176 177

Recombineering libraries can be finely controlled to alter the composition of

178

mutations

179

In the absence of prior information, the ideal mutagenesis technique would

180

produce every mutant targeted with equal frequency. Because each variant is

181

initially present in the same amount, a uniform library requires the least amount of

182

screening (or selection) in order to isolate improved mutants. A non-uniform

183

method, in contrast, might produce a highly variable distribution of mutants and

184

requires more screening or selection in order to fully explore sequence space. We

185

therefore analyzed our sequencing data to characterize the uniformity and sequence

186

preferences of PR libraries.

187

We detected two distinct types of bias: positional bias and mismatch bias. In

188

positional bias, oligonucleotides targeted to different positions in the coding

189

sequence incorporate with characteristically different efficiencies (regardless of the

190

mutation produced at that position). In mismatch bias, oligonucleotides that are

191

more similar to the WT sequence (e.g. differing only at the first base of the targeted

192

codon) incorporated with high efficiency while more divergent oligos (e.g. different

ACS Paragon Plus Environment

Page 10 of 39

Page 11 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

193

at all three positions) incorporated with lower efficiency. Positional bias is evident

194

when comparing the frequencies of single codon mutations across all 110 codons in

195

iLOV (Figure 2A). Despite the nearly 60 bp of homology between the oligonucleotide

196

and the template plasmid, oligos targeting adjacent codons can exhibit a 2-3 fold

197

different incorporation. The largest such discrepancy is found at codon 42, which is

198

mutated 5.7 times more frequently than codon 41. Variation in efficiency has been

199

observed in other recombineering studies and was somewhat correlated with the

200

oligonucleotide binding energy9. While our data was not clearly correlated with

201

binding energy (Supp. Figure 4), a biological replicate revealed replicable positional

202

bias (Supp. Figure 5), suggesting the presence of an underlying physical mechanism

203

for positional bias.

204

Comprehensive mutagenesis applications, such as deep mutational scanning,

205

ideally begin with uniformly distributed mutants in a naïve library. We

206

hypothesized that the positional bias observed in the Round 1 library – constructed

207

using equimolar mutagenic oligonucleotides – could be corrected by altering the

208

mixture of oligonucleotides used in the electroporation step. A second library was

209

therefore constructed by normalizing the concentration of each oligonucleotide in

210

the library according to the frequency of mutations obtained at the corresponding

211

position in Round 1. This library was constructed using PR, and the first 40 amino

212

acids were deep sequenced. The normalized library was indeed more uniform, with

213

a largest adjacent codon discrepancy of 2.1 fold, compared to 3.3 fold in the

214

equimolar replication library (Figure 2B).

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

215

In order to understand the importance of this normalization in a quantitative

216

fashion, we performed a simulation to evaluate the pragmatic impact of

217

oligonucleotide normalization on effective library size – the number of samples that

218

must be taken from a library to achieve a desired representation of the library

219

diversity. In the ideal case, i.e. mutants are found in the library with equal frequency,

220

if one wishes to sample, say, 95% of the diversity contained within the library, we

221

must screen roughly three times the number of distinct members. That is, the

222

effective library size is threefold the targeted library size. Our equimolar and

223

normalized libraries are not uniformly distributed, but random sampling from our

224

sequenced data in silico allows calculation of the effective library size. Our

225

simulation revealed a substantial improvement in sampling efficiency, from a mean

226

effective library size of 139 in the original library to 107 in the normalized library,

227

which is quite close to that of an ideal uniformly distributed library (Supp. Figure 6).

228

The presence of mismatch bias in the library demonstrates that

229

recombineering favors incorporation of oligos that are more similar to the WT

230

template sequence. Our oligonucleotides all contained one, two, or three nucleotide

231

mismatches relative to WT iLOV. After computationally enumerating all possible

232

oligonucleotides for each codon in iLOV, we expected that 14% of oligonucleotides

233

would contain one mismatch, 43% two, and 43% three. In contrast, oligonucleotide-

234

template pairs with two or three mismatches were observed at only 31% and 22%

235

respectively, while single nucleotide mismatches were overrepresented by more

236

than threefold at 47% (Figure 2C). This value is inflated due to the presence of

237

sequencing errors, of which the vast majority are single mismatches, but is still

ACS Paragon Plus Environment

Page 12 of 39

Page 13 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

238

significantly higher than expected given the error rate in the negative control (Supp.

239

Table 1). Together, mismatch bias and positional bias are substantial sources of

240

variation in the library, accounting for roughly 60% of model variance (Supp. Figure

241

7).

242

While a single round of PR generates many single mutants, additional rounds

243

shift the distribution to increasing mutation numbers. The Round 5 library, for

244

example, is 25% double mutants. These mutations are well represented, with 5817

245

out of a possible 5940 locations detected (Figure 2D). However, it is clear that the

246

same positional bias seen in the Round 1 library is preserved in subsequent rounds.

247

This is to be expected if recombineering events are statistically independent, and

248

can be seen by the correlation of double-mutation hotspots with single mutation

249

positional bias (Supp. Figure 8). This data suggests that double mutations in a

250

normalized library will be far more uniformly distributed.

251

The double mutation data also indicates that some recombineering events

252

are not perfectly independent. A plot of the pairwise distance between all detected

253

double mutants reveals an uneven distribution in frequency (Figure 2E).

254

Specifically, double mutations are less likely to be within 30 bp (i.e. 10 amino acids)

255

of one another. This effect becomes more pronounced with additional rounds of

256

recombineering (Supp. Figure 9). In the Round 5 library, this ‘zone of exclusion’ is

257

significant enough that double mutations a few amino acids apart are nearly four

258

times less frequent than double mutations with a much larger separation (e.g. three

259

amino acid gap versus 10 amino acid gap). We hypothesize that this bias is due to

260

the mechanism of recombineering which requires oligonucleotides to anneal to a

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

261

complementary locus via homology arms flanking the NNM mutagenic codon. In this

262

mechanism, sequential oligonucleotides would ‘overwrite’ previous mutations due

263

to incorporation of the most recent oligonucleotide’s homology arms, which extend

264

30 bp on either side of the central NNM. Another potential mechanism that disfavors

265

incorporation of nearby double mutants is mutational reduction of oligo

266

incorporation efficiency. In this hypothetical mechanism, mutations generated in

267

early rounds of PR could reduce the homology and, therefore, incorporation

268

efficiency of oligos in subsequent PR rounds.

269 270

Screening the iLOV library identifies mutations conferring thermostability

271

Previous screening and structural work indicates that a well-packed binding

272

site for the FMN fluorophore may lead to improved photochemical properties of

273

iLOV, such as photostability, by limiting the dynamics of the FMN chromophore and

274

its ability to dissipate energy following excitation16. Additionally, searches for

275

improved LOV-based fluorescent reporters have turned up homologous variants

276

such as CreiLOV that, while brighter (50% greater quantum yield), exhibit

277

substantial toxicity upon expression19. More generally, Tawfik and colleagues have

278

theorized that thermostable proteins serve as more fruitful starting points for

279

engineering and directed evolution20. In this view, variants with greater

280

thermostability can better tolerate mutations, increasing the likelihood of observing

281

mutations that improve protein function in a manner unrelated to thermostability.

282

We thus investigated the thermostability of our library, which contained nearly

283

every single amino acid mutation and a small fraction of possible double mutations.

ACS Paragon Plus Environment

Page 14 of 39

Page 15 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

284

We created a plate-based assay to screen for thermostabilized variants in the

285

Round 5 library. The library was plated at high colony density onto standard LB-

286

Agar, grown overnight, then incubated at 60 °C for two hours (Figure 3A). This

287

treatment completely abrogates the fluorescence of WT iLOV as well as nearly all

288

mutants (Figure 3B). Some colonies remained fluorescent, however, and these were

289

recovered and expressed in 96-well plate format. Cultures expressing these library

290

members were lysed, clarified, and analyzed for thermostability. Fluorescence

291

measurements were taken every 0.5° C during a temperature ramp from 25° C to

292

95° C and used to calculate the melting temperature (Tm) for each clone, the

293

temperature at which 50% of maximal fluorescence is retained (Figure 3C). All 93

294

assayed clones demonstrated a substantial increase in thermostability relative to

295

iLOV (Figure 3D). Improvements in Tm ranged from two to nearly ten degrees C.

296 297 298

Recombineering based multiplexing allows rapid and robust directed evolution

299

While comprehensive single mutant libraries are ideal for exploring

300

structure-function relationships in an unbiased manner, hypothesis-driven

301

investigation requires targeted libraries for exploring the effect of mutations at

302

specific sites of interest. Because PR targets only sites programmed by synthetic

303

oligonucleotides, we hypothesized that PR could generate a specific library in a cost-

304

effective and straightforward protocol.

305

To demonstrate the rapid multiplexing capability of PR, we designed a

306

second library containing the top 25 most frequent thermostabilizing mutations

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 39

307

from the initial plate screen (Figure 4A). Several of these mutations consisted of

308

alternative amino acids at the same position, and in these cases a different

309

oligonucleotide was designed for each. The encoding oligonucleotides were

310

designed such that homology arms would stop short of neighboring mutations so as

311

not to overwrite them (Supp. Figure 10).

312

This

multiplexed

library

resulted

in

striking

improvements

to

313

thermostability, with the best variants having Tm values nearly 20° C greater than

314

iLOV (Figure 4B). Isolating and re-cloning these variants verified that despite the

315

significant increase in Tms, the shape of their melt curves was not significantly

316

different from that of iLOV, even for the most thermostable mutant (Figure 4C). It

317

was noted during the screening process that one variant in particular, hereafter

318

referred to as thermostable LOV (tLOV), seemed to produce abnormally bright

319

lysate under high expression conditions. tLOV and iLOV were expressed and

320

purified in parallel and their absorbance and emission were characterized in vitro.

321

To accurately quantify relative quantum yield, the emission curves were normalized

322

for absorption at 450 nm and integrated. tLOV was found to be approximately 10%

323

brighter (Supp. Figure 11), and sequencing revealed the presence of four mutations

324

scattered throughout the protein, all introduced by PR (Figure 4D). Notably, none of

325

these locations are directly within the FMN binding pocket, and their contribution to

326

thermostability or quantum yield improvement are not obvious. Thus, it would have

327

been difficult to rationally design tLOV. Moreover, a targeted mutagenesis technique

328

is required for isolating quadruple mutants reliably: iLOV is a small protein, but a

329

quadruple mutant library would contain > 1013 variants, well beyond our current

ACS Paragon Plus Environment

Page 17 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

330

screening capacities. Finally, thermostability was verified by differential scanning

331

calorimetry of iLOV vs tLOV (Supp. Figure 12).

332

An ideal method for generating comprehensive protein libraries would be

333

simple and robust, enabling both complete and targeted mutagenesis without a

334

change in reagents. In this study, we demonstrate that PR can be effectively used for

335

both comprehensive and programmable mutagenesis of the fluorescent protein

336

iLOV, using methods that are generalizable to any gene of interest.

337

Previous recombineering studies have successfully built libraries in the

338

genome, demonstrating specificity and fine control of mutational composition.

339

Wang et al.9 used multiple rounds of recombineering to mutagenize six consecutive

340

nucleotides using 90 bp oligonucleotides and found a 75% mutation rate after five

341

rounds. We find quite similar behavior in the sequencing analysis of the iLOV

342

recombineered plasmid libraries constructed here. Our Round 5 library consisted of

343

74% mutant variants, and the mutations were well distributed in sequence space.

344

The modest positional bias could be substantially ameliorated by altering the ratios

345

of oligonucleotides added to the electroporation mixture. This resulted in a nearly

346

uniform distribution of mutations such that the effective library size was almost

347

ideal (Supp. Figure 6). This approach could easily be used to accommodate more

348

complex libraries with weighted mutation frequencies at various locations. The

349

correlation of replicate libraries (Supp. Figure 5) strongly suggests an underlying

350

physical mechanism for differing oligonucleotide recombineering efficiencies. It will

351

be important to understand this effect in order to predict efficiencies rather than

352

rely on empirical data as used here, which required additional sequencing and labor.

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

353

Regardless of these modifications, the reagent cost and experimental effort remain

354

low – standard 60 bp oligonucleotides and simple cycles of electroporation, growth,

355

and plasmid isolation.

356

Additional biases resulting from the mechanism of recombineering were

357

detected and while their magnitude was smaller, their effect on library size and

358

screening can be significant. Previous work has found that recombineering

359

efficiency drops sharply with the size of the modification made9. Here, too, single

360

nucleotide mutations were observed to be more common than expected. This effect

361

alters the distribution of codons present in the final library, with the template

362

codons determining this frequency shift. Notably, mismatch bias is intrinsic to the

363

annealing of oligonucleotides and is likely present for in vitro methods as well.

364

The ‘zone of exclusion’ around a first mutation generates another mode of

365

bias. A second mutation is less likely to appear inside this zone than outside of it.

366

This effect became more pronounced with increasing rounds of recombineering. Is it

367

likely this results from oligonucleotides’ homology arms overwriting earlier

368

mutations during later rounds of PR. In other words, we hypothesized that

369

homology arms are capable of introducing revertant mutations. We thus predict that

370

the length of homology arms would impact the length of the ‘zone of exclusion.’ As

371

some amount of homology is absolutely required for recombineering, no form of

372

recombineering is suitable for efficiently generating adjacent mutations from

373

different oligonucleotides. This limitation can likely be overcome by using single

374

oligonucleotides encoding sequential mutations but at the cost of reduced efficiency

375

due to mismatch bias. Wang et al.9 observed similar mutation rates for between one

ACS Paragon Plus Environment

Page 18 of 39

Page 19 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

376

and four mismatches, but found efficiency dropped by ~half when incorporating

377

mismatches of five or ten nucleotides. Again, if this effect fundamentally stems from

378

the annealing of oligonucleotides then it is likely present for in vitro methods as

379

well.

380

In vitro methods represent the most powerful alternatives for generating

381

comprehensive mutation libraries. The chief advantage of these approaches is that

382

they can generate a library composed mostly of mutated variants. Ostermeier and

383

colleagues accomplished this by utilizing specialized protocols based on uracil-

384

containing template DNA3, while Wrenbeck and Whitehead selectively degrade the

385

WT strand using single-strand nicking endonucleases6. Such methods have been

386

used to generate nearly comprehensive libraries of genes for exploring the entirety

387

of the fitness landscape21, and to comprehensively examine the possible

388

evolutionary pathways leading from one allele to another22.

389

In directed protein evolution, iterative rounds of mutagenesis can be used to PCR-based protocols3,

390

multiplex fitness-improving

391

synthesis4, and some other recombineering techniques11 excel at generating

392

libraries composed of single mutations. However, in many applications, 2+

393

mutations are desired at non-contiguous locations. Recent work has developed a

394

PCR-based method to accomplish this goal in vitro5, and we hypothesized that PR

395

was well suited to serve as a complementary approach in vivo, doing away with

396

cloning altogether. To this end, we comprehensively explored the iLOV single

397

mutation sequence-space for thermostability, selected the fitness enhancing

398

mutations, and demonstrated the utility of PR for advanced protein engineering by

mutations.

ACS Paragon Plus Environment

direct

gene

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

399

multiplexing many different single and double mutations at discontinuous sites

400

across iLOV in a second library. The ability to select and easily mutate numerous

401

specific and non-contiguous locations across a protein is highly useful for a variety

402

of techniques that utilize experimental or phylogenetic data to computationally

403

predict and enhance enzymes23, explore epistatic interactions24, or even scan SNPs

404

in human proteins for disease prediction1.

405

iLOV engineering has been relatively limited in comparison to other

406

fluorescent proteins15. Because the domain has been taken out of its natural

407

structural context, we hypothesized that its thermal stability could be increased.

408

Consistent with this idea, many mutations were found to improve the thermal

409

stability of iLOV up to a robust, 10° C increase in Tm. These improvements were then

410

stacked by multiplexing the 25 most common mutations from the first screen. As a

411

measure of convenience, the same pooled, single electroporation protocol was used,

412

although iterative rounds of transformation and outgrowth set a lower limit on

413

throughput – typically ~16 hours per cycle. One particular variant among the

414

thermostabilized pool, tLOV, was found to be ~10% brighter that iLOV in vitro. This

415

result is consistent with previous work demonstrating that constraining the FMN

416

fluorophore can improve the photochemical properties of iLOV16. It would be

417

interesting to perform comprehensive mutagenesis of the thermostabilized iLOV

418

mutants in search of further improvements to the protein’s brightness or red/blue

419

spectral shifting, as increased thermostability has been hypothesized to permit

420

greater exploration of function-altering mutations20.

ACS Paragon Plus Environment

Page 20 of 39

Page 21 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

421

In summary, we have demonstrated that PR retains many of the ideal

422

properties of genome recombineering, including specificity and programmability.

423

We found that PR was suited for the construction of both comprehensive and

424

targeted libraries, and that the simplicity of the protocol led to rapid and reliable

425

screening experiments. In particular, PR is suitable for cycles of iterative design,

426

construction, and sampling of genetic libraries, requiring no specialized reagents or

427

protocols. We developed a thermostability screen of the fluorescent protein iLOV

428

and used the resulting mutation data to rapidly construct a multiplexed library that

429

identified significantly improved variants, including the first enhancement to the

430

protein’s brightness since its development.

431

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

432

MATERIALS AND METHODS

433

Strains and Media

Page 22 of 39

434

Strain EcNR2 (Addgene ID: 26931)9 was used for generating PR libraries in

435

plasmid pSAH031 (Addgene ID: 90330). For thermostability screening and protein

436

expression, iLOV libraries were cloned into pTKEI-Dest (Addgene ID: 79784)25 using

437

Golden Gate cloning26 with restriction enzyme BsmbI (NEB) and transformed into

438

either Tuner (Novagen) or XJ b Autolysis E. coli (Zymo Research). Unless otherwise

439

stated, strains were grown in standard LB (Teknova) supplemented with kanamycin

440

(Fisher) at 60 µg/mL.

441

Recombineering Library Construction

442

Libraries were constructed using a modified protocol from Wang 201127.

443

Briefly,

110

oligonucleotides

(Supp.

Table

2)

or

444

oligonucleotides (Supp. Table 3) were mixed and diluted in water. A final volume of

445

50 µL of 2 µM oligonucleotides, plus 10 ng of pSAH031, was electroporated into 1

446

mL of induced and washed EcNR2 using a 1 mm electroporation cuvette (BioRad

447

GenePulser). A Harvard Apparatus ECM 630 Electroporation System was used with

448

settings 1800 kV, 200 Ω, 25 µF. Three replicate electroporations were performed,

449

then individually allowed to recover at 30° C for 2 hr in 1 mL of SOC (Teknova)

450

without antibiotic. LB and kanamycin was then added to 6 mL final volume and

451

grown overnight. Cultures were miniprepped (QIAprep Spin Miniprep Kit) and

452

monomer plasmids were isolated by agarose gel electrophoresis and gel extraction

453

(QIAquick Gel Extraction Kit) to remove multimer plasmids12. The three replicates

454

were then combined, completing a round of PR.

ACS Paragon Plus Environment

25

thermostabilizing

Page 23 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

455

ACS Synthetic Biology

Library Sequencing and Analysis

456

The iLOV open reading frame was amplified from PR libraries by PCR to add

457

indices and priming sequences for deep sequencing (Supp. Table 4). PCR products

458

were sequenced on Illumina platform sequencers (MiSeq and HiSeq) through the

459

Berkeley Genomics Sequencing Laboratory. Sequencing data were analyzed with a

460

custom MATLAB pipeline. Briefly, reads were filtered to remove those that were of

461

low quality, frameshifted, or did not exactly match the annealing portion of the

462

amplifying primers. A detailed description of sequencing analysis can be found in

463

Supporting Methods. Finally, full-length reads were compared with the iLOV target

464

sequence for mutation analysis.

465

In silico Effective Library Size Simulation

466

Simulations were performed using a custom MATLAB script. The simulations

467

focused on amino acid positions 5 – 40, for which we possess comprehensive

468

frequency data from the equimolar oligonucleotide and normalized PR experiments.

469

Successive sampling, with replacement, of these 36 positions was performed, with

470

each position’s likelihood commensurate with the observed mutation frequency.

471

Once 34 out of 36 positions had been observed in the simulation, which we

472

arbitrarily define as ‘well-sampled’ (94.4% of the library diversity), the total

473

number of samples taken was recorded as the effective library size. This process

474

was then repeated 104 times for each library to generate a distribution of effective

475

library sizes (Supp. Figure 6).

476

Thermostability Screening

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 39

477

Colonies were screened on 10 cm dishes containing standard LB-agar with

478

kanamycin. Tuner cells expressing the library were found to be fluorescent in the

479

absence of induction after 24 hours. Plates were incubated at 60° C for 2 hours, after

480

which the vast majority of colonies were no longer fluorescent. Approximately

481

500,000 colonies were screened, of which 244 remained fluorescent. These colonies

482

were pooled, miniprepped, and deep sequenced to identify protein mutations. This

483

DNA was also transformed into XJb cells to recover individual variants. 93 colonies

484

were then grown overnight in 96-well deep well plates with 1 mL of LB + kanamycin

485

supplemented with 100 µM Isopropyl β-D-1-thiogalactopyranoside (IPTG) and 3

486

mM arabinose at 37° C. Cultures were frozen and thawed to lyse the cells, then

487

clarified by centrifugation. Supernatant was analyzed in a StepOnePlus™ Real-Time

488

PCR System (Applied Biosystems) to estimate a Tm for each variant.

489

Protein Expression and in vitro Characterization

490

iLOV variants were expressed and lysed in XJb cells as above but in 100 mL

491

volume. Excess FMN (Sigma) was added to the lysate to ensure all proteins

492

possessed

493

chromatography using HisPur™ Ni-NTA Resin (Thermo Scientific). Variants were

494

filtered using Vivaspin 6 3,000 molecular weight cut-off (Sartorius) with phosphate

495

buffered saline (PBS) pH 7.4 (Gibco). Variants were further purified by size

496

exclusion chromatography using a NGC Chromatography System (Bio-Rad). Purified

497

proteins were stored at 4° C in PBS. For quantum yield determination, emission

498

between 460 nm and 600 nm was measured in a FluoroLog Spectrophotometer

499

(Horiba), and 450 nm absorbance was measured in an Infinity M1000 PRO

ligand.

iLOV

variants

were

then

purified

ACS Paragon Plus Environment

by

nickel

affinity

Page 25 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

500

monochromator (Tecan). Emission curves were integrated in MATLAB and

501

normalized for absorbance. Thermostability measurements were made in a Nano

502

differential scanning calorimeter (TA Instruments).

503

Accession Codes

504

Sequence Read Archive: Sequencing data have been deposited under

505

accession

numbers

SAMN07204029,

SAMN07204030,

506

SAMN07204042, SAMN07204073, and SAMN07204112.

507 508

ACS Paragon Plus Environment

SAMN07204032,

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

509

Figure Legends:

510

Figure 1: Plasmid Recombineering (PR) of iLOV generates a specific and

511

comprehensive mutation library. (A) Cartoon of comprehensive PR accomplished

512

using synthetic oligonucleotides tiled across the target gene. (B) Frequency of single

513

amino acid mutations mapped by residue and location in the Round 1 library. Black

514

indicates the WT iLOV residue. White indicates no detected reads. (C) Cartoon of

515

programmable PR targeting a specific sequence space. Sampling of the highest

516

fitness mutants can inform subsequent library design, with recombineering oligos

517

specifically targeting mutations of interest. (D) Mutational distribution of the library

518

after additional rounds of recombineering.

519 520 521

Figure 2: The recombineering mechanism produces moderate bias within the

522

library and can be manipulated by varying the delivered oligonucleotide

523

concentration. (A) Frequency of single amino acid mutations across iLOV in the

524

Round 1 library. (B) Biological replicate of Round 1 library with normalized

525

oligonucleotide concentrations. (C) Observed vs. expected distribution of one, two,

526

or three nucleotide mismatches among single amino acid mutations. The 95%

527

confidence intervals for these measurements are all smaller than +/- 0.0025

528

(normal approximation to the binomial distribution). (D) Frequency of double

529

mutations in the Round 5 library. White = not detected. Highly represented rows

530

and columns arise from positional bias (Supp. Figure 8). (E) Frequency of double

531

mutations in the Round 5 library as a function of the pairwise distance between

ACS Paragon Plus Environment

Page 26 of 39

Page 27 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

532

double mutations. For each distance, data has been normalized for the number of

533

possible double mutations. Error bars, standard deviation.

534 535 536

Figure 3: A plate-based thermostability screen identifies mutations that improve

537

iLOV fluorescence at elevated temperatures. (A) Cartoon of the thermostability

538

screen assay and subsequent hit validation procedure. (B) Representative

539

fluorescent images of library colonies before and after temperature challenge. (C)

540

Representative fluorescent thermal melt curves of lysate for two thermostable hits.

541

(D) Histogram of Tms for 93 thermostabilized iLOV variants.

542 543 544

Figure 4: Multiplexing thermostabilizing mutations rapidly identifies doubly

545

improved iLOV variants. (A) 25 thermostabilizing mutations mapped to the iLOV

546

protein sequence. (B) Histogram of initial thermostable hits (Blue) superimposed on

547

hits from the multiplexed library (Cyan). (C) Representative melt curves of two

548

thermostabilized variants compared to iLOV. (D) Crystal structure of iLOV (PDB

549

4EES) indicating the location of four mutations found in tLOV: R11S, I35V, R90F,

550

G106V.

551

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

552 553 554

SUPPORTING INFORMATION The Supporting Information is available free of charge on the ACS Publications website at DOI:

555

Figure S1: Map of iLOV PR target plasmid. Figure S2: Internal control for

556

sequence analysis pipeline. Figure S3: Effect of sequencing errors on library

557

coverage. Figure S4: Recombineering efficiency correlation with binding energy or

558

hairpin formation. Figure S5: Recombineering efficiency correlation between

559

replicates. Figure S6: Simulation of effective library size in silico. Figure S7: Fraction

560

of total library variance accounted for by mismatch and positional bias. Figure S8:

561

Positional bias correlation with double mutations. Figure S9: ‘Zone of Exclusion’ for

562

Rounds 1,3, and 5. Figure S10: Design of multiplexed library oligonucleotides. Figure

563

S11: Emission scan of purified iLOV and tLOV. Figure S12: Thermostability of iLOV

564

and tLOV via differential scanning calorimetry. Table S1: Library coverage statistics

565

for negative control, Round 1, and Round 5. Table S2: PR oligonucleotides for

566

comprehensive iLOV library. Table S3: PR oligonucleotides for multiplexed

567

thermostability library. Table S4: PCR primers used for Illumina sequencing. Table

568

S5: Nucleotide and amino acid sequences of iLOV and tLOV.

569

ACS Paragon Plus Environment

Page 28 of 39

Page 29 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

570

ACS Synthetic Biology

AUTHOR INFORMATION

571

Corresponding Author

572

Email: [email protected]. Address: 2151 Berkeley Way, Berkeley, CA

573

94704; (510) 643-7847

574

Author Contributions

575

S.A.H. and D.F.S designed the research. S.A.H. and S.O. performed the

576

experiments. S.A.H. performed the computational analysis and analyzed the data.

577

S.A.H. and D.F.S wrote the paper. Reagents described in this work are available on

578

Addgene (https://www.addgene.org/David_Savage/), MATLAB scripts used for

579

sequencing analysis are available online at https://github.com/savagelab.

580

Notes

581

The authors declare no competing financial interest.

582

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

583

ACKNOWLEDGMENTS

584

We thank H. Wang (Columbia) for providing the E. coli strain EcNR2. We

585

would like to thank A. Flamholz and B. Oakes for productive discussions and

586

readings of the manuscript, and C. Cassidy-Amstutz for assistance with in vitro

587

biochemistry. This work was supported by NIH New Innovator award

588

1DP2EB018658-01 to D.F.S., and S.A.H. was supported by NIH Training Grant

589

5T32GM066698-10 and Agilent.

590

ACS Paragon Plus Environment

Page 30 of 39

Page 31 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

591

REFERENCES

592

(1) Majithia, A. R., Tsuda, B., Agostini, M., Gnanapradeepan, K., Rice, R., Peloso, G.,

593

Patel, K. A., Zhang, X., Broekema, M. F., Patterson, N., Duby, M., Sharpe, T., Kalkhoven,

594

E., Rosen, E. D., Barroso, I., Ellard, S., Kathiresan, S., O’Rahilly, S., Chatterjee, K.,

595

Florez, J. C., Mikkelsen, T., Savage, D. B., and Altshuler, D. (2016) Prospective

596

functional classification of all possible missense variants in PPARG. Nat. Genet. 48,

597

1570–1575.

598

(2) Heim, R., Cubitt, a B., and Tsien, R. Y. (1995) Improved green fluorescence.

599

Nature 373, 663–664.

600

(3) Firnberg, E., and Ostermeier, M. (2012) PFunkel: Efficient, Expansive, User-

601

Defined Mutagenesis. PLoS One 7, e52031.

602

(4) Melnikov, A., Rogov, P., Wang, L., Gnirke, A., and Mikkelsen, T. S. (2014)

603

Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent

604

fitness landscapes. Nucleic Acids Res. 42, e112.

605

(5) Belsare, K. D., Andorfer, M. C., Cardenas, F., Chael, J. R., Park, H. J., and Lewis, J. C.

606

(2016) A Simple Combinatorial Codon Mutagenesis Method for Targeted Protein

607

Engineering. ACS Synth. Biol. 6, 416–420.

608

(6) Wrenbeck, E. E., Klesmith, J. R., Stapleton, J. A., Adeniran, A., Tyo, K. E. J., and

609

Whitehead, T. A. (2016) Plasmid-based one-pot saturation mutagenesis. Nat.

610

Methods 13, 928–930.

611

(7) Copeland, N. G., Jenkins, N. a, and Court, D. L. (2001) Recombineering: a powerful

612

new tool for mouse functional genomics. Nat. Rev. Genet. 2, 769–779.

613

(8) Mosberg, J. a., Lajoie, M. J., and Church, G. M. (2010) Lambda red recombineering

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

614

in Escherichia coli occurs through a fully single-stranded intermediate. Genetics 186,

615

791–799.

616

(9) Wang, H. H., Isaacs, F. J., Carr, P. a, Sun, Z. Z., Xu, G., Forest, C. R., and Church, G. M.

617

(2009) Programming cells by multiplex genome engineering and accelerated

618

evolution. Nature 460, 894–898.

619

(10) Warner, J. R., Reeder, P. J., Karimpour-Fard, A., Woodruff, L. B. a, and Gill, R. T.

620

(2010) Rapid profiling of a microbial genome using mixtures of barcoded

621

oligonucleotides. Nat. Biotechnol. 28, 856–862.

622

(11) Garst, A. D., Bassalo, M. C., Pines, G., Lynch, S. a, Halweg-Edwards, A. L., Liu, R.,

623

Liang, L., Wang, Z., Zeitoun, R., Alexander, W. G., and Gill, R. T. (2016) Genome-wide

624

mapping of mutations at single-nucleotide resolution for protein, metabolic and

625

genome engineering. Nat. Biotechnol. 35, 48–55.

626

(12) Thomason, L. C., Constantino, N., Shaw, D. V., and Court, D. L. (2007) Multicopy

627

plasmid modification with phage λ red recombineering. Plasmid 58, 148–158.

628

(13) Turrientes, M. C., Baquero, F., Levin, B. R., Martinez, J. L., Ripoll, A., Gonzalez-

629

Alba, J. M., Tobes, R., Manrique, M., Baquero, M. R., Rodriguez-Dominguez, M. J.,

630

Canton, R., and Galan, J. C. (2013) Normal Mutation Rate Variants Arise in a Mutator

631

(Mut S) Escherichia coli Population. PLoS One 8, e72963.

632

(14) Nyerges, Á., Csörgő, B., Nagy, I., Bálint, B., Bihari, P., Lázár, V., Apjok, G.,

633

Umenhoffer, K., Bogos, B., Pósfai, G., and Pál, C. (2016) A highly precise and portable

634

genome engineering method allows comparison of mutational effects across

635

bacterial species. Proc. Natl. Acad. Sci. 113, 2502–2507.

636

(15) Chapman, S., Faulkner, C., Kaiserli, E., Garcia-Mata, C., Savenkov, E. I., Roberts, A.

ACS Paragon Plus Environment

Page 32 of 39

Page 33 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACS Synthetic Biology

637

G., Oparka, K. J., and Christie, J. M. (2008) The photoreversible fluorescent protein

638

iLOV outperforms GFP as a reporter of plant virus infection. Proc. Natl. Acad. Sci. U.

639

S. A. 105, 20038–20043.

640

(16) Christie, J. M., Hitomi, K., Arvai, A. S., Hartfield, K. a., Mettlen, M., Pratt, A. J.,

641

Tainer, J. a., and Getzoff, E. D. (2012) Structural tuning of the fluorescent protein

642

iLOV for improved photostability. J. Biol. Chem. 287, 22295–22304.

643

(17) Lim, S. I., Min, B. E., and Jung, G. Y. (2008) Lagging Strand-Biased Initiation of

644

Red Recombination by Linear Double-Stranded DNAs. J. Mol. Biol. 384, 1098–1105.

645

(18) Callis, P. R., and Liu, T. (2006) Short range photoinduced electron transfer in

646

proteins: QM-MM simulations of tryptophan and flavin fluorescence quenching in

647

proteins. Chem. Phys. 326, 230–239.

648

(19) Mukherjee, A., Weyant, K. B., Agrawal, U., Walker, J., Cann, I. K. O., and

649

Schroeder, C. M. (2015) Engineering and characterization of new LOV-based

650

fluorescent proteins from chlamydomonas reinhardtii and vaucheria frigida. ACS

651

Synth. Biol. 4, 371–377.

652

(20) Tokuriki, N., and Tawfik, D. S. (2009) Stability effects of mutations and protein

653

evolvability. Curr. Opin. Struct. Biol. 19, 596–604.

654

(21) Firnberg, E., Labonte, J. W., Gray, J. J., and Ostermeier, M. (2014) A

655

comprehensive, high-resolution map of a Gene’s fitness landscape. Mol. Biol. Evol. 31,

656

1581–1592.

657

(22) Steinberg, B., and Ostermeier, M. (2016) Environmental changes bridge

658

evolutionary valleys. Sci. Adv. 2, e1500921–e1500921.

659

(23) Heinzelman, P., Snow, C. D., Wu, I., Nguyen, C., Villalobos, A., Govindarajan, S.,

ACS Paragon Plus Environment

ACS Synthetic Biology

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

660

Minshull, J., and Arnold, F. H. (2009) A family of thermostable fungal cellulases

661

created by structure-guided recombination. Proc. Natl. Acad. Sci. U. S. A. 106, 5610–

662

5615.

663

(24) Olson, C. A., Wu, N. C., and Sun, R. (2014) A comprehensive biophysical

664

description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24,

665

2643–2651.

666

(25) Nadler, D. C., Morgan, S.-A., Flamholz, A., Kortright, K. E., and Savage, D. F.

667

(2016) Rapid construction of metabolite biosensors using domain-insertion

668

profiling. Nat. Commun. 7, 12266.

669

(26) Engler, C., Kandzia, R., and Marillonnet, S. (2008) A one pot, one step, precision

670

cloning method with high throughput capability. PLoS One 3, e3647.

671

(27) Wang, H., and Church, G. (2011) Multiplexed genome engineering and

672

genotyping methods applications for synthetic biology and metabolic engineering.

673

Methods Enzym. 498, 409–426.

674

ACS Paragon Plus Environment

Page 34 of 39

a

ACS Synthetic Biology

b

Target Gene in Plasmid Amino Acid (One Letter Code)

Saturation Recombineering Targeting Plasmids

Comprehensive Mutation Library

Wild Type

A C D E F G H I K L N P Q R S T V Y

512 256 128 64 32 16 8 4 2

20

40

60

80

1

100

Not Detected

Amino Acid in iLOV Open Reading Frame

d

Selected Oligos Encode Mutations of Interest

Targeted Recombineering

Iterative Cycles Improving Fitness

Selection or Screen

0.7 Fraction of Total Library

1 2 3 4 5 6 7 8 9 10 11 12 13 14c 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

Oligos Encode Any Library of Interest

0.6 0.5 0.4 0.3 0.2 0.1

Highest Fitness Mutants Final Improved Variant

Frequency of Amino Acid Mutation (Read Counts)

Page 35 of 39

Discard Nonfunctional Variants

0 Num

1

2

ber o

f Rou

3 nds

of Re

4

ACS Paragon Plus Environment

5

5

com

bine

ering

3

4

cid oA

in

f Am

er o

b Num

0

1

2

ns

atio

Mut

ACS Synthetic Biology

0.035 0.03

0.02 0.015 0.01 0.005 0

20

40

3 Mismatches

3 2 Mismatches

2

1 Mismatch

1 Target Gene in Plasmid

0

10

20

30

40

e

0

512

512

32 16 8 4

Frequency of Double Mutations (Normalized to Expected Frequency)

64

Observed Expected

0.5 0.4 0.3 0.2 0.1 0

1

2

3

Number of Mismatches

Zone of Exclusion ~ Ten Amino Acids

2.5

256

Frequency of Amino Acid Mutation Codon Position of Mutation 1 (Read Counts)

128

100

Amino Acid in iLOV Open Reading Frame

Wild Type

256

80

c

Normalized Library

4

60 Amino Acid in iLOV Open Reading Frame

Frequency of Nucelotide Mutation

0

Frequency of Double Mutant Position (Read Counts)

1 2 3 4 5 6 7 8 9 b 10 11 12 13 14 15 16 17 18 19 d 20 21 22 23 24 25 26 27 28 29 30 31 0 32 100 Frame 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

Page 36 of 39

0.025

Frequency of Amino Acid Mutation (Normalized to Mean)

Frequency of Amino Acid Mutation

a

128

Target Gene in Plasmid

64 32

55

1.5

16 8 4

2

1

0.5

1

1

110 Not Detected Not Detected 0

2

55

110

0

0

Codon Position of Mutation 2

ACS Paragon Plus Environment

20

40

60

80

100

Distance Between Double Mutations (Amino Acids)

Page 37 of 39 Fluorescence Temperature Screen

Isolation of Hits and Tm Determination

Fluorescent Variants Selected

60° Celsius Two Hours

Tm

Temperature

Protein Expression and Cell Lysis

b

c

Pre-Incubation

Post-Incubation

iLOV E2P E2I Q87A

0.9 0.8 0.7 0.6 0.5 0.4 0.3 50

60

70

Temperature (° Celsius)

ACS Paragon Plus Environment

80

Temperature Ramp and Fluorescence Monitoring

d 25 Number of Hits

500,000 Colonies Screened

Fraction of Initial Fluorescence

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

ACS Synthetic Biology

Fluorescence

a

20 15 10

iLOV

5 0 62

64

66

68

70

Fluorescence Tm (° Celsius)

72

a

Page A 38 I G P T V F A P PV M IEKNFVITDPRLPDNPIIFASDGFLELTEYSREEILGRNARFLQGPETDQATVQKIRDAIRDQRETTVQLINYTKSGKKFWNLLHLQPVRDQKGELQYFIGVQLDGSDHV 20

10

1

of 39

A S

N T

30

60

50

40

80

70

90

100

110

I35V

d

c 1 25

Fraction of Initial Fluorescence

Number of Hits

1 2 3 b 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

ACS Synthetic Biology

T S N A

P CC LLS

20 15 10 5 0

65

70

75

Fluorescence Tm (° Celsius)

80

0.9

R11S

0.8 0.7 iLOV tLOV Best Hit

0.6 0.5 0.4 50

60

70

R90F G106V

80

90

Temperature (° Celsius)

ACS Paragon Plus Environment

Oligos Encode Any Library of Interest

Page 39 of ACS 39 Synthetic Biology

1 2 3 4 5 6

Saturation Recombineering

Targeted Recombineering

ACS Paragon Plus Environment Comprehensive Mutation Library

Specific Mutation Library