Evolved Sequence Contexts for Highly Efficient Amber Suppression


Article

Evolved Sequence Contexts for Highly Efficient Amber Suppression with Noncanonical Amino Acids Moritz Pott, Moritz Johannes Schmidt, and Daniel Summerer ACS Chem. Biol., Just Accepted Manuscript • DOI: 10.1021/cb5006273 • Publication Date (Web): 09 Oct 2014 Downloaded from http://pubs.acs.org on October 10, 2014


ACS Chemical Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Evolved Sequence Contexts for Highly Efficient Amber Suppression with Noncanonical Amino Acids Moritz Pott, Moritz Johannes Schmidt and Daniel Summerer* Department of Chemistry, Zukunftskolleg and Konstanz Research School Chemical Biology, University of Konstanz, Konstanz, Germany.

KEYWORDS: Genetic Code Expansion - Noncanonical Amino Acids - Directed Evolution - Amber Suppression

ABSTRACT: The expansion of the genetic code with noncanonical amino acids (ncAA) enables the function of proteins to be tailored with high molecular precision. In this approach, the ncAA is charged to an orthogonal nonsense suppressor tRNA by an aminoacyl-tRNA-synthetase (aaRS) and incorporated into the target protein in vivo by suppression of nonsense codons in the messenger RNA (mRNA) during ribosomal translation. Compared to sense codon translation, this process occurs with reduced efficiency. However, it is still poorly understood how the local sequence context of the nonsense codon affects suppression efficiency. Here, we report sequence contexts for highly efficient suppression of the widely used amber codon in E. coli for the orthogonal Methanocaldococcus jannaschii tRNATyr/TyrRS and Methanosarcina mazei tRNAPyl/PylRS pairs. In vivo selections of sequence context libraries, each consisting of two random codons directly up- and downstream of an amber codon, afforded contexts with strong preferences for particular mRNA nucleotides and/or amino acids that markedly differed from preferences of contexts obtained in control selections with sense codons. The contexts provided high amber suppression efficiencies with little ncAA dependence that were transferable between proteins and resulted in protein expression levels of 70 - 110 % compared to levels of control proteins without amber codon. These sequence contexts represent stable tags for robust and highly efficient incorporation of ncAA into proteins in standard E. coli strains and provide general design rules for the engineering of amber codons into target genes.

The expansion of the genetic code with noncanonical amino acids (ncAA) enables the structure and function of proteins to be tailored with a previously unattainable level of precision, directly in living cells.
This approach relies on orthogonal tRNA/aminoacyl-tRNA-synthetase (aaRS) pairs that promote the selective, co-translational incorporation of ncAA in response to nonsense codons. A large set of ncAA with various useful chemical and biophysical functions is now available, such as ncAA with bioorthogonal chemical reactivities, posttranslational modifications, as well as photo-crosslinking, photocaged, fluorescent, spin labeled, and metal-binding ncAA.1,2 Escherichia coli is a preferred host for the expression of proteins containing ncAA, since it offers straightforward genetic engineering, high growth rates, high levels of protein expression, and a large set of applicable ncAA. The most widely used orthogonal tRNA/aaRS pairs for this approach in E. coli are the Methanocaldococcus jannaschii tRNATyr/TyrRS pair1,3 and the Methanosarcina mazei (or Methanosarcina barkeri) tRNAPyl/PylRS pair4,5 that are mostly employed for suppression of the amber stop codon, TAG. This codon is competitively recognized by both the amber suppressor tRNA and release factor 1 (RF-1), which terminates translation and limits amber suppression efficiency. This reduces protein expression yields to variable levels in the range of ~20 % of those of progenitor proteins without amber codon.1 Various efforts to increase this efficiency have been introduced, including 1.) the increase of expression levels of tRNA and aaRS by the optimization of promoters and codon usage, or by the introduction of multiple copies of tRNA and aaRS into expression plasmids,6-9 2.) the optimization of tRNAs for improved aminoacylation or improved delivery by elongation factor Tu (EF-Tu) to the ribosome,9,10 and 3.) reduced amber codon recognition by RF-1 by its genomic knock-out,11-14 by engineering RF-1 itself15-17 or its binding site at the ribosome.18,19

However, a factor that can reduce amber suppression efficiency, but whose influence is poorly understood, is the local sequence context of the target amber codon. This factor introduces an additional layer of complexity into the process that may limit amber suppression efficiency even in highly optimized systems as an overlying effect. Since the initial finding that amber suppression efficiency strongly depends on the identity of the 3´-nucleotide (nt) adjacent to the amber codon,20,21 influences of several other nucleotide and/or amino acid positions have been reported. A variety of potential mechanisms have been proposed to explain these dependences, including 1.) interactions between nucleotides of the mRNA downstream of the UAG stop codon with RF-122 or the amber suppressor tRNA,23 2.) influence of the last two amino acids preceding a stop codon, potentially dependent on specific amino acid properties such as polarity, charge, or the propensity to form specific secondary structures in the nascent peptide,24-27 and 3.) direct interactions between the P-site tRNA and the suppressor tRNA.26,28,29 Though these studies focused on homologous amber suppressor tRNAs and canonical amino acids, they suggest a high potential for improvement of amber suppression efficiency also with heterologous, orthogonal tRNA/aaRS pairs and ncAA. Indeed, position dependence of protein expression yields is frequently observed with such pairs.1 Since this latter dependence has not been systematically studied, the selection of permissive sites for ncAA incorporation into proteins is typically based on empiric testing and general assumptions of protein mutagenesis, e.g. that ncAA are best tolerated at surface-exposed, solvent accessible sites without specific secondary structure.

Here we report evolved sequence contexts for highly efficient amber suppression in E. coli for both the M. jannaschii tRNATyr/TyrRS pair and the M. mazei tRNAPyl/PylRS pair in combination with a set of structurally diverse ncAA. These sequence contexts provide significantly increased amber suppression in standard E. coli strains, with little dependence on the ncAA structure. The efficiencies rival those of sense codon translation and are transferable between proteins. The sequence contexts thus represent potential general tags for robust and efficient incorporation of ncAA into proteins. Moreover, the gained insights into the sequence preferences of amber suppression provide general design rules for the engineering of amber codons into target genes.

RESULTS

Sequence-context dependence of amber suppression. Previous studies with homologous suppressor tRNAs and canonical amino acids have shown that the two codons upstream of the amber codon strongly affect stop codon suppression,24-27 whereas codons further upstream appear not to be critical.27 Moreover, the majority of the nucleotide positions of these two codons show genomic bias in highly expressed E. coli genes, reflecting potential roles in termination fidelity.30-32 Finally, both nucleotides directly downstream of the amber codon, most importantly the +4 nucleotide, have been shown to strongly affect amber suppression efficiency.22,23,33 To gain insights into the sequence context dependence of amber suppression efficiency with heterologous, orthogonal tRNA/aaRS pairs and ncAA by a comprehensive and unbiased approach, we created a random library of sequences around an amber codon. Based on the aforementioned data, we designed a random region of two codons up- and downstream of the amber codon (nucleotide positions -6 to -1 and +4 to +9, see Fig. 1A) that included all amino acid variations at all relevant positions with minimized bias (NNK-NNK-TAG-NNK-NNK, resulting in a theoretical diversity of ~10^6). This design also includes all nucleotide variations at the important +4 and +5 nucleotide positions and covers the majority of other nucleotide positions at the four chosen codons. We fused this library to a gene encoding green fluorescent protein (GFP) because of its stable fold and robustness in the context of fusion constructs. We appended the random region to the unstructured N-terminus and included a flexible GGAS linker (Fig. 1A and B). This design should provide selective insights only into the sequence dependence of amber suppression with minimized interference of individual sequences on the overall protein function.
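The theoretical diversity quoted for the NNK-NNK-TAG-NNK-NNK design can be checked by simple enumeration. The following is a minimal sketch, not part of the original work; the variable names are illustrative, and the codon table is the standard genetic code written in the usual T, C, A, G order.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK degenerate codon: N = A, C, G or T; K = G or T.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]

# Four randomized codons flank the fixed amber codon (NNK-NNK-TAG-NNK-NNK).
n_randomized = 4
dna_diversity = len(nnk_codons) ** n_randomized

# Residues reachable with NNK, and the stop codons that remain in the set.
encoded = {CODON_TABLE[c] for c in nnk_codons}
amino_acids = encoded - {"*"}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]
peptide_diversity = len(amino_acids) ** n_randomized

# Number of NNK codons available for each encoded residue.
codons_per_residue = {
    aa: sum(CODON_TABLE[c] == aa for c in nnk_codons) for aa in sorted(encoded)
}

print(f"NNK codons: {len(nnk_codons)}")
print(f"DNA-level diversity: {dna_diversity}")
print(f"Amino acids covered: {len(amino_acids)}, stop codons: {stop_codons}")
print(f"Peptide-level diversity (no stops): {peptide_diversity}")
print(f"Codons for N and K: {codons_per_residue['N']}, {codons_per_residue['K']}")
```

The enumeration gives 32 codons per randomized position and 32^4 = 1,048,576 DNA variants, in line with the stated ~10^6. It also shows that NNK offers only one codon each for asparagine (AAT) and lysine (AAG), so this design cannot separate codon-level from residue-level effects for these two amino acids.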

Figure 1: Sequence context dependence of amber suppression efficiency and ncAA used in this study. A: Design of a fusion construct consisting of an N-terminal random sequence library around an amber codon and of a C-terminal GFP. Nucleotides (nt) of the random region are shown (N = A, G, C, or T; K = G, T). Amino acid (aa) sequences are shown in single letter code. tRNA is shown as dark grey sticks and ncAA as red circle. Nucleotide positions are numbered starting with the first nt of the amber codon, TAG. B: Structures of GFP (green, top, pdb entry 1GFL34) and chloramphenicol-acetyltransferase (grey, bottom, pdb entry 1PD5). Sites of fusion of sequence libraries at the N-termini (“N”) are shown as black lines. C: Chemical structures of ncAA used in this study. D: GFP expression levels of a collection of randomly selected E. coli clones each expressing a random sequence context mutant of GFP as shown in Fig. 1A and co-expressing tRNATyr/YRSpCN in presence of 1 mM ncAA 1. AU: arbitrary units.

We cloned the library under control of an araBAD promoter and transformed the resulting plasmid into E. coli GH371 containing plasmid pREP_pCNF. This plasmid encodes an orthogonal tRNA/aaRS pair of M. jannaschii consisting of an amber suppressor tRNATyr under control of the proK promoter and a polyspecific TyrRS mutant35 previously selected for the genetic encoding of p-cyano-L-phenylalanine (p-CNF)36 under control of a glnRS´-promoter.8 The polyspecificity of this aaRS enables straightforward testing of amber suppression efficiency with different ncAA without the need for customization. We randomly selected library clones and expressed GFP in presence of 1 mM p-acetyl-L-phenylalanine (pAcF 1, Fig. 1C), a widely used ncAA for bioorthogonal conjugation reactions.37,38 Fluorescence measurements of the expression cultures corrected for cell densities revealed ≥32-fold differences between the individual clones, in a range similar to that previously reported for homologous amber suppressor tRNAs39 (Fig. 1D). This demonstrates a high potential to improve the efficiency and robustness of amber suppression by the optimization of sequence contexts.

Directed evolution of sequence contexts for highly efficient amber suppression. To evolve sequence contexts for highly efficient amber suppression, we constructed a library with a design equivalent to the one above using chloramphenicol (cam) acetyltransferase (CAT) as target protein. The random region was again fused to the unstructured N-terminus via a GGAS linker (Fig. 1A and B) and cloned under control of an araBAD promoter. This library was transformed into E. coli GH371 bearing either plasmid pREP_pCNF or pREP_PylRS_AF. The latter plasmid encodes tRNAPyl under control of the proK promoter and a rationally designed, polyspecific PylRS mutant40 under control of a glnRS´-promoter.8 This allows the assessment of individual random sequences for maximal amber suppression in combination with both the M. jannaschii tRNATyr/TyrRS pair and the M. mazei tRNAPyl/PylRS pair and with different ncAA by directed evolution (a negative counterselection for the removal of sequence contexts that may promote suppression with canonical amino acids turned out to be unnecessary, see below).
We chose ncAA 1 and 6 for evolution experiments because of their significant structural dissimilarity and their usefulness for protein labeling37,38 and crosslinking,41,42 respectively. 5 × 10^6 cfu of the library were subjected to iterative cycles of positive selection. Specifically, cells were grown at 37 °C for 15 h on LB agar plates containing tetracycline and carbenicillin for plasmid propagation, as well as 0.2 % arabinose, >34 µg/ml cam and 1 mM ncAA 1 or 6. After the first round, plates were scraped and a diversity of 1 × 10^6 cfu each was subjected to additional, iterative rounds of positive selection, until convergence of the libraries was indicated by multiple occurrence of at least one clone in at least eight randomly picked clones of the selection round. This was observed after five and eleven rounds, respectively.

Features of identified sequence contexts. The sequences of clones from final selection rounds with 1 and 6 are shown in Table 1A and B, respectively (for mRNA sequences of clones of earlier selection rounds, see the Supporting Information, SI Tables 2-3). The most striking feature of the mRNA sequence of all clones in both selections was a strong preference for a low GC content (15 % and 22.5 %, respectively) and in particular for the presence of A (82.5 % and 77.5 %, respectively) at all N(nt) random positions (i.e. the first two nt of NNK, Table 1, see also Fig. 2A for frequency plots). In contrast, the K(nt) random positions that allow G or T showed a relatively balanced G content of 55 %. This preference was relatively independent of the position of the nucleotide within the random region, with the exceptions that nt +5 for tRNATyr/TyrRS and nt -3 for tRNAPyl/PylRS showed a comparably higher variability and nt -6 for tRNAPyl/PylRS was exclusively C (Fig. 2A). Similarly, a strong overall preference for specific amino acids was observed at all positions (Table 1). 34 % and 30 % of all amino acid occurrences were asparagine or lysine, respectively. Glutamine as the third most abundant amino acid had a frequency of only 13 % and histidine and threonine had frequencies of 9 % and 8 %, respectively. Only four other amino acids were found with lower frequency, whereas eleven amino acids were not present at all (Fig. 2B, for amino acid sequences of clones of earlier selection rounds, see the SI).

Table 1. Isolated Clones from Selections for Maximal Amber Suppression Efficiency

This represents a considerable preference for amide side chains (with asparagine preferred over glutamine) and a preference for lysine and partially for histidine, whereas arginine was not found. There was no pronounced dependence of the preferences on the position within the random region. Though the identified sequence contexts of both selections exhibited certain differences, their overall similarity raised the question of whether the observed preferences were a result of general effects, such as high general translation efficiencies or beneficial stability/hydrophilic character of the sequences, rather than specific preferences of amber suppression.

Figure 2: Features of sequence contexts identified by directed evolution for maximal amber suppression efficiency using random sequence context library-CAT fusion constructs. A: Sequence frequency plots of mRNA nucleotides within random region in identified sequence contexts from Table 1. Non-randomized codon is marked with a black frame. B: Sequence frequency plots of amino acids within random region in identified sequence contexts from Table 1. Non-randomized codon is marked as a grey box. Color code for amino acid side chains: red = acidic, black = hydrophobic, magenta = amide, blue = basic, green = others.

We performed control selections with random libraries designed as above, but having the amber codon replaced by the most frequently used codon of E. coli for the two structurally most closely related canonical amino acids tyrosine (TAT) and lysine (AAA), respectively. Selections were conducted as above in the absence of ncAA and convergence was observed after 15 and 12 rounds, respectively. Strikingly, the selections afforded sequence contexts that were completely unrelated to the ones identified in the selections with 1 and 6 (Table 1 C, D and Fig. 2A, B, for mRNA sequences of clones of earlier selection rounds, see SI Tables 4-5). For mRNA sequences, no pronounced preference for A was observed, with balanced GC contents of 50 % and 43.8 % for the N(nt) random positions in selections with codon TAT and AAA, respectively (Fig. 2A). In contrast to the previous selections that afforded N and K as the most frequent amino acids, only a single N was found. Moreover, the majority of identified amino acids were not found in previous selections, such as D, L, C, Y, G, F and R (Fig. 2B). mRNA and amino acid sequences for selections with codon TAT and AAA exhibited marked differences between each other (Table 1 and Fig. 2B).

Amber suppression efficiencies of identified sequence contexts. To identify the sequence contexts among the selected clones that promote the highest amber suppression efficiencies, we performed individual growth assays with all hits of both selections with ncAA 1 and 6 (Fig. 3). Negative controls were performed with E. coli transformed with the pooled, unselected library of sequence contexts (L-Pool, Fig. 3) and three single clones of this library that were randomly selected (L1-3, Fig. 3, these clones showed sequences that were unrelated to the identified sequence contexts, see the SI).
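The nucleotide statistics quoted for the selected and control contexts (GC content and A frequency at the N positions, G content at the K positions of each NNK codon) amount to simple counting. The sketch below shows one way to compute them; the function name and the example context are hypothetical and are not clones from Table 1.

```python
def nnk_position_stats(context: str) -> dict:
    """Composition of a randomized region given as concatenated NNK codons.

    `context` holds only the randomized codons (the fixed amber or sense
    codon is omitted), so its length must be a multiple of three.
    """
    if len(context) % 3 != 0:
        raise ValueError("context must consist of whole codons")
    codons = [context[i:i + 3] for i in range(0, len(context), 3)]
    n_positions = "".join(codon[:2] for codon in codons)  # first two nt of NNK
    k_positions = "".join(codon[2] for codon in codons)   # third nt of NNK
    return {
        "gc_at_N": sum(nt in "GC" for nt in n_positions) / len(n_positions),
        "a_at_N": n_positions.count("A") / len(n_positions),
        "g_at_K": k_positions.count("G") / len(k_positions),
    }


# Hypothetical A-rich context: AAT-AAT-[TAG]-ACG-AAG, amber codon omitted.
example = "AATAAT" + "ACGAAG"
stats = nnk_position_stats(example)
print(stats)
```

For pooled statistics over several clones, the randomized regions can be concatenated before calling the function, since every clone contributes the same number of N and K positions.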

Figure 3: Growth assays on cam media with clones identified by directed evolution for maximal amber suppression efficiency using random sequence context library-CAT fusion constructs. A: Growth assays with identified clones from directed evolution with ncAA 1 in presence or absence of 1 mM 1. B: Growth assays with identified clones from directed evolution with ncAA 6 in presence or absence of 1 mM 6. L-total: pooled, unselected library. L1-3: Randomly picked individual clones of unselected library.

Diluted overnight cultures were printed on LB agar containing tetracycline and carbenicillin for plasmid propagation, 0.2 % arabinose, no ncAA or 1 mM ncAA 1 or 6 and no cam or cam at 40, 80 or 120 µg/ml (Fig. 3). Monitoring of growth after 15 and 24 h incubation at 37 °C revealed enrichment for sequence contexts with significantly higher amber suppression efficiencies than the ones of the negative controls. Clones 1-R5-2 (amino acid sequence NN / TK around the amber codon) and 6-R11-1 (HH / KN) that occurred with the highest frequency in the two selections (compare Table 1A, B) showed the highest growth rates among hits (Fig. 3A, B). For the selected sequence contexts, growth was also observed in the absence of ncAA, in particular for the fastest growing clones of the selection with 1. This reflects promiscuity of the employed aaRS in the absence of ncAA, which however does not occur in the presence of ncAA. p-CNF-RS has previously been observed to be promiscuous with respect to L-phenylalanine in the absence of a cognate ncAA, but selective incorporation is observed in the presence of ≥1 mM of various ncAA within the detection limits of MS.35 Indeed, we confirmed the selective incorporation of 1 even into sequence context 1-R5-2 (that exhibited the highest growth in absence of ncAA) by ESI-MS (see Fig. 5D).

To benchmark the efficiencies of amber suppression with the two clones 1-R5-2 and 6-R11-1, we conducted comparative growth assays using positive controls without amber codons. As first control, we employed a CAT gene with an N-terminal FLAG tag replacing the random region. This widely used tag is small, hydrophilic and well tolerated by various proteins.43 This provides a challenging benchmark of the suitability of sequence contexts as tags appended to target proteins. As second control, we performed growth assays with clones having the identical sequence as 1-R5-2 and 6-R11-1, but having the amber codon replaced by the most frequent codons of E. coli for the structurally most closely related amino acids tyrosine (TAT) and lysine (AAA), respectively (Fig. 4A, dataset 1 and Fig. 4B). This allows the influence of the codon alone to be analyzed, independent of the general sequence context and with minimal deviation from the tested ncAA structures. All positive control clones exhibited growth up to a cam concentration of 200 µg/ml in the presence and absence of ncAA 1 and 6. This was observed for both sequence contexts 1-R5-2(TAT) and 6-R11-1(AAA) and for the FLAG tag (Fig. 4A, dataset 1 and Fig. 4B). Clone 1-R5-2 bearing an amber codon did not grow in absence of 1, but exhibited growth in presence of 1 that was similar to the growth of the positive controls, i.e. up to 200 µg/ml.

Figure 4: Growth assays on cam media with hit clones identified by directed evolution for maximal amber suppression efficiency using random sequence context library-CAT fusion constructs with different ncAA. FLAG: positive control with a FLAG tag without amber codon replacing the sequence contexts 1-R5-2 and 6-R11-1 of CAT fusion constructs. 1-R5-2 (TAT) and 6-R11-1 (AAA): clones with the amber codon of 1-R5-2 and 6-R11-1 replaced by the sense codons TAT and AAA, coding for Tyr and Lys, respectively. For chemical structures of employed ncAA, see Fig. 1C.

This indicates that the identified sequence context promotes amber suppression with an efficiency similar to that observed for sense codon translation and does not affect the function of the tagged protein to a greater extent than the widely used FLAG tag. Clone 6-R11-1 exhibited growth up to 100-150 µg/ml, which was lower than the positive controls, but significantly higher than non-selected library members (Fig. 4 and Fig. 3B). Importantly, both the selections and the growth assays were conducted under non-optimal conditions, i.e. in presence of only 1 mM ncAA and using the non-optimized plasmid pREP_PylRS_AF that shows significantly lower aaRS expression levels than optimized constructs.8 These conditions were chosen (and proved successful) for the identification of optimized sequence contexts, but were not expected to directly result in maximal amber suppression themselves. Indeed, introduction of sequence context 6-R11-1 into E. coli thioredoxin and expression under optimized conditions afforded suppression efficiencies that strongly exceeded typical efficiencies observed without the use of optimized sequence contexts (see below).

Dependence of amber suppression efficiency of identified sequence contexts on ncAA structures. To assess potential influences of the ncAA structure on amber suppression efficiencies for the selected sequence contexts, we performed growth assays as above with ncAA 2 - 5 (for chemical structures, see Fig. 1C) in combination with context 1-R5-2. These ncAA differ from 1 in steric demand and polarity of the para-substituent and in the electronic properties of the phenyl ring. The growth of clone 1-R5-2 in the presence of these ncAAs exhibited only little variation (growth between 150 - 200 µg/ml cam in all cases) and did not strongly differ from the growth of the positive controls: it was similar for ncAA 2, slightly higher for ncAA 3, and slightly lower for ncAA 4 - 5. These data suggest that the identified sequence context does not introduce a pronounced, intrinsic dependence of suppression efficiency on the structure of the ncAA. This provides a wide applicability for highly efficient amber suppression.

Transferability between proteins. We next tested transferability of the identified sequence contexts between proteins in order to evaluate their applicability as general tags for highly efficient ncAA incorporation into proteins. We introduced sequence contexts 1-R5-2 and 6-R11-1 to the N-terminus of E. coli thioredoxin (TRX) bearing a C-terminal His6 tag under control of an araBAD promoter and transformed the resulting plasmids into E. coli GH371 containing plasmid pEVOL_pCNF or pEVOL_PylRS_AF, respectively (compared to the previously used pREP plasmids, these plasmids contain an additional aaRS gene under control of an araBAD promoter).8 As controls, we constructed equivalent TRX fusions with sequence contexts 1-R5-2(TAT) and 6-R11-1(AAA) as above, as well as with the randomly picked sequence contexts L1-3 of the unselected libraries (compare Fig. 3). Expression, Ni-NTA purification and analysis by SDS PAGE indicated significantly increased protein expression levels for 1-R5-2 compared to L1-3 in presence of 1, confirming the previous observations made in growth assays (Fig. 5A, compare to Fig. 3A). Strikingly, the expression level of TRX bearing sequence context 1-R5-2 was higher than the level of TRX bearing 1-R5-2(TAT) without the amber codon. Quantification by a BCA assay revealed expression yields of 14.8 and 13.3 mg/L, respectively (Fig. 5C, corresponding to >110 % yield for amber suppression compared to sense codon translation). Moreover, whereas the expression level of TRX bearing sequence context 6-R11-1 was slightly lower than the level of TRX bearing 6-R11-1(AAA), it was still very high compared to typical relative yields (i.e. ~20 %) obtained for protein expressions with amber suppression in standard E. coli strains (Fig. 5B, 5.6 and 8.0 mg/L, corresponding to 70 % yield for amber suppression compared to sense codon translation, Fig. 5C).
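The relative yields quoted above follow directly from the BCA-quantified expression levels. As a small worked check, using only the four values given in the text (the dictionary layout and function name are illustrative):

```python
# Expression yields in mg/L as reported in the text:
# (amber-codon construct, matching sense-codon control).
yields_mg_per_l = {
    "1-R5-2 vs 1-R5-2(TAT)": (14.8, 13.3),
    "6-R11-1 vs 6-R11-1(AAA)": (5.6, 8.0),
}


def relative_yield_percent(amber: float, sense: float) -> float:
    """Yield of the amber-suppressed protein relative to the sense-codon control."""
    return 100.0 * amber / sense


relative = {
    name: relative_yield_percent(amber, sense)
    for name, (amber, sense) in yields_mg_per_l.items()
}

for name, value in relative.items():
    print(f"{name}: {value:.1f} %")
```

The two ratios come out at about 111 % and 70 %, matching the ">110 %" and "70 %" stated for contexts 1-R5-2 and 6-R11-1.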


Figure 5: Transfer of evolved sequence contexts to a second target protein. A: SDS-gel analysis of Ni-NTA purified expressions of E. coli thioredoxin bearing N-terminal fusions of sequence contexts 1-R5-2(TAT) with a tyrosine encoding sense codon, 1-R5-2 with an amber codon, and randomly picked sequence contexts L1 - 3 with amber codons. Expressions were conducted in absence (left) or presence (right) of 3 mM 1. B: SDS-gel analysis of Ni-NTA purified expressions of E. coli thioredoxin bearing N-terminal fusions of sequence contexts 6-R11-1(AAA) with a lysine encoding sense codon, 6-R11-1 with an amber codon, and randomly picked sequence contexts L1 - 3 with amber codons. Expressions were conducted in absence (left) or presence (right) of 3 mM 6. C: Expression yields of E. coli thioredoxin constructs of Fig. 5A, B in presence of 3 mM 1 or 6 as indicated, quantified by a BCA assay. D: ESI-MS spectra of Ni-NTA purified expressions of TRX either bearing 1-R5-2(TAT) conducted in absence of ncAA or bearing 1-R5-2 conducted in presence of ncAA 1.

Finally, expression in absence of ncAA was low for both sequence contexts, indicating that ncAA incorporation occurred with high fidelity. The slightly increased expression observed for 1-R5-2 in absence of 1 (Fig. 5A) again reflected promiscuity of p-CNF-RS (see above), since ESI-MS analysis of the purified TRX proteins bearing 1-R5-2(TAT) and 1-R5-2 expressed in absence and presence of 1, respectively, confirmed the selective incorporation only of 1 in the latter case (Fig. 5D, TRX-1-R5-2(TAT) mass found (Da): 15717; TRX-1-R5-2 bearing ncAA 1, mass found (Da): 15743. Mass difference between 1 and Tyr calculated: 26 Da, protein mass difference found: 26 Da. For full ESI-MS spectra, see SI Fig. 13). Importantly, these ESI-MS analyses additionally proved that both identified sequence contexts are stable in E. coli (see the SI). These data show that the high amber suppression efficiencies observed in the context of CAT-based growth assays can be transferred between proteins and thus that the identified sequence contexts represent potential general tags for efficient and robust ncAA incorporation into proteins.

DISCUSSION

Our studies aimed at the empirical identification of sequence contexts for optimized amber suppression efficiency and thus highly efficient incorporation of ncAA into proteins. The studies were designed in an application-driven manner, i.e. multiple codons were assessed simultaneously to enable synergistic effects, and the selective pressure was set to act on all relevant aspects of ribosomal translation simultaneously. Hence, multiple factors could be responsible for the observed enhancement of amber suppression, including interactions between mRNA and tRNA/ribosome, RF-1-mediated termination fidelity, folding of the nascent peptide, and others. Nevertheless, our findings provide valuable empirical insights into the sequence context dependence of amber suppression that can be exploited directly for applications. The data reveal individual sequence context preferences of the codons TAG, TAT and AAA and/or of the employed tRNA/aaRS pairs for highly efficient suppression/translation. The finding that selections conducted with the amber codon TAG and the two different orthogonal tRNA/aaRS pairs in combination with ncAA 1 and 6 resulted in related, but still different, sequence contexts suggests individual influences of the aminoacyl-tRNAs themselves. It is currently not clear whether mRNA- or peptide-related aspects (or both) are responsible for the observed preferences, since the library design did not allow for differential codon usage for the most frequently identified amino acids (Fig. 2B).

Several findings indicate that the identified sequence contexts represent potential general tags for robust and efficient incorporation of ncAA into proteins with both tested orthogonal tRNA/aaRS pairs in standard E. coli strains. The observed amber suppression efficiencies were similar to, and in some cases even higher than, the efficiencies observed for sense codon translation, and were transferable between proteins. They further did not exhibit a pronounced dependence on the structure of the incorporated ncAA. Moreover, in comparison to the widely used FLAG tag, the sequence contexts did not alter the function of the target protein. Though the accuracy offered by the incorporation of ncAA at single, user-defined sites is one of the main advantages of genetic code expansion, the approach can suffer from low incorporation efficiencies in a poorly understood, context-dependent manner. A tag-based approach reduces this accuracy, but holds the potential to offer reliable, highly efficient incorporation of ncAA in general. In fact, many applications do not require maximal accuracy, such as fluorescent labeling, PEGylation or immobilization (44-47). A future analysis of the applicability of the evolved tags in internal or C-terminal positions of target proteins may increase the scope of this approach.

Besides affording novel tags, our study also provides design rules for the general engineering of amber codons into target genes, i.e. without the requirement for tags. On the basis of the present results, a high content of A nucleotides in the mRNA sequence is a beneficial feature. In particular, the choice of A at the +4 position (which occurred in 15 of 16 selected clones; Fig. 1A, B) is in agreement with previous studies in the context of homologous amber suppressor tRNAs and canonical amino acids (23, 33, 39). The use of these insights as design rules may help to identify permissive sites and/or to exploit codon degeneracy for the optimization of flanking codons by silent mutations. We therefore believe that our findings will find wide application in the community.
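The two design rules stated above (A at the +4 position and a high A content in the surrounding mRNA) lend themselves to a simple sequence check. The sketch below is one possible way to apply them and is not the authors' procedure: the function name, the flank window of 6 nt on each side, and the example sequence are hypothetical, and the +4 position is assumed to be the nucleotide directly 3' of the TAG triplet.

```python
def amber_contexts(coding_seq, window=6):
    """Report the +4 nucleotide and flanking A content for each in-frame TAG.

    window (nt on each side of the codon) is an arbitrary, assumed value.
    """
    seq = coding_seq.upper()
    hits = []
    # Step through the reading frame codon by codon.
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] != "TAG":
            continue
        # +4 is taken as the nucleotide directly 3' of the TAG triplet.
        plus4 = seq[i + 3] if i + 3 < len(seq) else None
        # Flank excludes the TAG codon itself.
        flank = seq[max(0, i - window):i] + seq[i + 3:i + 3 + window]
        a_fraction = flank.count("A") / len(flank) if flank else 0.0
        hits.append({"position": i, "plus4": plus4, "a_fraction": a_fraction})
    return hits


# Hypothetical example sequence, for illustration only.
example = "ATGAAATAGAAAGGC"
for hit in amber_contexts(example):
    print(hit)
```

Such a check could be used to rank candidate amber sites, or candidate silent mutations of flanking codons, before any experimental test; it says nothing about the other factors discussed above.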

METHODS

Materials. The GH371 strain of E. coli (iGEM) was used to clone or propagate plasmid DNA. The GeneJET plasmid miniprep kit was used to purify plasmid DNA; the GeneJET PCR purification kit and GeneJET gel extraction kit were used to purify DNA from polymerase chain reactions (PCRs) or restriction enzyme digests and from agarose gels, respectively (Thermo Fisher Scientific, Waltham, MA, USA). PhusionHF DNA polymerase (NEB, Ipswich, MA, USA) was used for ordinary PCRs and PfuUltraHF (Agilent, Santa Clara, CA, USA) for QuikChange mutagenesis. DNA oligonucleotides were purchased from Sigma. For oligonucleotide sequences, see the SI. Restriction enzymes and T4 DNA ligase were purchased from NEB. Antibiotics, LB media, LB agar media, and L-arabinose were obtained from Roth. B-Per lysis reagent and GelCode blue staining were purchased from Thermo Fisher Scientific. Phenylmethanesulfonyl fluoride (PMSF) was obtained from Sigma Aldrich and Ni-NTA agarose from Qiagen. Paper filter spin columns, BCA kit and Slide-A-Lyzer mini dialysis units (molecular weight cut-off 3,000 Da) were purchased from Pierce.

Vector construction. For the construction of pREP_PylRS_AF (pMoP313), CAT was removed from pREP_PylRS_AF_CAT-only (pMoS188) (42) by restriction digestion with SpeI / BamHI. The product was blunted with Klenow fragment and the vector re-ligated. For the construction of pREP_pCNF (pMoP317), a cassette containing M. jannaschii-derived tRNA(CUA) under control of a proK promoter and M. jannaschii TyrRS_pCNF under control of the glnRS' promoter was amplified from pEVOL_pCNF (35) using primers oDaS21 / oDaS22 and cloned into the XbaI / SphI sites of pREP_PylRS_AF. For the construction of the pBAD_GFPwt sequence context library, a HindIII site was introduced into plasmid pBAD_GFP-TAG39_NheI+ (pDaS93) (41, 42) by QuikChange mutagenesis using primers oMoP690 / oMoP691. CATwt was amplified from a pEVOL backbone (8) using primers oMoP699 / oMoP700 and cloned into the NheI / NotI sites of the obtained plasmid, resulting in pBAD_Flag_CATwt (pMoP319). GFPwt was amplified from pBAD_Flag_GFPwt (pDaS72) using primer oMoP838 and primer library oMoP837 containing the random region, and cloned into the HindIII / NotI sites of pMoP319. For the construction of pBAD_CATwt sequence context libraries, CATwt was amplified from the pEVOL backbone (8) using primer oMoP700 and primer library oMoP817 and cloned into the HindIII / NotI sites of pMoP315. The theoretical diversity was covered >19-fold. The pBAD_(TAT)_CATwt and pBAD_(AAA)_CATwt sequence context libraries were constructed like the pBAD_CATwt sequence context libraries, using primer oMoP700 in combination with primer libraries oMoP1120 and oMoP1121, respectively. The theoretical diversity was covered >6-fold. For the construction of pBAD_1-R5-2(TAT)_CATwt and pBAD_6-R11-1(AAA)_CATwt, CATwt was amplified from a pEVOL backbone (8) using primers oMoP1032 / oMoP700 and oMoP1033 / oMoP700, respectively, and cloned into the HindIII / NotI sites of pMoP315. For the assembly of pBAD_tag_TRX-6His constructs, tag_TRX-6His was amplified from pBAD_TRX-His6 (pSuE177) (41) with primer oDaS338 combined with oMoP1103 (L1), oMoP1104 (L2), oMoP1105 (L3), oMoP1019 (1-R5-2), oMoP1020 (1-R5-2(TAT)), oMoP1060 (6-R11-1) or oMoP1061 (6-R11-1(AAA)). The PCR products were cloned into the NcoI / NotI sites of pSuE177.
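The fold coverage of the theoretical library diversity mentioned above is the ratio of independent transformants to the number of possible library members. The sketch below illustrates this relationship only. It assumes a library of fully randomized nucleotide positions (four bases per position), and both the number of randomized positions and the transformant count in the example are hypothetical; the actual library design is given in the SI and is not reproduced here.

```python
def theoretical_diversity(n_random_nt):
    """Number of distinct sequences for n fully randomized nucleotide positions.

    Assumes all four bases are allowed at every position (an assumption made
    for this illustration, not the documented design of the libraries).
    """
    return 4 ** n_random_nt


def fold_coverage(n_transformants, n_random_nt):
    """Independent transformants divided by the theoretical diversity."""
    return n_transformants / theoretical_diversity(n_random_nt)


# Hypothetical numbers for illustration only.
n_positions = 6
transformants = 81920
print(theoretical_diversity(n_positions))
print(fold_coverage(transformants, n_positions))
```

A coverage well above 1 makes it likely, but does not guarantee, that most library members are represented at least once among the transformants.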

Measurement of cellular GFP fluorescence. Randomly chosen single clones of the transformed library were used to inoculate 1 ml LB media supplemented with 50 µg/ml carbenicillin, 12.5 µg/ml tetracycline, and 1 mM 1 in 96-deepwell plates. The cultures were incubated in a vibrating platform shaker at 1350 rpm and 37 °C until an OD600 of 0.4 was reached. Protein expression was induced with 0.04% (w/v) L-arabinose. The cultures were harvested after 4 h of incubation by centrifugation (10 min, 3320 × g, room temperature). Cell pellets were washed twice with 1× PBS and fluorescence was determined using a Tecan M200 plate reader with an excitation wavelength of 475 nm and emission detection at 510 nm. Fluorescence values were corrected for cell densities based on OD600 measurements of the cultures.

Directed evolution of sequence contexts. The selections were carried out on LB agar plates supplemented with 50 µg/ml carbenicillin, 12.5 µg/ml tetracycline, 34 µg/ml chloramphenicol (continuously increasing concentrations for pBAD_tag(TAT)_CATwt and pBAD_tag(AAA)_CATwt library selections), 0.2% (w/v) L-arabinose and 1 mM ncAA 1 or 6. The libraries were plated and grown at 37 °C overnight. Cells were scraped and 10⁶ cfu were replated on selection plates as above for additional selection rounds. DNA and amino acid sequence frequency plots were generated on the website http://weblogo.berkeley.edu/logo.cgi.

Growth assays. Single clones obtained from selections were inoculated into 1 ml LB media supplemented with 50 µg/ml carbenicillin and 12.5 µg/ml tetracycline in 96-deepwell plates. The cultures were incubated in a vibrating platform shaker at 37 °C and 1350 rpm overnight. 5 µl of each culture were spotted on LB agar plates supplemented with 50 µg/ml carbenicillin, 12.5 µg/ml tetracycline, 0.2% (w/v) L-arabinose, increasing chloramphenicol concentrations (0, 40, 80, 120 µg/ml and 0, 100, 150, 200 µg/ml) and 1 mM 1 – 6 or equivalent volumes of H2O. Plates were incubated at 37 °C and the growth was imaged 15 h / 24 h and 12 h post spotting.

Purification of TRX proteins. Single clones were grown in LB media supplemented with 50 µg/ml carbenicillin and 34 µg/ml chloramphenicol at 37 °C and 200 rpm shaking overnight. These cultures were diluted 50-fold into 5 ml fresh LB media supplemented with the same antibiotic concentrations. The cultures were grown at 37 °C and 200 rpm shaking and induced at an OD600 of 0.4 with 0.2% (w/v) L-arabinose. ncAA 1 and 6 were used at 3 mM. After 6 h of growth, the cultures were harvested by centrifugation (10 min, 3320 × g, room temperature) and the pellets lysed with 500 µl B-Per reagent containing 1 mM PMSF by shaking at room temperature at 1000 rpm for 20 min. The mixtures were pelleted by centrifugation (5 min, 20817 × g), the supernatants were recovered and 50 µl Ni-NTA agarose slurry were added. The suspensions were shaken at 1000 rpm for 20 min at room temperature and each transferred to a paper filter spin column. After centrifugation (106 × g, 10 s), the resins were washed 3× with 700 µl wash buffer (50 mM NaH2PO4, 300 mM NaCl, pH = 8.0, +20 mM imidazole) and 1× with 700 µl wash buffer (wash buffer +50 mM imidazole). Proteins were eluted two times with 50 µl elution buffer (wash buffer +500 mM imidazole). Purity and identity of the obtained proteins were analyzed by SDS PAGE and GelCode blue staining. Quantification of the protein samples was performed using a BCA assay.
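The cell-density correction described under "Measurement of cellular GFP fluorescence" can be illustrated with a short calculation. The sketch below assumes that the correction is a simple division of the fluorescence reading by the OD600 of the same culture, with an optional blank subtraction. The text does not state the exact formula, so this is an assumption, and the function name and all readings are hypothetical.

```python
def normalize_fluorescence(fluorescence, od600, blank_fluorescence=0.0):
    """Fluorescence per unit cell density (assumed correction: F / OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (fluorescence - blank_fluorescence) / od600


# Hypothetical plate-reader readings: clone -> (fluorescence, OD600).
readings = {
    "clone_A": (12000.0, 0.80),
    "clone_B": (3000.0, 0.60),
}

normalized = {
    name: normalize_fluorescence(f, od) for name, (f, od) in readings.items()
}
for name, value in normalized.items():
    print(f"{name}: {value:.0f} fluorescence units per OD600")
```

Normalizing in this way allows clones that grew to different densities to be compared on a per-cell basis.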

ASSOCIATED CONTENT

Supporting Information. Sequences of oligonucleotides, libraries, selected clones and proteins, complete ESI-MS mass spectra. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION
Corresponding Author
* [email protected]

Author Contributions
The manuscript was written by D.S. All authors have given approval to the final version of the manuscript.

Funding Sources
This work was supported by grants from the Deutsche Forschungsgemeinschaft (SU 726/1-1 and SU 726/2-1 within SPP1623).

ACKNOWLEDGMENT
M. Schmidt acknowledges a Hoechst Fellowship of the Aventis Foundation. We thank P. Schultz for plasmid pEVOL-pCNF.

REFERENCES
(1) Liu, C. C., Schultz, P. G. (2010) Adding New Chemistries to the Genetic Code, Ann. Rev. Biochem. 79, 413.
(2) Davis, L., Chin, J. W. (2012) Designer proteins: applications of genetic code expansion in cell biology, Nat. Rev. Mol. Cell Biol. 13, 168-182.
(3) Wang, L., Brock, A., Herberich, B., Schultz, P. G. (2001) Expanding the genetic code of Escherichia coli, Science 292, 498-500.
(4) Blight, S. K., Larue, R. C., Mahapatra, A., Longstaff, D. G., Chang, E., Zhao, G., Kang, P. T., Green-Church, K. B., Chan, M. K., Krzycki, J. A. (2004) Direct charging of tRNA(CUA) with pyrrolysine in vitro and in vivo, Nature 431, 333.
(5) Wan, W., Tharp, J. M., Liu, W. R. (2014) Pyrrolysyl-tRNA synthetase: an ordinary enzyme but an outstanding genetic code expansion tool, Biochim. Biophys. Acta 1844, 1059-1070.
(6) Ryu, Y., Schultz, P. G. (2006) Efficient incorporation of unnatural amino acids into proteins in Escherichia coli, Nat. Methods 3, 263-265.
(7) Cellitti, S. E., Jones, D. H., Lagpacan, L., Hao, X., Zhang, Q., Hu, H., Brittain, S. M., Brinker, A., Caldwell, J., Bursulaya, B., Spraggon, G., Brock, A., Ryu, Y., Uno, T., Schultz, P. G., Geierstanger, B. H. (2008) In vivo incorporation of unnatural amino acids to probe structure, dynamics, and ligand binding in a large protein by nuclear magnetic resonance spectroscopy, J. Am. Chem. Soc. 130, 9268-9281.
(8) Young, T. S., Ahmad, I., Yin, J. A., Schultz, P. G. (2010) An enhanced system for unnatural amino acid mutagenesis in E. coli, J. Mol. Biol. 395, 361-374.
(9) Chatterjee, A., Sun, S. B., Furman, J. L., Xiao, H., Schultz, P. G. (2013) A Versatile Platform for Single- and Multiple-Unnatural Amino Acid Mutagenesis in Escherichia coli, Biochemistry 52, 1828-1837.
(10) Guo, J., Melancon, C. E., 3rd, Lee, H. S., Groff, D., Schultz, P. G. (2009) Evolution of amber suppressor tRNAs for efficient bacterial production of proteins containing nonnatural amino acids, Angew. Chem. Int. Ed. Engl. 48, 9148-9151.
(11) Johnson, D. B., Xu, J., Shen, Z., Takimoto, J. K., Schultz, M. D., Schmitz, R. J., Xiang, Z., Ecker, J. R., Briggs, S. P., Wang, L. (2011) RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites, Nat. Chem. Biol. 7, 779.
(12) Johnson, D. B. F., Wang, C., Xu, J. F., Schultz, M. D., Schmitz, R. J., Ecker, J. R., Wang, L. (2012) Release Factor One Is Nonessential in Escherichia coli, ACS Chem. Biol. 7, 1337.
(13) Mukai, T., Hayashi, A., Iraha, F., Sato, A., Ohtake, K., Yokoyama, S., Sakamoto, K. (2010) Codon reassignment in the Escherichia coli genetic code, Nucleic Acids Res. 38, 8188-8195.
(14) Lajoie, M. J., Rovner, A. J., Goodman, D. B., Aerni, H. R., Haimovich, A. D., Kuznetsov, G., Mercer, J. A., Wang, H. H., Carr, P. A., Mosberg, J. A., Rohland, N., Schultz, P. G., Jacobson, J. M., Rinehart, J., Church, G. M., Isaacs, F. J. (2013) Genomically Recoded Organisms Expand Biological Functions, Science 342, 357-360.
(15) Zhang, S., Ryden-Aulin, M., Kirsebom, L. A., Isaksson, L. A. (1994) Genetic implication for an interaction between release factor one and ribosomal protein L7/L12 in vivo, J. Mol. Biol. 242, 614-618.
(16) Ryden, S. M., Isaksson, L. A. (1984) A Temperature-Sensitive Mutant of Escherichia-Coli That Shows Enhanced Misreading of Uag/a and Increased Efficiency for Some Transfer-Rna Nonsense Suppressors, Mol. Gen. Genetics 193, 38-45.
(17) Wu, I. L., Patterson, M. A., Carpenter Desai, H. E., Mehl, R. A., Giorgi, G., Conticello, V. P. (2013) Multiple site-selective insertions of noncanonical amino acids into sequence-repetitive polypeptides, Chembiochem 14, 968-978.
(18) Huang, Y., Russell, W. K., Wan, W., Pai, P. J., Russell, D. H., Liu, W. (2010) A convenient method for genetic incorporation of multiple noncanonical amino acids into one protein in Escherichia coli, Mol. Biosyst. 6, 683-686.
(19) Wang, K. H., Neumann, H., Peak-Chew, S. Y., Chin, J. W. (2007) Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion, Nat. Biotechnol. 25, 770-777.
(20) Bossi, L., Roth, J. R. (1980) The influence of codon context on genetic code translation, Nature 286, 123-127.
(21) Ayer, D., Yarus, M. (1986) The context effect does not require a fourth base pair, Science 231, 393-395.
(22) Pedersen, W. T., Curran, J. F. (1991) Effects of the nucleotide 3' to an amber codon on ribosomal selection rates of suppressor tRNA and release factor-1, J. Mol. Biol. 219, 231-241.
(23) Stormo, G. D., Schneider, T. D., Gold, L. (1986) Quantitative analysis of the relationship between nucleotide sequence and functional activity, Nucleic Acids Res. 14, 6661-6679.
(24) Zhang, S., Ryden-Aulin, M., Isaksson, L. A. (1996) Functional interaction between release factor one and P-site peptidyl-tRNA on the ribosome, J. Mol. Biol. 261, 98-107.
(25) Bjornsson, A., Mottagui-Tabar, S., Isaksson, L. A. (1996) Structure of the C-terminal end of the nascent peptide influences translation termination, EMBO J. 15, 1696-1704.
(26) Mottagui-Tabar, S., Bjornsson, A., Isaksson, L. A. (1994) The second to last amino acid in the nascent peptide as a codon context determinant, EMBO J. 13, 249-257.
(27) Mottagui-Tabar, S., Isaksson, L. A. (1997) Only the last amino acids in the nascent peptide influence translation termination in Escherichia coli genes, FEBS Lett. 414, 165-170.
(28) Smith, D., Yarus, M. (1989) tRNA-tRNA interactions within cellular ribosomes, Proc. Natl. Acad. Sci. U S A 86, 4397-4401.
(29) Kato, M., Nishikawa, K., Uritani, M., Miyazaki, M., Takemura, S. (1990) The difference in the type of codon-anticodon base pairing at the ribosomal P-site is one of the determinants of the translational rate, J. Biochem. 107, 242-247.
(30) Tate, W. P., Mannering, S. A. (1996) Three, four or more: the translational stop signal at length, Mol. Microbiol. 21, 213-219.

(31) Cridge, A. G., Major, L. L., Mahagaonkar, A. A., Poole, E. S., Isaksson, L. A., Tate, W. P. (2006) Comparison of characteristics and function of translation termination signals between and within prokaryotic and eukaryotic organisms, Nucleic Acids Res. 34, 1959-1973.
(32) Brown, C. M., Stockwell, P. A., Trotman, C. N., Tate, W. P. (1990) The signal for the termination of protein synthesis in procaryotes, Nucleic Acids Res. 18, 2079-2086.
(33) Bossi, L. (1983) Context effects: translation of UAG codon by suppressor tRNA is affected by the sequence following UAG in the message, J. Mol. Biol. 164, 73-87.
(34) Yang, F., Moss, L. G., Phillips, G. N., Jr. (1996) The molecular structure of green fluorescent protein, Nat. Biotechnol. 14, 1246-1251.
(35) Young, D. D., Young, T. S., Jahnz, M., Ahmad, I., Spraggon, G., Schultz, P. G. (2011) An Evolved Aminoacyl-tRNA Synthetase with Atypical Polysubstrate Specificity, Biochemistry 50, 1894-1900.
(36) Schultz, K. C., Supekova, L., Ryu, Y., Xie, J., Perera, R., Schultz, P. G. (2006) A genetically encoded infrared probe, J. Am. Chem. Soc. 128, 13984-13985.
(37) Wang, L., Zhang, Z. W., Brock, A., Schultz, P. G. (2003) Addition of the keto functional group to the genetic code of Escherichia coli, Proc. Natl. Acad. Sci. USA 100, 56-61.
(38) Chin, J. W., Cropp, T. A., Anderson, J. C., Mukherji, M., Zhang, Z., Schultz, P. G. (2003) An expanded eukaryotic genetic code, Science 301, 964-967.
(39) Miller, J. H., Albertini, A. M. (1983) Effects of surrounding sequence on the suppression of nonsense codons, J. Mol. Biol. 164, 59-71.
(40) Yanagisawa, T., Ishii, R., Fukunaga, R., Kobayashi, T., Sakamoto, K., Yokoyama, S. (2008) Multistep engineering of pyrrolysyl-tRNA synthetase to genetically encode N(epsilon)-(o-azidobenzyloxycarbonyl) lysine for site-specific protein modification, Chem. Biol. 15, 1187.
(41) Schmidt, M. J., Summerer, D. (2013) Red-Light-Controlled Protein-RNA Crosslinking with a Genetically Encoded Furan, Angew. Chem. Int. Ed. Engl. 52, 4690.
(42) Schmidt, M. J., Weber, A., Pott, M., Welte, W., Summerer, D. (2014) Structural Basis of Furan-Amino Acid Recognition by a Polyspecific Aminoacyl-tRNA-Synthetase and its Genetic Encoding in Human Cells, Chembiochem 15, 1655-60.
(43) Hopp, T. P., Prickett, K. S., Price, V. L., Libby, R. T., March, C. J., Cerretti, D. P., Urdal, D. L., Conlon, P. J. (1988) A Short Polypeptide Marker Sequence Useful for Recombinant Protein Identification and Purification, Biotechnol. 6, 1204-1210.
(44) Deiters, A., Cropp, T. A., Summerer, D., Mukherji, M., Schultz, P. G. (2004) Site-specific PEGylation of proteins containing unnatural amino acids, Bioorg. Med. Chem. Lett. 14, 5743-5745.
(45) Wang, J., Xie, J., Schultz, P. G. (2006) A genetically encoded fluorescent amino acid, J. Am. Chem. Soc. 128, 8738-8739.
(46) Summerer, D., Chen, S., Wu, N., Deiters, A., Chin, J. W., Schultz, P. G. (2006) A genetically encoded fluorescent amino acid, Proc. Natl. Acad. Sci. U S A 103, 9785-9789.
(47) Seo, M. H., Han, J., Jin, Z., Lee, D. W., Park, H. S., Kim, H. S. (2011) Controlled and oriented immobilization of protein by site-specific incorporation of unnatural amino acid, Anal. Chem. 83, 2841-2845.
