A Web Interface for Codon Compression - ACS Publications

May 11, 2016
Technical Note pubs.acs.org/synthbio

A Web Interface for Codon Compression

Andrea L. Halweg-Edwards,†,∥ Gur Pines,†,∥ James D. Winkler,† Assaf Pines, and Ryan T. Gill*,†

†Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States

ABSTRACT: Saturation mutagenesis is widely used in protein engineering and other experiments. A common practice is to utilize the single degenerate codon NNK. However, this approach suffers from amino acid bias and the presence of a stop codon and of the wild type amino acid. These extra features needlessly increase library size and consequently downstream screening load. Recently, we developed the DYNAMCC algorithms for codon compression that find the minimal set of degenerate codons, covering any defined set of amino acids, with no off-target codons and with redundancy control. Additionally, we experimentally demonstrated the advantages of this approach over the standard NNK method. While the code is freely available from our Web site, we have now made this method more accessible to a broader audience without any computational background by building a user-friendly web-based interface for those algorithms. The Web site can be accessed through: www.dynamcc.com.

KEYWORDS: protein engineering, synthetic biology, saturation mutagenesis, codon compression

The saturation mutagenesis approach targets specific sites within a protein for mutagenesis. Saturation may be partial, covering a specific group of amino acids, or complete, with the site being mutated to all possible amino acids. Unlike other methods such as random mutagenesis, gene shuffling, and error-prone PCR, saturation mutagenesis requires a priori knowledge of the sites to be targeted, and thus keeps the library size relatively controllable. Still, depending on the number of sites to be saturated in parallel, minimizing the library size is advantageous to reduce the required screening effort. The NNK codon (N covers all nucleotides, and K denotes G or T, as defined by the IUPAC alphabet1) halves the library size compared with the NNN codon (32 vs 64 codons). However, since only 20 amino acids are coded for in the genetic code, library size can potentially be further reduced by eliminating the NNK redundancy. The effect of codon redundancy on library size and screening load increases exponentially with the number of sites to be saturated, as described by Kille et al.2 This issue has been examined previously, yielding codon collections that reduce redundancy significantly or even completely,2,3 but these solutions were general and static, keeping the wild type amino acid in the pool.

We recently developed the DYNAMCC (dynamic management of codon compression) algorithms that specifically find the minimal set of degenerate codons for any set of desired amino acids.4 This approach solves the following problems: (1) The amino acid bias that occurs when using the NNK codon is eliminated, as every amino acid is coded for only once. (2) While the NNK codon codes for the usually undesired stop codon (TAG), this can easily be removed from the calculated codon collection. (3) Similarly, as the desired amino acids can be precisely defined, the wild type amino acid can also be removed, further decreasing library size and screening load. (4) The codon compression algorithms use frequency tables, allowing the selection of the most adequate codons on a situation-dependent basis. These tables may reflect codon usage, tRNA abundance, or other parameters such as the extension of the genetic code to include non-natural amino acids, allowing full flexibility and user control. (5) While the DYNAMCC_0 algorithm eliminates redundancy, in some cases amino acid redundancy may be desired. A second algorithm, DYNAMCC_R, was developed to compress codons while keeping all codons that code for the specified amino acid list. This second algorithm is intended to be used when library size is manageably small, or when it is not a limitation, as in the case of selection in directed evolution experiments.

To make these algorithms accessible to a wider audience, we implemented them in a dedicated Web site with a friendly and easy-to-use interface. The Web site has usage tables for most model organisms, and custom tables can be easily uploaded in cases of exotic organisms or when parameters other than usage are to be used. Moreover, since the tables also define the genetic code, custom genetic codes may be uploaded, making this tool accessible for scientists working with synthetic amino acids and genetically recoded organisms.5,6
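The library-size argument above (32 vs 64 codons per site, growing exponentially with the number of sites) can be made concrete with a few lines of arithmetic. The following is a minimal Python 3 sketch, not part of the DYNAMCC code: the function names `count_codons` and `library_size` are hypothetical, and the "non-redundant" column simply assumes a pool with exactly one codon for each of the 20 amino acids.

```python
# IUPAC nucleotide codes (ref 1) mapped to the bases each symbol covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_codons(degenerate_codon):
    """Number of concrete codons encoded by one degenerate codon."""
    total = 1
    for symbol in degenerate_codon.upper():
        total *= len(IUPAC[symbol])
    return total


def library_size(codons_per_site, n_sites):
    """Number of DNA variants when n_sites are saturated in parallel."""
    return codons_per_site ** n_sites


if __name__ == "__main__":
    print("sites  NNN  NNK  non-redundant  NNK/non-redundant")
    for sites in (1, 2, 3, 4):
        nnn = library_size(count_codons("NNN"), sites)
        nnk = library_size(count_codons("NNK"), sites)
        # Assumption: an ideal pool with one codon per amino acid (20 per site).
        ideal = library_size(20, sites)
        print(sites, nnn, nnk, ideal, round(nnk / ideal, 2))
```

The last column shows how the excess of NNK over a non-redundant pool compounds as more sites are saturated in parallel, which is the motivation for removing redundancy, the stop codon, and the wild type residue.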



Received: January 27, 2016
ACS Synthetic Biology
DOI: 10.1021/acssynbio.6b00026 ACS Synth. Biol. XXXX, XXX, XXX−XXX

MATERIAL AND METHODS

Web Interface Design. The DYNAMCC algorithms were written in Python 2.7 for ease of integration with Tornado, a Python web framework and asynchronous networking library (www.tornadoweb.org). The front end was built using a combination of custom JavaScript functions, which rely on general functionality from the jQuery (https://jquery.com/) and Underscore (http://underscorejs.org/) libraries, as well as the mobile-friendly, responsive style sheets and JavaScript libraries from Bootstrap 3 (http://getbootstrap.com/). Currently, the www.dynamcc.com domain is hosted at www.pythonanywhere.com.

WORKING WITH THE WEB SITE

The Web site currently includes three different tools: DYNAMCC_0, DYNAMCC_R, and Codon Exploder.

DYNAMCC_0. This tool calculates the minimal set of degenerate codons for any given set of amino acids, without redundancy. As depicted in Figure 1a, the user first selects an organism, which defines the relevant genetic code and codon frequencies. Custom tables may also be uploaded at this stage, and an example of a valid table format is available under the “Upload custom usage table” button. Next, the user defines whether the amino acid list is to be kept in or removed from the final collection. This option reduces the number of clicks needed when the desired collection includes almost all amino acids or, conversely, only a few. Then, the exact collection of amino acids is defined. The amino acids are ordered according to their properties and can be batch-selected for easy definition of functionally similar residues. If a custom genetic code was uploaded, the non-natural amino acids will appear in this window. Note that in most cases, the stop codon should be eliminated from the pool. If more than one site is to be mutated in parallel, the wild type usually should be kept in the codon pool to increase the combinatorial space. A small and focused group of amino acids can also be defined when the aim is to modify properties in a more controlled manner (for example, based on polarity, hydrophobicity, etc.).

The next step is to define the usage threshold. The threshold can be specified by Rank (the most used codon for a specific amino acid is ranked as number 1) or by Usage. If the Usage option is chosen, the maximal usage setpoint is generated, to make sure all selected amino acids can be included. The output is in table form (Figure 1b), including the selected codons with their rank and usage, their respective amino acids, and the degenerate codons covering the whole collection. Stop codons are denoted as X. Note that in some instances there might be more than a single solution with a similar ranking weight, which will be reflected by different outputs for the same query.

Figure 1. An example of the input (a) and output (b) of DYNAMCC_0. Following the organism definition, the users select which amino acids are to be compressed. Here, the stop codon as well as all hydrophobic amino acids are to be removed from the compressed pool. The rank cutoff chosen here is 2. The output table (b) displays the resulting compressed codons, with some additional information. This figure is not a screenshot and was generated in graphics software for better visualization.

DYNAMCC_R. This option is to be used when complete redundancy is desired. Since all relevant codons are included, usage is not taken into account and the target organism is not relevant. Users can select the standard genetic code or upload a custom one (Figure 2a). The next steps are similar to those in DYNAMCC_0, excluding the cutoff options. The output format is again the same as in DYNAMCC_0, with the compressed codons covering all codons for the selected amino acids (Figure 2b).

Figure 2. An example of the input (a) and output (b) of DYNAMCC_R. The input parameters are similar to those in Figure 1, with the exception of the organism selection, which is not relevant in this case. Here, the whole redundant space of the selected amino acids is included in the compressed codons. As in Figure 1, this figure is not a screenshot and was generated in graphics software for better visualization.

Codon Exploder. This is a simple script for expanding compressed codons to their corresponding codons, together with additional data such as usage and the amino acids they encode. This tool is useful for verifying compressed designs produced by our or other tools and for testing whether they fit the target organism in terms of usage. The input field accepts single or multiple compressed codons, and the output is as in the DYNAMCC tools.
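As a rough illustration of what expanding a compressed codon involves, and of how the expansion can be used to verify a design, the Python 3 sketch below expands degenerate codons with the IUPAC alphabet, translates them with the standard genetic code, and reports missing, off-target, and redundant amino acids. It is not the Web site's implementation: the names `explode` and `check_design` are hypothetical, usage data are omitted, and stop codons are written as X to match the output convention described above.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each symbol covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order; stops shown as "X".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

ALL_20 = "ACDEFGHIKLMNPQRSTVWY"


def explode(degenerate_codon):
    """Expand one degenerate codon into a list of (codon, amino acid) pairs."""
    pools = [IUPAC[symbol] for symbol in degenerate_codon.upper()]
    codons = ("".join(bases) for bases in product(*pools))
    return [(codon, GENETIC_CODE[codon]) for codon in codons]


def check_design(degenerate_codons, wanted):
    """Compare the amino acids encoded by a compressed design with a target set."""
    counts = {}
    for degenerate_codon in degenerate_codons:
        for _, aa in explode(degenerate_codon):
            counts[aa] = counts.get(aa, 0) + 1
    wanted = set(wanted)
    return {
        "missing": sorted(wanted - set(counts)),
        "off_target": sorted(set(counts) - wanted),
        "redundant": {aa: n for aa, n in counts.items() if n > 1},
    }


if __name__ == "__main__":
    # NNK against all 20 amino acids: the TAG stop is off-target and
    # several amino acids are encoded more than once.
    print(check_design(["NNK"], ALL_20))
    # The NDT/VHG/TGG mixture of Kille et al. (ref 2), which keeps a small
    # degree of redundancy.
    print(check_design(["NDT", "VHG", "TGG"], ALL_20))
```

A design with an empty `off_target` list and an empty `redundant` dictionary corresponds to the non-redundant case targeted by DYNAMCC_0, whereas DYNAMCC_R output is expected to show every synonymous codon of the selected amino acids.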





DIFFERENCES BETWEEN THE PREVIOUS VERSION AND THE CURRENT ONE

Our aim in building this online tool is to extend its potential user base to those without a computational background and to those who are not interested in installing third-party software on their computers. With this in mind, we tried to keep the Web site as user-friendly as possible. The use of the Web site does not require any registration, the simple user interface allows easy parameter input, and the output is organized in a clear table format. The code was optimized and rewritten in Python 2.7 and is significantly faster than the original Perl 5 script. Moreover, since the actual computation is performed on a remote server, the Web site may be used from any computer or mobile device, regardless of its computational capabilities.

As a result of these simplifications, some options that existed in the original code were removed, since they were somewhat esoteric and will not be useful for most users. The removed functionalities are as follows.

DYNAMCC_0: The option to add a small degree of redundancy. This option was very computationally intensive and was originally developed for testing our approach against other codon compression solutions that include a small degree of redundancy.2

DYNAMCC_R: While the original algorithm’s output was a list of compressed codons with different degrees of redundancy, the Web site outputs only the final, fully redundant solution. This choice was made since we assume that most DYNAMCC_R users will be interested mainly in this output.

Advanced users who wish to use these functionalities are encouraged to download our previously reported Perl script from our Web site: http://www.gillgroup.org/links/. The current Python script implemented by the Web server is also publicly available to enable its integration in other Python-based design pipelines and can be downloaded at: https://github.com/GillGroup/DYNAMCC. Both source codes are freely available under the BSD 3-clause license, allowing modifications and code redistribution.




AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Author Contributions

∥A.L.H. and G.P. contributed equally to this work.

Notes

The authors declare no competing financial interest.


ACKNOWLEDGMENTS

This work was funded by the U.S. Department of Energy Grant No. DE-SC008812.

REFERENCES

(1) Cornish-Bowden, A. (1985) Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 13, 3021−3030.
(2) Kille, S., Acevedo-Rocha, C. G., Parra, L. P., Zhang, Z.-G., Opperman, D. J., Reetz, M. T., and Acevedo, J. P. (2013) Reducing Codon Redundancy and Screening Effort of Combinatorial Protein Libraries Created by Saturation Mutagenesis. ACS Synth. Biol. 2, 83−92.
(3) Tang, L., Gao, H., Zhu, X., Wang, X., Zhou, M., and Jiang, R. (2012) Construction of “small-intelligent” focused mutagenesis libraries using well-designed combinatorial degenerate primers. Biotechniques 52, 149−158.
(4) Pines, G., Pines, A., Garst, A. D., Zeitoun, R. I., Lynch, S. A., and Gill, R. T. (2015) Codon compression algorithms for saturation mutagenesis. ACS Synth. Biol. 4, 604−614.
(5) Lajoie, M. J., Rovner, A. J., Goodman, D. B., Aerni, H.-R., Haimovich, A. D., Kuznetsov, G., Mercer, J. A., Wang, H. H., Carr, P. A., Mosberg, J. A., Rohland, N., Schultz, P. G., Jacobson, J. M., Rinehart, J., Church, G. M., and Isaacs, F. J. (2013) Genomically Recoded Organisms Expand Biological Functions. Science 342, 357−360.
(6) Mukai, T., Hayashi, A., Iraha, F., Sato, A., Ohtake, K., Yokoyama, S., and Sakamoto, K. (2010) Codon reassignment in the Escherichia coli genetic code. Nucleic Acids Res. 38, 8188−8195.
(7) Firth, A. E., and Patrick, W. M. (2008) GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries. Nucleic Acids Res. 36, W281−5.
(8) Engqvist, M. K. M., and Nielsen, J. (2015) ANT: Software for Generating and Evaluating Degenerate Codons for Natural and Expanded Genetic Codes. ACS Synth. Biol. 4, 935.
(9) Plotkin, J. B., and Kudla, G. (2011) Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32−42.
(10) Firnberg, E., Labonte, J. W., Gray, J. J., and Ostermeier, M. (2014) A Comprehensive, High-Resolution Map of a Gene’s Fitness Landscape. Mol. Biol. Evol. 31, 1581−1592.

DISCUSSION

Unlike other codon compression tools, such as the AA calculator,7 DC-analyzer,3 or ANT,8 our approach takes into account codon frequency and results in no off-target codons. The cost of such an approach is that it may result in a larger degenerate codon pool; hence, it is up to the users to decide which approach is best suited to their needs. To our knowledge, DYNAMCC_R is a new concept in codon compression, providing a fully redundant codon collection that allows one to investigate the space of synonymous mutations.9,10 In addition, since the genetic code and usage tables can be customized by the user, the Web site can be used with any target organism and by a broad range of users, including basic scientists, protein engineers, and synthetic biologists.
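To make concrete what taking codon frequency into account can look like, the Python 3 sketch below mimics the Rank and Usage thresholds described for DYNAMCC_0: it ranks the synonymous codons of each amino acid by usage, filters them by a rank or usage cutoff, and computes the highest usage cutoff that still leaves every selected amino acid with at least one codon (one possible reading of the "maximal usage setpoint"). The usage values are made-up placeholders rather than real codon usage data, all function names are hypothetical, and the actual DYNAMCC code may handle these steps differently.

```python
# Placeholder usage table: amino acid -> {codon: fraction among synonymous
# codons}. These numbers are illustrative only and are NOT real usage data.
USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.70, "AAG": 0.30},
    "L": {"CTG": 0.50, "TTA": 0.15, "TTG": 0.12,
          "CTT": 0.10, "CTC": 0.09, "CTA": 0.04},
}


def rank_codons(usage_by_codon):
    """Return {codon: rank}; rank 1 is the most used codon of the amino acid."""
    ordered = sorted(usage_by_codon, key=usage_by_codon.get, reverse=True)
    return {codon: position + 1 for position, codon in enumerate(ordered)}


def filter_by_rank(usage, amino_acids, max_rank):
    """Keep, for each amino acid, the codons ranked max_rank or better."""
    kept = {}
    for aa in amino_acids:
        ranks = rank_codons(usage[aa])
        kept[aa] = [codon for codon, rank in ranks.items() if rank <= max_rank]
    return kept


def max_usage_cutoff(usage, amino_acids):
    """Highest usage cutoff that leaves every amino acid at least one codon."""
    return min(max(usage[aa].values()) for aa in amino_acids)


def filter_by_usage(usage, amino_acids, cutoff):
    """Keep, for each amino acid, the codons with usage at or above cutoff."""
    return {
        aa: [codon for codon, value in usage[aa].items() if value >= cutoff]
        for aa in amino_acids
    }


if __name__ == "__main__":
    selected = "MKL"
    print(filter_by_rank(USAGE, selected, max_rank=2))
    cutoff = max_usage_cutoff(USAGE, selected)
    print(cutoff, filter_by_usage(USAGE, selected, cutoff))
```

Restricting the allowed codons in this way narrows the set from which degenerate codons can be assembled, which is consistent with the trade-off noted above: honoring codon frequency may require a larger degenerate codon pool.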
