Letter Cite This: ACS Comb. Sci. XXXX, XXX, XXX−XXX

pubs.acs.org/acscombsci

Encrypted Oligonucleotide Arrays for Molecular Authentication

Matthew T. Holden and Lloyd M. Smith*

Department of Chemistry, University of Wisconsin−Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States


ABSTRACT: Counterfeiting is an incredibly widespread problem, with some estimates placing its economic impact above 2% of worldwide GDP. The scale of the issue suggests that current preventive measures are either technologically insufficient or too impractical and costly to be widely adopted. High-density arrays of biomolecules are explored here as security devices that can be coupled to a valuable commodity as proof of its authenticity. Light-directed DNA array fabrication technology is used to synthesize arrays that are designed to resist analysis with sequencing-by-hybridization approaches. A relatively simple sequence design strategy forces a counterfeiter to undertake a prohibitively high number of complex experiments to decipher the array sequences employed.

KEYWORDS: Encryption, Anticounterfeiting, Authentication, Photolithography, Maskless array synthesis

The storage and transmission of biological information is the fundamental role of genetic material. As DNA sequencing and synthesis costs have fallen, a variety of encoding schemes have been developed to repurpose DNA as a medium to store arbitrary forms of data. Many of these are concerned with increasing the information storage capacity of DNA, accurate retrieval of the information, or chemical means of preserving the integrity of the DNA molecules.1−4 A small yet diverse group of approaches have also sought to restrict access to the underlying data,5−10 though relatively few examples of such DNA cryptography have entered routine use. Here, a maskless array synthesizer is used for disguising information in a high-density DNA array in an analogy to key-based encryption. The strategy relies upon rendering the arrays resistant to sequencing-by-hybridization techniques, which could otherwise be employed for decryption. The approach is explored as an anticounterfeiting technology.

Scheme 1 outlines how an array-based system could be used to ensure the authenticity of a product. Under the assumption that there is a secure mode of communication, fluorescently labeled oligonucleotides act as a private “key” to reveal the information in the array upon hybridization. Each location on the surface exhibits a distinct signal for multiple fluorescence channels, while the arrangement of the array features can form shapes such as barcodes, lot numbers, or watermarks. Despite this appeal, existing DNA-based anticounterfeiting approaches have not utilized the array itself as the marker of authenticity.11,12 This is likely because the cost of array fabrication has been too high for routine use with inexpensive goods, yet the arrays are too vulnerable to forgery to secure high-value items. Improvements in millichip technology and increased array manufacture speeds suggest opportunities for drastic enhancements to array fabrication throughput, but there has not yet been an approach for rendering the arrays themselves resistant to analysis and forgery.13−17

In order for a DNA array to be replicated, a forger would need to determine both the identities of the sequences and their spatial locations on the surface. Sequencing-by-hybridization (SBH) is the most general route to decrypt an array in that it would allow a forger to decipher arrays even of unnatural nucleic acids, such as LNA, L-DNA, or PNA, that are not amenable to current sequencing-by-synthesis approaches.18−20 A drawback of SBH is that de novo sequencing requires hybridization with relatively short oligonucleotides (reads) so that complete coverage can be achieved in a tractable number of hybridization cycles. The short reads cause a loss of long-range sequence information that can produce ambiguities during assembly, particularly when applied to repetitive sequences. We make use of this limitation of SBH and further complicate the task of a would-be forger by placing two related sequences within each array feature.

Arrays containing two sequences per feature can be produced using a maskless fabrication approach that has previously been applied to single nucleotide polymorphism (SNP) detection and RNA array fabrication.14,21,22 Briefly, the extent of photodeprotection is controlled at an early step of the fabrication process so that an orthogonal protecting group is installed, typically by coupling a dimethoxytrityl-protected phosphoramidite, on approximately half of the molecules populating a given site. This protecting group remains attached throughout the light-directed synthesis of the first sequence and is then removed so the second sequence can be generated

Received: April 28, 2019
Revised: June 25, 2019
Published: July 5, 2019

DOI: 10.1021/acscombsci.9b00088 ACS Comb. Sci. XXXX, XXX, XXX−XXX

Scheme 1. Analogy of Private Key Encryption to an Array-Based Antiforgery Device

Scheme 2. Synthesis Approach for Arrays Containing Two Sequences per Feature

with the remaining fraction of site density. Scheme 2 provides more detail on this approach.

An overview of the sequence design is shown in Scheme 3. The concept is to use pairs of sequences at each site so that hybridizations with short oligonucleotides produce reads with multiple possible assemblies. This renders the elucidation of the surface-bound sequences using SBH much more difficult. Consider the parent string ABCAC, where A, B, and C are blocks of sequence each of length l. Reads generated from iterative hybridizations to the ABCAC string can identify the sequences of the blocks as well as the junctions between adjacent blocks. If the read length, r, is less than l, a single read cannot span more than one junction simultaneously, so the overall string must be inferred from the block junctions (AB, BC, CA, and AC in the above example). If one employs pairs of sequences made from the same A, B, C sequence blocks in each feature, this design yields 14 configurations that (1) give the same hybridization signature as the parent string when r ≤ l (making decryption difficult for the would-be forger) and (2) can be uniquely identified by hybridization of sequences of length ≥3 × l (making authentication easy for those with the hybridization key). For an array comprised of N features, this design produces 14^N possible assemblies for the SBH reads generated when r ≤ l. An underlying assumption is that, when there is little homology between A, B, and C, hybridization conditions exist that can differentiate between the binding of perfectly matched sequences and those which are ∼2/3 complementary.

The strategy was investigated computationally to assess the generality of the approach. The 14 feature types shown in Scheme 3 are comprised of 14 different surface oligonucleotides. A, B, and C blocks were randomly generated and used to create the private key sequences and surface sequences. The hybridization specificity was then assessed using the DINAMelt server’s two-state function with DNA energy parameters at 37 °C and 150 mM Na+ concentration.23 Each private key sequence was tested against the set of the 14 surface sequences for every assignment of A, B, and C. The Tm difference (ΔTm) between the highest calculated value and the other members of the set was then determined. Figure 1 shows

Scheme 3. Sequence Design Strategy

In the upper left, an example of A, B, and C sequence blocks with a length of 10 nt each, composed into the ABCAC string. The theoretical reads generated from sequencing by hybridization using pentamers are shown below the string. The table at the right provides the architecture (“block structure”) of pairs of sequences which yield the same theoretical SBH reads as the ABCAC string. The “binding pattern” gives the design of fluorescently labeled three-block sequences that, when hybridized to the array, will distinguish the 14 array features from one another. Subscript “C” indicates complementarity to the sequence of the given block structure (“ABC_C” is the complement of sequence ABC). Yellow denotes that a hybridization signal will be observed; blue indicates that no hybridization signal will be observed.
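The read ambiguity that this design relies on can be checked with a short sketch. The Python below uses hypothetical 10-nt blocks (not the sequences of Scheme 3) and compares the set of k-mer reads from the parent string ABCAC with the pooled reads from an illustrative two-strand feature, ABCA plus AC. That pair is an assumption chosen for the example and may not correspond to any of the 14 block structures in the table. Idealized, error-free SBH is assumed, reporting only the presence or absence of each k-mer.

```python
# Hypothetical 10-nt blocks chosen for illustration only.
A = "ACGTTGCAAG"
B = "GGATCCTAAC"
C = "TTCAGGCTGA"


def sbh_reads(sequence, k):
    """Return the set of all k-mers in `sequence` (an idealized SBH spectrum)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def pooled_reads(sequences, k):
    """Union of the spectra of every strand sharing one array feature."""
    reads = set()
    for seq in sequences:
        reads |= sbh_reads(seq, k)
    return reads


parent = A + B + C + A + C      # the ABCAC string
pair = [A + B + C + A, A + C]   # illustrative two-strand feature (assumed)

for k in (5, 10, 12):
    same = sbh_reads(parent, k) == pooled_reads(pair, k)
    print(f"k={k}: spectra identical = {same}")
```

For these blocks the spectra coincide at k = 5 and k = 10 and diverge at k = 12, where a single read is long enough to cover the second A block together with both of its flanking junctions.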

Figure 1. Calculated average Tm differences for hybridization of key oligonucleotides to their targeted surface complements versus other partially complementary background sequences. Box denotes the interquartile range, and whiskers denote outermost values.
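As a minimal sketch of how a ΔTm summary of this kind could be tabulated, the Python below takes predicted melting temperatures for one key against several surface sequences and reports each off-target value relative to the highest one. The Tm numbers and the surface labels are placeholders invented for illustration, and the sign convention (off-target minus highest, so values are negative) is an assumption. The sketch does not call the DINAMelt server or reproduce its calculation.

```python
def delta_tm(tm_by_surface):
    """Return (best_surface, {other_surface: Tm_other - Tm_best}) for one key.

    Delta-Tm is taken here as the off-target Tm minus the highest Tm in the
    set, so every reported value is negative (an assumed sign convention).
    """
    best = max(tm_by_surface, key=tm_by_surface.get)
    best_tm = tm_by_surface[best]
    diffs = {name: tm - best_tm
             for name, tm in tm_by_surface.items() if name != best}
    return best, diffs


# Placeholder values in degrees Celsius; not data from the paper.
example = {"surface_1": 68.0, "surface_2": 55.5, "surface_3": 52.0}

best, diffs = delta_tm(example)
closest = max(diffs.values())   # the off-target closest to the designed duplex
print(best, diffs, closest)
```

The `closest` value is the quantity one would compare against a specificity threshold when screening a block assignment.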


Figure 2. (A) Hybridization of a private key to an array comprised of over 87 000 features. In principle, the hybridization behavior of any combination of features throughout the array may act as an indicator of authenticity. (B) Measurement of relative populations of first and second strands. The inset shown in panel A was bordered by replicate features as shown on the left-hand side. Half of the edges were synthesized as part of the first set of sequences (“I”), while the remainder were synthesized as part of the second set (“II”). Hybridization with a 5′-Cy5 labeled oligonucleotide was used to determine that the proportion of sequence I to sequence II is approximately 1:2 for this example. The DMT-protected amidite used here was a 2′-F-3′-DMT-rC(Ac) 5′-phosphoramidite.

the average ΔTm for 500 randomly generated sets of A, B, and C blocks 10 nt in size. All the average ΔTm values between the undesigned and designed interactions differed by more than 5 °C, indicating a general basis to observe the expected differential hybridization. Approximately 70% of the assignments tested had no undesigned ΔTm values above −3 °C for all key and surface block combinations.

The approach is effective because the forger does not know the composition of the private key and must therefore decipher every feature on an array to guarantee a successful replicate. Figure 2A shows the hybridization of an oligonucleotide key to an array comprised of 87 164 features, while Figure 2B depicts the quality control elements used as a measure of the relative population of the sequences within the features. The key is comprised of six Hex-labeled oligonucleotides 32 nt in length and of the form ABC_C (the complement of an ABC surface sequence). The foreground pattern shown in the inset of Figure 2A was fashioned after the Hadamard matrix used to communicate with the Mariner 9 spacecraft and confirms the authenticity of the array.24 However, the would-be forger would not be able to determine the identities of the sequences that comprise the array and thus would be unable to fabricate a replicate. The background sequences, where hybridization signal is not observed, were designed using randomly generated sequences of block sizes of either 10 or 11


(4) Goldman, N.; Bertone, P.; Chen, S.; Dessimoz, C.; LeProust, E. M.; Sipos, B.; Birney, E. Toward Practical High-Capacity Low-Maintenance Storage of Digital Information in Synthesised DNA. Nature 2013, 494 (7435), 77−80.
(5) Chandrasekaran, A. R.; Levchenko, O.; Patel, D. S.; MacIsaac, M.; Halvorsen, K. Addressable Configurations of DNA Nanostructures for Rewritable Memory. Nucleic Acids Res. 2017, 45, 11459−11465.
(6) Halvorsen, K.; Wong, W. P. Binary DNA Nanostructures for Data Encryption. PLoS One 2012, 7 (9), No. e44212.
(7) Gehani, A.; LaBean, T.; Reif, J. DNA-based Cryptography. In Aspects of Molecular Computing; Jonoska, N., Păun, G., Rozenberg, G., Eds.; Springer: Berlin, Heidelberg, 2004; pp 167−188.
(8) Clelland, C. T.; Risca, V.; Bancroft, C. Hiding Messages in DNA Microdots. Nature 1999, 399, 533.
(9) Shoshani, S.; Piran, R.; Arava, Y.; Keinan, E. A Molecular Cryptosystem for Images by DNA Computing. Angew. Chem., Int. Ed. 2012, 51 (12), 2883−2887.
(10) Arppe, R.; Sørensen, T. J. Physical Unclonable Functions Generated Through Chemical Methods for Anti-Counterfeiting. Nature Reviews Chemistry 2017, 1, 31.
(11) Butland, C. L.; Baggot, B. Labeling Technique for Countering Product Diversion and Product Counterfeiting. U.S. Patent US6030657A, November 1, 1994.
(12) Lawrence, J.; Liang, B. In-field DNA Extraction, Detection and Authentication Methods and Systems Therefor. Canadian Patent CA2959312A1, August 28, 2014.
(13) Heinrich, K. W.; Wolfer, J.; Hong, D.; LeBlanc, M.; Sussman, M. R. DNA Millichips as a Low-Cost Platform for Gene Expression Analysis. Plant Physiol. 2012, 159 (2), 548−557.
(14) Holden, M. T.; Carter, M. C. D.; Wu, C.-H.; Wolfer, J.; Codner, E.; Sussman, M. R.; Lynn, D. M.; Smith, L. M. Photolithographic Synthesis of High-Density DNA and RNA Arrays on Flexible, Transparent, and Easily Subdivided Plastic Substrates. Anal. Chem. 2015, 87 (22), 11420−11428.
(15) Sack, M.; Hölz, K.; Holik, A.-K.; Kretschy, N.; Somoza, V.; Stengele, K.-P.; Somoza, M. M. Express Photolithographic DNA Microarray Synthesis with Optimized Chemistry and High-Efficiency Photolabile Groups. J. Nanobiotechnol. 2016, 14 (1), 14.
(16) Kretschy, N.; Holik, A.-K.; Somoza, V.; Stengele, K.-P.; Somoza, M. M. Next-Generation o-Nitrobenzyl Photolabile Groups for Light-Directed Chemistry and Microarray Synthesis. Angew. Chem., Int. Ed. 2015, 54 (29), 8555−8559.
(17) Sack, M.; Kretschy, N.; Rohm, B.; Somoza, V.; Somoza, M. M. Simultaneous Light-Directed Synthesis of Mirror-Image Microarrays in a Photochemical Reaction Cell with Flare Suppression. Anal. Chem. 2013, 85 (18), 8513−8517.
(18) Hauser, N. C.; Martinez, R.; Jacob, A.; Rupp, S.; Hoheisel, J. D.; Matysiak, S. Utilising the Left-Helical Conformation of L-DNA for Analysing Different Marker Types on a Single Universal Microarray Platform. Nucleic Acids Res. 2006, 34 (18), 5101−5111.
(19) Liu, Z.-C.; Shin, D.-S.; Shokouhimehr, M.; Lee, K.-N.; Yoo, B.-W.; Kim, Y.-K.; Lee, Y.-S. Light-Directed Synthesis of Peptide Nucleic Acids (PNAs) Chips. Biosens. Bioelectron. 2007, 22 (12), 2891−2897.
(20) Yang, F.; Dong, B.; Nie, K.; Shi, H.; Wu, Y.; Wang, H.; Liu, Z. Light-Directed Synthesis of High-Density Peptide Nucleic Acid Microarrays. ACS Comb. Sci. 2015, 17 (10), 608−614.
(21) Nie, B.; Yang, M.; Fu, W.; Liang, Z. Surface Invasive Cleavage Assay on a Maskless Light-Directed Diamond DNA Microarray for Genome-Wide Human SNP Mapping. Analyst 2015, 140 (13), 4549−4557.
(22) Wu, C.-H.; Holden, M. T.; Smith, L. M. Enzymatic Fabrication of High Density RNA Arrays. Angew. Chem., Int. Ed. 2014, 53 (49), 13514−13517.
(23) Markham, N. R.; Zuker, M. DINAMelt Web Server for Nucleic Acid Melting Prediction. Nucleic Acids Res. 2005, 33, W577−W581.
(24) Agaian, S. S.; Sarukhanyan, H. G.; Egiazarian, K. O.; Astola, J. Hadamard Transforms; SPIE Press: 2011; Vol. PM207, p 520.

nt, so that reads 10 bases or shorter produce data consistent with 14^87164 or 1.27 × 10^99901 possible array designs. SBH with 11 nt read lengths would require over 4 million hybridization events, rendering decryption impractical.

Developing a defense against SBH is a significant step toward adapting arrays as an anticounterfeiting tool. Though it is not possible to rule out future techniques that could decrypt such arrays, they could still function as effective deterrents if the cost of decryption exceeds the value of a particular good. Even arrays with a few thousand sequences may be impractical to decrypt, suggesting a new application for millichip technology and related polymeric array substrates.13,14,25

The sequence design strategy described here is quite general, applies to any nucleic acid-based polymer, and could conceivably extend to other array fabrication platforms. Any oligonucleotide chemistry that produces a discontinuity in a long sequence, such as a long spacer or inversion of polarity, would likely achieve the same result as placing two sequences within a feature. This would allow SBH-resistant arrays to be made using DMT-protected phosphoramidites on inkjet platforms. Spotted array platforms may be especially useful as well because they can utilize oligonucleotides of a length beyond the limits of in situ chemistry. This would enable blocks of high l, more complicated block arrangement schemes using longer strings, or features comprised of three or more distinct sequences for enhanced levels of security.
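The size of the design space quoted above can be reproduced with a few lines of arithmetic. The sketch below works with base-10 logarithms because 14 raised to the number of features is too large to print conveniently. It also assumes that "over 4 million hybridization events" refers to one hybridization per distinct 11-nt probe (4^11); that reading is an interpretation, not something the text states explicitly.

```python
import math

FEATURE_TYPES = 14     # indistinguishable block structures per feature
N_FEATURES = 87_164    # features on the array shown in Figure 2A

# Work with log10(14^N) = N * log10(14) instead of the full integer.
log10_designs = N_FEATURES * math.log10(FEATURE_TYPES)
exponent = math.floor(log10_designs)
mantissa = 10 ** (log10_designs - exponent)
print(f"14^{N_FEATURES} ≈ {mantissa:.2f} × 10^{exponent}")

# Number of distinct 11-nt probes over a four-letter alphabet
# (assumed to be what the hybridization count refers to).
probes_11mer = 4 ** 11
print(f"4^11 = {probes_11mer:,}")
```

Under these assumptions the script prints 1.27 × 10^99901 for the number of designs and 4,194,304 for the number of 11-mer probes, in line with the figures given in the text.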



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscombsci.9b00088.

Materials and methods (PDF)
Calculated melting temperature raw data (XLSX)
Table of array sequences (XLSX)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Matthew T. Holden: 0000-0001-5430-7268

Author Contributions

All authors have given approval to the final version of the manuscript.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by National Institutes of Health grants 1RO1GM108727, 1RO1GM109099, and 5T32GM08349.



REFERENCES

(1) Erlich, Y.; Zielinski, D. DNA Fountain Enables a Robust and Efficient Storage Architecture. Science 2017, 355 (6328), 950−954.
(2) Grass, R. N.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. J. Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angew. Chem., Int. Ed. 2015, 54 (8), 2552−2555.
(3) Church, G. M.; Gao, Y.; Kosuri, S. Next-Generation Digital Information Storage in DNA. Science 2012, 337 (6102), 1628.
(25) Holden, M. T.; Carter, M. C. D.; Ting, S. K.; Lynn, D. M.; Smith, L. M. Parallel DNA Synthesis on Poly(ethylene terephthalate). ChemBioChem 2017, 18 (19), 1914−1916.